TL;DR: Internal RAG agents combine retrieval, prompt injection, and prompt compression in ways that improve usefulness but also enlarge the attack surface if retrieved content is not validated, sanitised, and constrained before it reaches the model, according to Kong. The real issue is that governance now extends into the retrieval path, where trust assumptions can be broken before the LLM ever responds.
At a glance
What this is: This is a Kong engineering guide to building an internal RAG agent with Kong AI Gateway, highlighting retrieval, chunk injection, and prompt compression as the core design pattern.
Why it matters: It matters because IAM, NHI, and AI governance teams now have to think about who or what can inject context into model sessions, how that context is trusted, and where privilege, data exposure, and prompt injection controls sit in the request path.
By the numbers:
- 80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems, inappropriately sharing sensitive data, and revealing access credentials.
- 96% of technology professionals identify AI agents as a growing security threat, and 66% believe this risk is immediate.
👉 Read Kong's guide to building an internal RAG agent with Kong AI Gateway
Context
Retrieval-Augmented Generation, or RAG, adds external context to an LLM prompt so the model can answer with fresher, domain-specific information. That design helps accuracy, but it also creates a new governance problem for AI gateway teams: retrieved content becomes part of the trust boundary, which means insecure chunks, bad metadata, or prompt injection can influence the model before any output is generated.
For IAM and NHI practitioners, the key question is not whether RAG works technically. It is who controls the retrieval pipeline, how injected content is validated, and whether the surrounding platform can prevent a model session from inheriting untrusted data as if it were authoritative. Kong’s approach is typical of enterprise AI platform building, not an edge case.
The article also touches a broader operating model issue: the same gateway that brokers API traffic is increasingly being asked to govern AI context, token usage, and prompt shaping. That convergence means identity teams can no longer treat model access, data access, and application access as separate controls.
Key questions
Q: How should security teams govern retrieval paths in RAG systems?
A: Security teams should treat retrieval as part of the trusted control plane, not as a background search function. That means source approval, chunk validation, vector-store access control, and response logging must all be governed. If the retrieval path is weak, the model can inherit untrusted context before any policy check can intervene.
Q: Why does prompt role choice matter in AI gateway design?
A: Prompt role choice matters because the role determines how much authority the model gives retrieved text. System-level injection can strengthen guidance, but it also increases the impact of malicious or malformed chunks. Teams should match role assignment to trust level and avoid giving instruction status to unverified content.
Q: What breaks when compressed prompts remove security context?
A: When compression strips out provenance markers, policy text, or instruction boundaries, the model may still answer but with weaker governance semantics. The system appears to work while losing the cues that separate evidence from instruction. Teams should test compressed prompts for control fidelity, not just output quality.
Q: How can organisations reduce prompt injection risk in RAG pipelines?
A: They should sanitize retrieved content, limit which sources can enter privileged prompt roles, and log the provenance of every chunk used in a response. Prompt injection is easiest to exploit when untrusted content is treated as authoritative context. The safest systems keep trust boundaries explicit and reviewable.
Technical breakdown
RAG ingestion and retrieval create a new trust boundary
RAG works in two stages. First, content is chunked, embedded, and stored in a vector database. Second, a user query is embedded and matched against those stored vectors so the most relevant chunks can be injected into the prompt. The security problem is that retrieval is not just a search step. It is a content-selection step that decides which external text becomes operational context for the model. If that text is stale, poisoned, or overexposed, the model inherits the failure. In practice, the gateway becomes part of the control plane for knowledge trust, not just request routing.
Practical implication: treat retrieval sources, chunking rules, and vector stores as governed assets, not implementation details.
Prompt injection risk depends on where retrieved context is inserted
The article highlights a critical design choice: retrieved chunks can be injected as system, user, or assistant content. That choice changes the attack surface. System-level injection gives retrieved content more authority, but also makes malicious or misleading chunks more dangerous if they are not sanitized. User-level injection is usually less authoritative, but it still allows hostile content to shape model behaviour. The deeper issue is that prompt role assignment is now an authorisation decision. It determines what the model is allowed to treat as instruction versus evidence.
Practical implication: constrain which roles can carry retrieved content and sanitize untrusted chunks before role assignment.
Prompt compression reduces cost but can also compress away safeguards
Kong’s prompt compressor uses LLMLingua to shorten prompts while preserving meaning, which helps control latency and token spend. That efficiency gain matters in production, but compression is not neutral. If the compressor strips context too aggressively, it can remove safety cues, policy text, or distinctions between user input and retrieved evidence. In RAG systems, compression should be understood as a transformation layer that affects both cost and control fidelity. When it sits between retrieval and generation, it can either preserve governance intent or blur it.
Practical implication: test compressed prompts for loss of security-relevant context, not just for answer quality.
Threat narrative
Attacker objective: The attacker aims to influence model output or data handling by getting untrusted content treated as trusted context.
- entry occurs when malicious or low-integrity content is introduced into the retrieval corpus or prompt path and later selected as context by the RAG pipeline.
- escalation occurs when that retrieved content is injected into a higher-trust role, such as system or privileged user context, where it can steer model behaviour.
- impact occurs when the model follows poisoned context, leaks data, or produces unsafe actions based on untrusted retrieved instructions.
Breaches seen in the wild
- Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
- AI LLM hijack breach — attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.
Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.
NHI Mgmt Group analysis
RAG context injection is becoming a governance control, not a plumbing detail. Once retrieved content is allowed to shape model behaviour, the security question changes from whether the LLM is safe to whether the retrieval path is trustworthy. That means data source provenance, chunk validation, and role assignment all sit inside the control boundary. Practitioners should stop treating retrieval as a neutral pre-processing step.
Prompt role assignment is a privilege decision. Injecting context as system content gives that context more authority than many teams realise, which is why prompt injection risk rises when untrusted chunks are elevated into privileged roles. The issue is not only malicious input. It is the decision to grant model instruction status to content that was never authenticated as trustworthy. Practitioners should recognise this as a policy boundary.
Context compression can create hidden policy loss. Compressing retrieved text to save tokens can unintentionally remove the very distinctions that keep the model aligned with intended use. If policy text, provenance cues, or delimiting tags are lost, the gateway may still function while governance meaning erodes. The practical conclusion is that cost optimisation must be evaluated against control fidelity, not just latency.
Identity governance now reaches into AI context supply chains. The same discipline used to control secrets, service accounts, and delegated access is needed for RAG sources, vector stores, and prompt transformers. That does not make every AI pipeline autonomous, but it does make them governed non-human workloads whose access paths must be explicit, reviewable, and bounded. Practitioners should align AI gateway design with NHI governance principles.
Internal RAG systems create a new form of identity blast radius. When one poisoned document or over-permissioned data source can influence many downstream prompts, the impact is no longer limited to a single query. The blast radius is measured in how widely untrusted context can propagate through the model estate. Practitioners should treat retrieval exposure as a multi-session control problem, not a one-off input filter.
From our research:
- 80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%), according to AI Agents: The New Attack Surface report.
- Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
- NHI Lifecycle Management Guide helps teams apply governance discipline to AI context sources, retrieval access, and lifecycle ownership.
What this signals
Context governance is now an identity problem, not just an AI engineering problem. As RAG becomes a standard pattern, teams need to know which sources can speak into a model session and under what authority. That is especially true when AI systems already act beyond intended scope in 80% of organisations, because retrieval paths can become a silent extension of privilege.
AI gateway programmes should assume that every retrieved chunk has operational consequences. The more you use prompt injection, compression, and role assignment to shape responses, the more you need provenance logging and content-tiering around the data feeding those controls. This is where AI Agents: The New Attack Surface report becomes useful for programme planning, because it frames the blind spot as a governance gap rather than a model problem.
For practitioners
- Classify retrieval sources by trust tier Separate internal documents, approved knowledge bases, and user-supplied content before they reach the vector store, and require different validation rules for each tier. The goal is to prevent low-integrity content from becoming model context without scrutiny.
- Restrict privileged prompt roles Allow retrieved context to enter system-level prompts only when the source is trusted, governed, and auditable. Use user-role injection for lower-trust content and block assistant-role injection for any unverified chunk.
- Validate compressed prompts for policy loss Test the output of compression against a set of security-critical prompts to confirm that provenance markers, policy constraints, and instruction boundaries survive shortening. Measure whether the compressed version still preserves the control intent.
- Log retrieval provenance end to end Record which source, chunk, embedding index, and role assignment contributed to each model response so you can investigate prompt injection, data leakage, or bad-answer incidents after the fact.
Key takeaways
- RAG improves answer quality only when retrieval sources, chunking, and role injection are governed as part of the trust boundary.
- Prompt role assignment and compression both change security posture, which means they need policy review rather than pure performance tuning.
- Identity and access teams should extend governance into AI context pipelines because untrusted retrieval can influence model behaviour before output controls are reached.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 address the attack and risk surface, while NIST AI RMF and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | A2 | Covers prompt injection and agent context abuse in RAG-style AI pipelines. |
| NIST AI RMF | AI RMF governance applies to context handling, provenance, and model risk decisions. | |
| NIST Zero Trust (SP 800-207) | PR.AC-4 | Zero trust principles fit the retrieval trust boundary and context access control. |
Apply content validation and trust-tiering before retrieved text enters model prompts.
Key terms
- Retrieval-Augmented Generation: A pattern where external content is fetched and inserted into an LLM prompt so the model can answer with fresher or more specific context. In practice, it creates a trust boundary around the retrieval pipeline, because the model may treat inserted text as authoritative unless the system separates evidence from instruction.
- Prompt Injection: A technique where attacker-controlled text influences a model to follow unintended instructions or reveal data. In RAG systems, the risk increases when retrieved content is inserted without validation or role controls, because the model can confuse untrusted context with legitimate guidance.
- Prompt Compression: A process that shortens prompts to reduce token usage, latency, and cost while trying to preserve meaning. In governance terms, it is a transformation control that can also remove policy cues, provenance markers, or instruction boundaries if it is not tested for security impact.
- Context Trust Boundary: The point at which external text becomes operational input for a model and therefore must be governed like any other access path. For AI gateways, this boundary includes source approval, role assignment, sanitization, and auditability of the content passed to the model.
Deepen your knowledge
NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or NHI governance in your organisation, it is worth exploring.
This post draws on content published by Kong: Build Your Own Internal RAG Agent with Kong AI Gateway. Read the original.
Published by the NHIMG editorial team on 2025-07-09.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org