TL;DR: Internal RAG agents combine retrieval, prompt injection, and prompt compression in ways that improve usefulness but also enlarge the attack surface if retrieved content is not validated, sanitised, and constrained before it reaches the model, according to Kong. The real issue is that governance now extends into the retrieval path, where trust assumptions can be broken before the LLM ever responds.
NHIMG editorial — based on content published by Kong: Build Your Own Internal RAG Agent with Kong AI Gateway
By the numbers:
- 80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems, inappropriately sharing sensitive data, and revealing access credentials.
- 96% of technology professionals identify AI agents as a growing security threat, and 66% believe this risk is immediate.
Questions worth separating out
Q: How should security teams govern retrieval paths in RAG systems?
A: Security teams should treat retrieval as part of the trusted control plane, not as a background search function.
Q: Why does prompt role choice matter in AI gateway design?
A: Prompt role choice matters because the role determines how much authority the model gives retrieved text.
Q: What breaks when compressed prompts remove security context?
A: When compression strips out provenance markers, policy text, or instruction boundaries, the model may still answer but with weaker governance semantics.
Practitioner guidance
- Classify retrieval sources by trust tier Separate internal documents, approved knowledge bases, and user-supplied content before they reach the vector store, and require different validation rules for each tier.
- Restrict privileged prompt roles Allow retrieved context to enter system-level prompts only when the source is trusted, governed, and auditable.
- Validate compressed prompts for policy loss Test the output of compression against a set of security-critical prompts to confirm that provenance markers, policy constraints, and instruction boundaries survive shortening.
What's in the full article
Kong's full blog post covers the operational detail this post intentionally leaves for the source:
- Step-by-step configuration for the AI RAG Injector plugin and prompt compressor settings
- Docker and Konnect setup details for the Redis vector database and compressor service
- Pre-function script instructions for ingesting chunks into the vector database
- Testing workflow that shows how compressed retrieval behaves in a live gateway setup
👉 Read Kong's guide to building an internal RAG agent with Kong AI Gateway →
Internal RAG agents and prompt injection: what IAM teams need to know?
Explore further