By NHI Mgmt Group Editorial TeamPublished 2025-06-11Domain: Agentic AI & NHIsSource: Cerbos

TL;DR: RAG-powered AI agents can surface sensitive internal data, leak confidential material, or be manipulated through prompt and context injection when permission checks are missing, according to Cerbos. The security problem is not the model alone but the trust boundary around retrieval, authorization, and downstream response generation.


At a glance

What this is: This article argues that RAG-based AI agents need authorization-aware filtering because unrestricted retrieval turns useful assistants into data exposure risks.

Why it matters: It matters because IAM, PAM, and lifecycle controls must now govern which users and agents can retrieve which records, not just who can log in.

By the numbers:

  • 92% agree governing AI agents is critical to enterprise security, yet only 44% have implemented any policies to do so.
  • 80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems, inappropriately sharing sensitive data, and revealing access credentials.

👉 Read Cerbos' analysis of authorization-aware access control for RAG AI agents


Context

RAG-based AI agents change the identity problem because they do not just answer questions, they retrieve data on behalf of a user and then shape the response from that retrieval path. That makes permission enforcement part of the agent workflow, not a separate control layered on after the fact.

The failure mode is easy to define in plain terms: if an assistant can query more data than the requesting user is allowed to see, the model can expose information that IAM policy would otherwise block. In product environments, that risk shows up in employee assistants, customer-facing copilots, and internal knowledge search alike.

This is an NHI and application authorization problem at the same time. The relevant question is no longer whether the model is accurate, but whether the retrieval path, the prompt assembly, and the response output all respect the same access boundary.


Key questions

Q: How should security teams implement access control for RAG-based AI agents?

A: They should enforce authorization before retrieval, not after generation. The agent should only query records that match the requesting user's role, department, region, or other policy attributes. That keeps the model from seeing protected content in the first place and prevents downstream leakage through the response layer.

Q: Why do RAG-based assistants create more risk than a normal search tool?

A: Because they do not just return matching records, they assemble those records into a generated answer that can expose sensitive context, merge fragments, or amplify poisoned data. The risk increases when the retrieval path is broader than the user's entitlement, because the model can convert excess access into a polished disclosure.

Q: What do security teams get wrong about prompt injection in AI assistants?

A: They often treat prompt injection as a model safety issue alone, when it is also a trust issue in the content pipeline. If external documents, tickets, or wiki pages are not filtered and validated before retrieval, malicious instructions can enter the answer path without any code compromise.

Q: How can organisations tell whether AI agent permissions are actually working?

A: They should test whether the assistant can retrieve restricted records, whether policy filters are applied before prompt construction, and whether response logs prove the final answer stayed within the user's entitlement. If any of those checks fail, the control is cosmetic rather than effective.


Technical breakdown

How RAG changes the authorization boundary

Retrieval-Augmented Generation connects a language model to external data sources such as document stores, databases, or APIs so the model can answer with current context. The security issue is that retrieval becomes an active access event, not a passive read. If authorization is checked only at login, the system can still assemble prompts from records the requester should not see. The model then turns that over-broad retrieval into a human-readable answer. In governance terms, the access decision must happen before data enters the prompt, and it must be tied to the requesting identity, not just the service identity running the agent.

Practical implication: enforce authorization at retrieval time, not only at session start.

Prompt injection and context poisoning in connected data sources

RAG systems are exposed to prompt injection when malicious instructions are embedded in content that the model later retrieves. Context poisoning is related but broader: the attacker contaminates the knowledge source so the model ingests misleading or hostile material during answer generation. Because the model trusts retrieved context by design, poisoned records can influence policy-breaking outputs, data exfiltration, or unsafe instructions. This is not a classic code exploit. It is an input-trust failure in the data pipeline that feeds the model, which is why source integrity and content filtering matter as much as model tuning.

Practical implication: treat retrieved content as untrusted input and validate it before prompt construction.

Centralized policy enforcement for AI agents and applications

The article's core architecture is a centralized authorization layer that decides what the agent may retrieve before the LLM generates a response. That pattern is useful because it lets the same policy logic govern APIs, applications, and AI assistants through role, department, region, or other attributes. The important point is not the tool name but the control model: a single policy engine can prevent the assistant from becoming a privileged bypass around existing application rules. Without that shared enforcement point, each new assistant becomes a separate authorization silo.

Practical implication: align agent access controls with existing policy engines so AI does not bypass application governance.


NHI Mgmt Group analysis

Authorization-aware retrieval is the new control plane for AI assistants. RAG systems collapse the old separation between application access and answer generation because the model can only be as safe as the data it is allowed to retrieve. That makes policy enforcement before retrieval the decisive control, not a nice-to-have filter. Practitioners should treat retrieval as an identity decision, not a search function.

Access control without context filtering fails inside RAG workflows. Traditional IAM can authenticate the user and still lose control once the agent begins assembling context from documents, databases, or APIs. That is why this pattern creates a permission boundary inside the product itself, where role, department, and attribute rules must be applied before data is exposed. Teams that do not govern that boundary are not governing the agent, they are trusting it.

Prompt injection is an authorization problem disguised as an LLM problem. The system does not need to be fully compromised for a poisoned document to change the answer path, because the model treats retrieved context as authoritative input. The named concept here is retrieval trust boundary: the point where unvalidated external content becomes part of the model's decision context. Once that boundary is loose, the assistant can be made to speak outside policy.

Centralized policy management matters more than isolated AI controls. The strongest insight in this topic is not about one chatbot, but about whether authorization logic is consistent across APIs, apps, and agents. If the AI layer uses a different policy model from the rest of the stack, enterprises create a privileged side channel. Practitioners should align AI governance with the same identity policy source of truth used elsewhere.

RAG makes user identity and machine identity inseparable in practice. The agent acts with system credentials, but the safety outcome depends on the user's entitlements. That coupling means governance must track both the human requester and the non-human retrieval path together. Security teams should treat this as an authorization chain, not a model feature.

From our research:

  • 80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems, inappropriately sharing sensitive data, and revealing access credentials, according to AI Agents: The New Attack Surface report.
  • Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
  • For a broader control model, review OWASP Agentic AI Top 10 alongside the agent governance guidance in our research on AI agent risk.

What this signals

Retrieval trust boundary: once an assistant can pull from multiple repositories, the governance question shifts from model accuracy to policy enforcement before retrieval. Teams should expect more requests to centralize authorization logic so AI products do not become separate exceptions to identity policy.

The next maturity step is to align human entitlements, application permissions, and agent retrieval scopes in a single control plane. That matters because the same user can be safe in the app and unsafe in the assistant if the retrieval path is broader than the business rule.

With 52% of companies able to track and audit the data their AI agents access, per AI Agents: The New Attack Surface report, auditability is becoming a gating requirement for deployment, not a post-incident diagnostic.


For practitioners

  • Enforce retrieval-time authorization checks Apply policy before documents, rows, or API responses are injected into the prompt so the model never sees data the user cannot access.
  • Classify the data sources behind every agent Inventory which repositories, APIs, and knowledge bases feed each AI assistant, then map those sources to the same access rules used elsewhere in the product.
  • Filter retrieved content before prompt assembly Treat document fragments, search snippets, and vector results as untrusted input and remove content that could influence unsafe or out-of-scope responses.
  • Audit agent responses against policy decisions Log which records were retrieved, which attributes were used to permit access, and whether the final response stayed within the requester’s entitlement boundary.

Key takeaways

  • RAG-based AI assistants turn data retrieval into an authorization problem, because the model can only be safe if the retrieval path respects the user's entitlement.
  • Prompt injection and context poisoning are pipeline trust failures, not just model weaknesses, which means filtering and validation must happen before prompt assembly.
  • Centralized policy enforcement is the practical control pattern, because it keeps AI assistants aligned with the same identity rules that govern applications and APIs.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 address the attack and risk surface, while NIST AI RMF and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

FrameworkControl / ReferenceRelevance
OWASP Agentic AI Top 10A2Covers prompt injection and tool/data access abuse in agentic workflows.
NIST AI RMFAI governance and accountability apply to assistants that access enterprise data.
NIST CSF 2.0PR.AC-4Access permissions must be enforced consistently across apps, APIs, and agents.

Define governance, measurement, and oversight for every AI assistant that can retrieve or expose sensitive data.


Key terms

  • Retrieval-Augmented Generation: A pattern where a language model pulls external documents or records before generating an answer. The model does not rely only on training data, which improves freshness and specificity, but it also expands the attack surface because the retrieval layer can expose or poison the context the model uses.
  • Prompt Injection: A technique that places malicious instructions into content a model will later read and follow. In RAG systems, the injection may live in a document, ticket, or knowledge base entry, then influence the model once the content is retrieved and treated as trustworthy context.
  • Authorization-Aware Filtering: A control pattern that limits which data records can be retrieved based on the user's identity and policy attributes before anything reaches the model prompt. It reduces overexposure by ensuring the assistant only sees content the requester is entitled to access.
  • Retrieval Trust Boundary: The point in an AI workflow where externally sourced content becomes part of the model's decision context. When that boundary is weak, unvalidated data can shape outputs, leak sensitive information, or override intended guardrails, making governance fail at the input stage.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building or maturing an identity security programme, it is worth exploring.

This post draws on content published by Cerbos: authorization-aware data filtering for RAG-based AI agents. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-06-11.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org