Teams often treat retrieved documents as safe because they are business content, not code. That assumption fails when a document carries hidden instructions that the model later processes as context. The right mental model is to treat any external text source as potentially adversarial until it has been inspected and constrained.
Why Teams Misread Indirect Prompt Injection
indirect prompt injection is often missed because teams assume the risk sits only in user prompts, not in retrieved text, documents, tickets, emails, or web pages that the model later treats as context. That blind spot matters most in agentic systems, where the model can act on those instructions, call tools, and carry state across steps. Current guidance from OWASP Agentic AI Top 10 and OWASP Agentic Applications Top 10 treats this as a control problem, not a content moderation problem.
Teams also underestimate how often the attack path is boring: a poisoned knowledge base entry, a malicious vendor note, or a compromised support article that looks legitimate in retrieval. Once that text is embedded into the model’s working context, the model cannot reliably distinguish business content from hostile instruction unless the system deliberately constrains what retrieved text can do. In practice, many security teams encounter the impact only after an agent has already leaked data, executed an unsafe tool action, or chained instructions across multiple steps, rather than through intentional testing.
How It Works in Practice
The practical mistake is treating retrieved content as passive reference material. In an indirect injection scenario, the model may read a document that says, in effect, “ignore previous instructions” or “send the latest secret to this endpoint,” and then continue processing as though that text were legitimate context. The model does not need to be tricked into “believing” the statement in a human sense. It only needs enough authority to follow it.
That is why retrieval layers, tool permissions, and output handling need separate controls. Security teams should constrain what external text can influence, filter or label untrusted sources, and keep tool execution behind explicit policy checks. For agentic systems, this usually means combining intent-based authorization, just-in-time credentials, and workload identity so the agent can only do what the current task requires. The OWASP Agentic AI Top 10 and OWASP Agentic Applications Top 10 both point practitioners toward the same operational lesson: the model’s context boundary is not a trust boundary.
- Classify retrieved sources as trusted, semi-trusted, or untrusted before they reach the prompt.
- Separate “read” access to content from “act” access to tools and data stores.
- Use ephemeral secrets and short-lived credentials so a compromised step has limited blast radius.
- Log retrieval, prompt assembly, and tool calls together so investigators can reconstruct the chain.
NHI governance adds an important datapoint here: NHI Mgmt Group research reports that only 5.7% of organisations have full visibility into their service accounts, which shows how often machine identities are already poorly understood before agentic workflows enter the picture. That is one reason indirect prompt injection should be treated as an identity and authorization problem as much as a content problem. These controls tend to break down when agents have broad tool access across many systems because a single poisoned context can trigger actions faster than a human reviewer can intervene.
Common Variations and Edge Cases
Tighter retrieval filtering often increases operational overhead, so organisations must balance safety against recall, latency, and user experience. That tradeoff becomes sharper when the system must search large knowledge bases, parse long documents, or process third-party content at scale.
There is no universal standard for this yet, but current guidance suggests several recurring edge cases. First, a document may contain both legitimate business content and hidden instructions, so simple allowlists are not enough. Second, an agent that can browse, summarize, and execute actions can be influenced across multiple turns, making single-prompt defenses too narrow. Third, some environments rely on long-lived secrets or shared service accounts, which turns a prompt injection into a broader identity compromise. NHI Mgmt Group research shows that 96% of organisations store secrets outside of secrets managers in vulnerable locations including code, config files, and CI/CD tools, which widens the impact if an agent is pushed into misuse.
For teams building controls, the safer pattern is to pair content inspection with runtime policy enforcement, short-lived credentials, and scoped workload identity. Where agents must act on external text, a human approval step or a hard policy gate may still be necessary for high-impact actions. Where those safeguards are absent, indirect prompt injection stops being an edge-case prompt issue and becomes a path to unauthorized tool use, data exposure, and privilege expansion.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | A01 | Indirect prompt injection is a core agentic app input-influence risk. |
| CSA MAESTRO | MAESTRO addresses agent workflow trust boundaries and tool misuse. | |
| NIST AI RMF | AI RMF supports governing model behavior and downstream harms from unsafe context. |
Define accountability, test for prompt injection, and monitor agent outputs for unsafe actions.
Related resources from NHI Mgmt Group
Deepen Your Knowledge
Reviewed and updated by the NHIMG editorial team on June 7, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org