Prompt injection targets the model directly through the user prompt. Indirect prompt injection hides malicious instructions inside data the model later reads from a trusted source, such as a form submission or knowledge base. Indirect attacks are more dangerous in agentic systems because the malicious content can travel through normal workflows before it is executed.
Why This Matters for Security Teams
Prompt injection and indirect prompt injection are often discussed as if they were the same class of risk, but the difference matters operationally. Direct prompt injection is a user-facing attack against the model’s immediate input. Indirect prompt injection is a supply-chain style problem: malicious instructions are embedded in content the model later consumes from a trusted workflow, such as tickets, documents, knowledge bases, or tool outputs. That makes it especially relevant to autonomous systems that chain actions across tools.
For agentic systems, the risk is not only that the model reads hostile text, but that it may act on it with workload credentials, API access, and delegated authority. That is why current guidance from the OWASP Agentic AI Top 10 treats prompt injection as a core application risk, while OWASP Agentic Applications Top 10 frames it in the context of tool use and delegated execution. In parallel, NHI governance matters because the model is often operating as an identity-bearing workload, not a passive chatbot. The broader NHI problem set in the Ultimate Guide to NHIs — What are Non-Human Identities shows why identity, secrets, and privilege boundaries need to be treated as one control plane.
In practice, many security teams discover the indirect version only after a trusted workflow has already carried the malicious instruction into production execution.
How It Works in Practice
Direct prompt injection usually looks obvious in retrospect: a user tells the model to ignore prior instructions, reveal system context, or bypass policy. The attack lives in the same channel as the request, so detection can focus on the prompt boundary. Indirect prompt injection is more subtle. The malicious instruction is hidden in data the model is expected to trust, such as a support ticket, CSV row, web page, email, or retrieval result. Once the model retrieves or parses that content, the instruction may compete with the system prompt and influence downstream actions.
That difference changes the defence model. Teams should not rely on prompt filtering alone. They need context segregation, content provenance checks, tool-output sanitisation, and runtime policy evaluation that constrains what the agent can do after reading untrusted data. The emerging pattern is to combine workload identity, JIT credentials, and intent-based authorisation so the agent gets only the minimum access needed for the specific task. For implementation guidance, the OWASP Agentic AI Top 10 and the OWASP Agentic Applications Top 10 both point toward isolating tool execution from text interpretation. For identity grounding, NHIMG’s NHI reference is useful because the model’s privileges, tokens, and secrets must be governed like any other non-human workload.
- Use separate channels for instructions, retrieval data, and user content.
- Assume retrieved text may contain adversarial instructions unless proven otherwise.
- Issue short-lived credentials per task instead of reusing static secrets.
- Restrict tool scope to the current intent, not the agent’s broad capability set.
- Log retrieval provenance and execution decisions for later review.
These controls tend to break down when agents can call multiple tools asynchronously because malicious instructions can be reintroduced after an initial safety check.
Common Variations and Edge Cases
Tighter content controls often increase latency and operational overhead, so organisations have to balance safety against workflow friction. That tradeoff is real, especially in retrieval-heavy assistants and multi-agent pipelines where every extra validation step can slow task completion.
There is no universal standard for classifying all indirect prompt injection patterns yet, so current guidance suggests focusing on trust boundaries rather than trying to label every malicious string. A comment in a document, a field in a knowledge base, or a webhook payload may all become attack carriers if the agent interprets them as instructions. The hardest edge case is when the malicious content is not obviously executable text but still changes the agent’s reasoning enough to trigger unsafe tool use. That is why the OWASP Agentic AI Top 10 is best read alongside NHI controls, because the issue is both prompt integrity and privilege management. The agentic applications guidance also highlights that autonomous systems can chain trusted actions in ways humans do not anticipate. In practice, teams often miss indirect injection until a benign-looking knowledge source starts steering agents into unauthorised retrieval, exfiltration, or external action.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | A2 | Directly addresses prompt injection and tool-abuse risks in agentic applications. |
| CSA MAESTRO | Covers runtime governance and trust boundaries for autonomous agents. | |
| NIST AI RMF | Supports governing AI risks across the lifecycle, including unsafe model behavior. |
Treat retrieved content as hostile, constrain tool use, and validate agent actions at runtime.
Related resources from NHI Mgmt Group
Deepen Your Knowledge
Reviewed and updated by the NHIMG editorial team on May 31, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org