Indirect prompt injection is an attack where malicious instructions are hidden inside content that an AI system reads later. The model may treat that content as context rather than as hostile input, which can influence tool use, data access, or workflow actions if controls are weak.
Expanded Definition
Indirect prompt injection is not a simple prompt-fuzzing issue. It occurs when an AI agent, chatbot, or tool-using model ingests untrusted content from email, documents, web pages, tickets, or retrieved knowledge and treats embedded instructions as operationally valid context. That makes the attack especially relevant in agentic systems where the model can call tools, read secrets, or take workflow actions. Industry usage is still evolving, and definitions vary across vendors, but the core risk is consistent: instructions hidden in content can override the task the operator intended. For a broader threat model, NHI teams often map this risk alongside the OWASP Agentic AI Top 10, especially where model output can influence privileged automation. The most common mistake is assuming retrieval-augmented generation is safe by default; the exposure arises when untrusted source text is indexed without content validation or tool-scoping controls.
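As a minimal sketch of the content validation mentioned above, the following Python screens retrieved text before it reaches the model and wraps it in explicit delimiters so the prompt labels it as data. The pattern list, the `RetrievedChunk` type, and the `<untrusted>` wrapper are all hypothetical names chosen for illustration; pattern matching alone will not catch every payload, so treat this as one layer among several rather than a complete control.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only. Real payloads vary widely,
# so a production filter would need far more than a short regex list.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"\bsystem prompt\b", re.I),
    re.compile(r"\byou are now\b", re.I),
]


@dataclass
class RetrievedChunk:
    source: str    # where the text came from, e.g. "ticket" or "web"
    text: str
    flagged: bool  # True if any suspicious pattern matched


def screen_chunk(source: str, text: str) -> RetrievedChunk:
    """Mark a chunk as flagged when it contains instruction-like phrasing."""
    flagged = any(p.search(text) for p in SUSPICIOUS_PATTERNS)
    return RetrievedChunk(source=source, text=text, flagged=flagged)


def render_for_prompt(chunk: RetrievedChunk) -> str:
    """Wrap untrusted text in delimiters; withhold flagged text entirely."""
    if chunk.flagged:
        return f"<untrusted source={chunk.source!r} status='quarantined'/>"
    return f"<untrusted source={chunk.source!r}>\n{chunk.text}\n</untrusted>"
```

A flagged chunk is withheld from the prompt instead of being passed through with a warning, on the assumption that a quarantined document is cheaper than a manipulated tool call. Note that an attacker could include a literal closing delimiter in the text, so a real implementation would also need to escape or strip the wrapper tokens.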
Examples and Use Cases
The scenarios below show where indirect prompt injection surfaces in practice. Protections against it often introduce friction, because stricter content filtering, retrieval controls, and tool gating can reduce model autonomy and increase review overhead.
- A support agent summarizes a customer ticket that contains hidden instructions to reveal account data. The model may follow the injected directive unless the retrieval layer strips or isolates untrusted text.
- An internal agent reads a wiki page that includes malicious text designed to trigger a privileged API call. Teams studying the OWASP Agentic Applications Top 10 usually place this in the same class as tool-abuse and trust-boundary failures.
- A browser-enabled agent visits a page that contains prompt injection payloads disguised as comments or metadata, then attempts to exfiltrate session data through an approved connector.
- A document-processing workflow summarizes vendor contracts and inadvertently acts on embedded instructions to alter routing, escalate access, or suppress alerts.
- A knowledge agent ingests a public webpage with instructions targeting the system prompt, showing why the OWASP Agentic AI Top 10 treats prompt injection as an operational threat, not just a content-safety concern.
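Several of the scenarios above end in a privileged tool call made while untrusted content is in the model's context. One way to limit that, sketched here under stated assumptions, is a gate that sits between the model and its tools. The tool names, the `ToolRequest` type, and the three-way `allow` / `review` / `deny` result are hypothetical and not drawn from any specific framework.

```python
from dataclasses import dataclass

# Hypothetical tool names; a real deployment would load these from policy.
LOW_RISK_TOOLS = {"summarize", "search_kb"}
PRIVILEGED_TOOLS = {"export_account_data", "update_routing", "grant_access"}


@dataclass(frozen=True)
class ToolRequest:
    tool: str
    context_tainted: bool        # True if untrusted content is in the context
    human_approved: bool = False


def authorize(request: ToolRequest) -> str:
    """Return 'allow', 'review', or 'deny' for a proposed tool call."""
    if request.tool in LOW_RISK_TOOLS:
        return "allow"
    if request.tool not in PRIVILEGED_TOOLS:
        return "deny"  # unknown tools are rejected by default
    if request.context_tainted and not request.human_approved:
        return "review"  # privileged call with untrusted input needs sign-off
    return "allow"
```

The gate runs outside the model, so an injected instruction cannot talk its way past it; the cost is the review overhead that comes with any human-in-the-loop step.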
Why It Matters in NHI Security
Indirect prompt injection matters because it can turn ordinary content into a control plane for misuse. In NHI environments, that means an AI agent may be manipulated into using service accounts, API keys, or delegated permissions in ways no human operator approved. The risk compounds when secrets are exposed broadly; the OWASP Agentic Applications Top 10 highlights the need to treat tool access, memory, and retrieval as separate trust boundaries. NHI Mgmt Group reports that 80% of identity breaches involved compromised non-human identities such as service accounts and API keys, which is why this attack becomes a practical concern so quickly. A compromised prompt path can become a credential path if the agent can read, relay, or misuse secrets. Organisations typically encounter the impact only after a poisoned document, workflow, or connector produces an unexpected action, at which point addressing indirect prompt injection becomes operationally unavoidable.
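To illustrate how a prompt path can be kept from becoming a credential path, the sketch below redacts secret-looking values from tool output before that output is placed in the model's context. The function name `redact_secrets` and the two patterns are assumptions made for this example; they cover only a couple of common shapes and are not an exhaustive secret detector.

```python
import re

# Illustrative patterns only; assumed formats, not a complete detector.
SECRET_PATTERNS = [
    # Shape of an AWS access key ID: "AKIA" plus 16 uppercase alphanumerics.
    re.compile(r"AKIA[0-9A-Z]{16}"),
    # Generic "api_key=...", "token: ...", or "secret=..." assignments.
    re.compile(r"(?i)\b(api[_-]?key|token|secret)\s*[:=]\s*\S+"),
]


def redact_secrets(text: str) -> str:
    """Replace secret-like substrings so the model never sees their values."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Redaction of this kind complements, but does not replace, scoping each service account and API key so that an agent cannot reach secrets it has no need for.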
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | AI2 | Covers prompt injection and unsafe tool execution in agentic systems. |
| OWASP Non-Human Identity Top 10 | NHI-05 | Maps to agent trust-boundary failures that expose secrets or privileged actions. |
| NIST AI RMF | Govern, Map, Measure, Manage | Addresses AI risks from unreliable inputs and harmful system outputs. |
Assess prompt-injection exposure and apply layered controls across the Govern, Map, Measure, and Manage functions.
Related resources from NHI Mgmt Group
- How should security teams reduce indirect prompt injection risk in AI systems?
- When does indirect prompt injection become a business risk rather than a technical curiosity?
- What is the difference between prompt injection risk and identity abuse in agents?
- What is the difference between prompt injection and credential theft for agents?
Reviewed and updated by the NHIMG editorial team on May 26, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org