A malicious instruction hidden inside data the AI system trusts, such as retrieved documents, tool output, or memory. The system later treats that data as guidance and follows the embedded command, which makes the payload dangerous because the harmful step happens after ingestion.
Expanded Definition
Indirect injection is a prompt or instruction attack that arrives through trusted data rather than through a direct user message. In NHI and agentic AI environments, that data may come from retrieved documents, API responses, ticketing notes, shared memory, or tool output, and the system later treats it as operational guidance. The risk is not the payload’s visibility at ingestion, but its influence after the model or agent reuses the content. This is why indirect injection is closely tied to retrieval-augmented generation, tool-using agents, and workflows that blend untrusted content with execution authority.
The term is still evolving across vendors, but the core security concern is consistent: the system fails to separate data from instructions. That makes it different from ordinary content poisoning because the attack depends on downstream interpretation, not just corrupted input. NHI Management Group treats indirect injection as a governance issue as much as a technical one, because the affected trust boundary often sits between identity, retrieval, and action. For a broader control context, the NIST Cybersecurity Framework 2.0 reinforces the need to manage data flows and protect decision pathways, even though it does not name this attack class directly. The most common misapplication is assuming a source is safe simply because it is authenticated, which occurs when trusted systems ingest unverified instructions embedded in otherwise legitimate content.
Examples and Use Cases
Implementing protections against indirect injection rigorously often introduces friction, requiring organisations to weigh agent autonomy and retrieval quality against stricter filtering, provenance checks, and execution controls.
- A support chatbot retrieves an internal knowledge article that contains hidden instructions to reveal sensitive account data, causing the agent to follow attacker-controlled guidance.
- A workflow agent reads tool output from a downstream service and treats embedded commands as task updates instead of data, leading to unsafe API calls.
- A memory-enabled assistant stores a malicious note in long-term context, then later executes an action because the note is reused as a planning hint.
- A document summarisation system ingests a seemingly harmless file with covert instructions, similar to the retrieval and trust issues discussed in Ultimate Guide to NHIs.
- An enterprise agent that uses external integrations follows injected content from a partner feed, highlighting why standards-based identity and access design, such as the NIST Cybersecurity Framework 2.0, must be paired with content trust controls.
These examples show why indirect injection is most dangerous when systems combine broad retrieval, persistent memory, and execution authority without a clear boundary between trusted instructions and untrusted data.
Why It Matters in NHI Security
Indirect injection matters because NHI and agentic systems often act with credentials, tokens, and tool permissions that make a single confused instruction operationally significant. NHI Management Group notes that Ultimate Guide to NHIs reports 80% of identity breaches involved compromised non-human identities such as service accounts and API keys, which shows how quickly a trust failure can become an access failure. When an injected instruction is delivered through retrieved content, the compromise may look like normal automation, making detection harder than a classic malicious login attempt.
Security teams need to understand indirect injection because it breaks assumptions behind least privilege, provenance, and separation of duties. The practical defense is not only content filtering, but also constraining which tools an agent can call, validating instructions against policy, and treating external or user-originated text as untrusted even after it enters memory or retrieval layers. This becomes especially important for delegated workflows where one agent’s output becomes another agent’s input. Organisations typically encounter the consequence only after an agent has already sent data, changed records, or invoked a tool on the basis of poisoned context, at which point indirect injection becomes operationally unavoidable to address.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 address the attack and risk surface, while NIST CSF 2.0 and NIST AI RMF set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | Covers prompt injection and agent abuse patterns relevant to indirect injection. | |
| NIST CSF 2.0 | PR.DS | Addresses data integrity and trust in the information an AI system consumes. |
| NIST AI RMF | AI RMF focuses on managing system risks from manipulated inputs and unsafe outputs. |
Assess indirect injection as an AI trust risk and monitor for harmful downstream effects.
Related resources from NHI Mgmt Group
- How should security teams reduce indirect prompt injection risk in AI systems?
- When does indirect prompt injection become a business risk rather than a technical curiosity?
- Why is indirect prompt injection harder to defend than XSS?
- What is the difference between prompt injection and indirect prompt injection?