A manipulation pattern where an agent is influenced through third-party text, shared workflows, or surrounding content rather than a direct user command. The risk is that the agent accepts external context as operational guidance, causing it to follow unsafe actions without an obvious malicious prompt.
Expanded Definition
Indirect instruction occurs when an AI agent takes action based on context it did not receive from the immediate operator, such as email text, shared documents, tickets, chat threads, or workflow metadata. In NHI security, that matters because the agent often has execution authority and can turn ambient content into real-world changes.
This pattern is different from a direct prompt injection because the influence is mediated through surrounding material rather than an obvious malicious command. Definitions vary across vendors, but the security concern is consistent: the agent may treat untrusted third-party text as if it were policy, instructions, or task context. That creates a governance gap between what a person intended and what the system actually executes. For a broader NHI governance lens, the Ultimate Guide to NHIs frames how automation, identity, and privileged access must be controlled across the full lifecycle, while NIST Cybersecurity Framework 2.0 provides a practical language for identifying and reducing such operational risk.
The most common misapplication is treating all retrieved or shared content as trustworthy task input, which occurs when an agent is allowed to read unvalidated third-party text without strict context separation.
Examples and Use Cases
Implementing protections against indirect instruction often introduces friction, because the system must distinguish useful context from adversarial context without blocking legitimate work.
- An agent summarises an inbox thread and then follows a hidden instruction embedded in a vendor message, causing it to draft an approval or expose data outside the intended workflow.
- A support agent reads a ticket containing attacker-controlled text and uses that text to trigger a reset, export, or escalation action through an attached NHI.
- A document-processing agent ingests a shared file and interprets commentary or metadata as operational guidance, even though the file was never meant to steer execution.
- A code assistant connected to repositories and CI context treats comments or README content as authority, then proposes unsafe secret handling or deployment changes.
- A workflow agent in a procurement or finance process follows instructions hidden in a third-party attachment, causing unauthorised data movement or approval routing.
These scenarios align with the broader NHI exposure patterns described in the Ultimate Guide to NHIs, where third-party exposure and excessive privilege amplify the blast radius. The same control problem is why implementation guidance in the NIST Cybersecurity Framework 2.0 emphasizes asset awareness, access governance, and protective safeguards.
Why It Matters in NHI Security
Indirect instruction is dangerous because it exploits the gap between identity and intent. An NHI can authenticate correctly and still behave unsafely if its context channel is compromised. That means the issue is not only access control, but also instruction provenance, context validation, and execution boundaries. When agents are connected to messaging, repositories, tickets, or document stores, the attack surface expands beyond secrets and credentials into the content the agent consumes.
This is especially relevant in environments where NHIs already carry excessive privilege. NHI Mgmt Group reports that Ultimate Guide to NHIs found 97% of NHIs carry excessive privileges, which turns a single deceptive instruction into a much larger operational event. Controls from NIST Cybersecurity Framework 2.0 help organisations map the problem to governance, protection, detection, and response, but no single standard yet fully resolves indirect instruction risk for agentic systems.
Organisations typically encounter the consequence only after an agent has already forwarded data, changed records, or invoked a privileged tool on the basis of hostile surrounding content, at which point indirect instruction becomes operationally unavoidable to address.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | LLM-02 | Indirect instruction is a context-injection risk for autonomous agents. |
| OWASP Non-Human Identity Top 10 | NHI-02 | Agents acting on hostile context often fail because secrets and execution paths are loosely governed. |
| NIST CSF 2.0 | PR.AC-4 | Least privilege reduces the blast radius when agents misread third-party content as instructions. |
Restrict NHI permissions so a mistaken context interpretation cannot trigger broad access or changes.
Related resources from NHI Mgmt Group
- How should security teams reduce indirect prompt injection risk in AI systems?
- When does indirect prompt injection become a business risk rather than a technical curiosity?
- Why do indirect prompt injections matter for IAM and NHI governance?
- Why is indirect prompt injection harder to defend than XSS?
Deepen Your Knowledge
Reviewed and updated by the NHIMG editorial team on July 5, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org