Intent collision occurs when an agent merges legitimate user intent with attacker-controlled instructions into one execution plan. The resulting action sequence may look coherent to the model, but it is actually the product of mixed trust sources and therefore unsafe to treat as a valid task.
Expanded Definition
Intent collision is a trust-boundary failure inside an AI agent’s execution plan. The agent receives a legitimate task from an authorised user, but attacker-controlled text, tool output, or retrieved content gets merged into the same plan as if it were equally trusted. In agentic systems, that distinction matters because the model can produce a coherent sequence while still acting on mixed provenance.
Definitions vary across vendors because some teams describe this as prompt injection, while others treat it as a broader planning integrity problem. NIST’s NIST Cybersecurity Framework 2.0 does not name the term directly, but its governance and protective outcomes map well to separating trusted instructions from untrusted content. For NHI operators, the key issue is not whether the agent sounded reasonable, but whether the instruction source was authorised. NHI Management Group’s Ultimate Guide to NHIs is useful here because the same trust discipline that protects service accounts and secrets also applies to agent decision paths.
The most common misapplication is assuming any output that matches the user’s goal is safe, which occurs when attacker text is blended into the same context window as approved instructions.
Examples and Use Cases
Implementing intent-collision defenses rigorously often introduces extra context filtering and approval steps, requiring organisations to weigh faster automation against tighter control of what an agent is allowed to execute.
- An agent summarises a helpdesk ticket, but a malicious customer message adds “reset admin credentials” as if it were part of the request.
- A retrieval-augmented workflow pulls policy text and an attacker-edited document together, then the agent treats both as equally valid instructions.
- A coding agent receives a safe feature request, but comments inside a repository issue instruct it to exfiltrate Ultimate Guide to NHIs-style secrets handling data into logs.
- An IT automation agent is asked to rotate credentials, yet a tool response injects a second command that expands access before rotation completes.
- A SOC assistant ingests an alert, but adversarial text in the ticket causes it to suppress the incident instead of escalating it.
These cases matter because the agent is not simply “confused”; it has composed one plan from two trust levels. That is why NIST Cybersecurity Framework 2.0 style access governance should be paired with instruction provenance checks, especially where tool use can trigger real-world side effects.
Why It Matters in NHI Security
Intent collision becomes a serious NHI risk when agents can reach APIs, vaults, CI/CD systems, or privileged workflows. If attacker-controlled content is accepted as part of a valid task, the agent may disclose secrets, approve actions it should not take, or misuse a Non-Human Identity that already has broad permissions. That is especially dangerous because NHI compromise often looks like normal automation until damage is visible. In the Ultimate Guide to NHIs, NHI Management Group notes that 80% of identity breaches involved compromised non-human identities such as service accounts and API keys, which shows how quickly a planning error can become a credential incident.
This is why practitioners should treat intent collision as both an AI governance issue and an access-control issue. Zero trust thinking, least privilege, and explicit approval gates help, but they must be applied to the agent’s reasoning pipeline, not only to human users. The operational lesson is simple: if an agent can act on mixed-trust input, the blast radius belongs to the NHI attached to that workflow. Organisations typically encounter the consequence only after a secret is exposed or an unauthorized action is executed, at which point intent collision becomes operationally unavoidable to address.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | Agentic AI guidance addresses instruction mixing and tool misuse risks. | |
| OWASP Non-Human Identity Top 10 | NHI-04 | Covers NHI misuse paths where automation acts on unsafe or mixed trust input. |
| NIST Zero Trust (SP 800-207) | PL-8 | Zero Trust requires explicit trust decisions and verification for every action path. |
Constrain NHI-powered workflows so untrusted content cannot trigger privileged actions.
Related resources from NHI Mgmt Group
- What is the difference between logging actions and logging intent for AI agents?
- What is the difference between role-based access and intent-based access for agents?
- What is the difference between RBAC and intent-aware access for autonomous workflows?
- What is the difference between access control and intent governance for AI agents?