An attack where an adversary redirects an AI agent's objectives by manipulating its instructions, tool outputs, or external content — causing it to act outside its intended scope while appearing normal. The number one risk in the OWASP Top 10 for Agentic Applications 2026.
Expanded Definition
Agent Goal Hijack, also described as objective manipulation or instruction hijacking, occurs when an attacker steers an AI agent away from its intended task by altering prompts, tool outputs, retrieved content, or downstream instructions. The agent still appears functional, which makes the abuse hard to spot in real time. In current guidance, definitions vary across vendors, but the common pattern is the same: the agent’s execution authority is preserved while its decision-making is quietly redirected. That is why the term sits inside the OWASP Agentic AI Top 10 and is closely tied to control failures in NIST AI Risk Management Framework guidance. In NHI security, the risk is not just bad output. It is an agent using valid permissions, valid credentials, and valid tools to do the wrong thing. The most common misapplication is treating it like a simple prompt-injection issue, which occurs when organisations ignore tool access, retrieval trust, and authorization boundaries.
Examples and Use Cases
Implementing defenses against Agent Goal Hijack rigorously often introduces workflow friction, requiring organisations to weigh agent autonomy and speed against tighter validation and approval steps.
- An IT support agent reads a poisoned helpdesk ticket and starts resetting access for the wrong user group, showing how external content can override the original task.
- A code assistant follows malicious retrieval data and writes destructive changes into a repository, a pattern discussed in NHIMG’s Analysis of Claude Code Security.
- A procurement agent is redirected by altered vendor text and approves a payment exception outside policy, even though the execution path looks normal.
- An SOC agent consumes a tampered case note and escalates the wrong incident, which can compound response errors when OWASP NHI Top 10 style controls are missing.
- An autonomous browser agent follows hidden instructions in a webpage and exfiltrates data through an allowed API, matching the abuse patterns mapped in MITRE ATLAS adversarial AI threat matrix.
Why It Matters in NHI Security
Agent Goal Hijack matters because the agent’s privileges are usually legitimate even when the objective is not. That means traditional identity checks can pass while the action itself is unsafe. In practice, this creates a governance gap between identity, authorization, and intent. NHIs are already overexposed in many environments, and NHIMG reports that Ultimate Guide to NHIs — 2025 Outlook and Predictions notes 97% of NHIs carry excessive privileges, increasing unauthorised access and broadening the attack surface. When that privilege is attached to an agent, a hijacked objective can become an enterprise-scale incident quickly. Defense requires strong tool gating, content provenance checks, human approval for sensitive actions, and monitoring aligned to OWASP Top 10 for Agentic Applications 2026 and the Anthropic AI-orchestrated cyber espionage campaign report. Organisations typically encounter the consequence only after an agent has already taken an unsafe action, at which point Agent Goal Hijack becomes operationally unavoidable to address.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | A1 | OWASP ranks goal hijacking and instruction manipulation among core agentic AI threats. |
| OWASP Non-Human Identity Top 10 | NHI-02 | Hijacked agents often abuse secrets, tokens, and API keys under valid identity context. |
| NIST AI RMF | NIST AI RMF frames manipulative inputs as a trust, validity, and governance risk. |
Protect secrets, rotate credentials, and scope agent access to reduce abuse after goal changes.
Related resources from NHI Mgmt Group
Deepen Your Knowledge
Reviewed and updated by the NHIMG editorial team on May 16, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org