A failure mode where an AI agent is gradually steered across a conversation or workflow until it performs an action outside its intended purpose. The risk is not a single bad prompt, but the accumulation of context that alters the agent’s behaviour at runtime.
Expanded Definition
Agentic hijacking describes a runtime control failure in which an AI agent is nudged, step by step, into operating outside its intended scope. Unlike a one-shot jailbreak, the attack works through accumulated context, social engineering, tool abuse, or poisoned workflow inputs that reshape the agent’s decision path over time. In NHI and agentic AI governance, this matters because the agent is not merely generating text; it is often authorized to retrieve data, invoke APIs, move tickets, or trigger business actions. Industry usage is still evolving, but the core idea is consistent across OWASP Agentic AI Top 10, NIST AI Risk Management Framework, and CSA MAESTRO: the agent’s permissions remain valid while its intent becomes compromised through interaction history.
The concept overlaps with prompt injection, tool misuse, and workflow abuse, but agentic hijacking is broader because it focuses on the cumulative steering of an autonomous actor across multiple turns or tasks. NHIMG’s guidance on the OWASP NHI Top 10 and its AI LLM hijack breach coverage both show that the real hazard is control drift inside an otherwise legitimate session. The most common misapplication is treating it as a single malicious prompt, a mistake teams make when they overlook how memory, retained context, and delegated tool access compound over time.
Examples and Use Cases
Implementing guardrails against agentic hijacking often introduces friction, requiring organisations to balance autonomy and user experience against tighter approval gates, shorter context windows, and more frequent human review.
- An employee support agent is steered over several exchanges into exposing internal policy details, then uses its own approved connectors to retrieve data it should never have surfaced.
- A coding agent begins with a benign refactor request, but successive instructions redirect it toward modifying deployment scripts and approving unsafe package changes, a pattern discussed in NHIMG’s Analysis of Claude Code Security.
- An operations agent is socially engineered in chat to “help verify” credentials, then uses its tool access to query secrets or configuration data beyond the original ticket scope, aligning with the threat patterns in the OWASP Top 10 for Agentic Applications 2026.
- A procurement or finance agent is gradually convinced to reroute an approval workflow, creating an unauthorized action chain that appears legitimate because each step is individually permitted.
- Security teams use this term when an autonomous agent crosses from assistive behavior into delegated action after accumulated conversational manipulation, especially when session memory persists across tasks.
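The pattern running through these examples, where every step is individually permitted but the session as a whole drifts, can be sketched as a per-session monitor. This is a minimal illustration, not a reference implementation: the `SessionMonitor` class, the tool names, and the `max_out_of_scope` threshold are all hypothetical, and it assumes the task's expected tools can be declared up front.

```python
from dataclasses import dataclass, field


@dataclass
class SessionMonitor:
    """Tracks one agent session's tool calls against the task's declared scope.

    All names here are hypothetical and exist only for illustration.
    """
    task_scope: frozenset[str]            # tools the original task is expected to need
    identity_permissions: frozenset[str]  # tools the agent's identity may call at all
    max_out_of_scope: int = 1             # assumed tolerance before escalating to a human
    history: list[str] = field(default_factory=list)

    def record(self, tool: str) -> str:
        """Return 'deny', 'allow', or 'escalate' for a requested tool call."""
        if tool not in self.identity_permissions:
            return "deny"  # ordinary access control: the identity lacks permission
        self.history.append(tool)
        # Count permitted calls that fall outside what the task declared.
        out_of_scope = [t for t in self.history if t not in self.task_scope]
        if len(out_of_scope) > self.max_out_of_scope:
            return "escalate"  # each call was permitted, but the session has drifted
        return "allow"


monitor = SessionMonitor(
    task_scope=frozenset({"read_ticket", "update_ticket"}),
    identity_permissions=frozenset(
        {"read_ticket", "update_ticket", "read_config", "read_secrets"}
    ),
)
decisions = [monitor.record(t) for t in ("read_ticket", "read_config", "read_secrets")]
```

The point of the design is that the decision depends on session history rather than on the single call being evaluated, which is what distinguishes drift detection from a static permission check.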
For broader breach context, NHIMG’s LLMjacking report and Ultimate Guide to NHIs help distinguish runtime hijack behavior from static credential theft, which are related but not identical failure modes.
Why It Matters in NHI Security
Agentic hijacking becomes an NHI security problem because the agent’s identity, tokens, and delegated permissions are what make the manipulated action possible. Once the agent is inside a trusted session, standard access controls may still authorize the final API call even though the intent has been subverted. That is why governance must pair identity controls with contextual monitoring, tool allowlisting, action gating, and memory hygiene. NHIMG’s reporting on AI LLM hijack breach underscores how quickly a seemingly normal workflow can become an attack path when an AI agent is trusted to act on its own.
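As a rough illustration of tool allowlisting combined with action gating, the sketch below denies by default, lets read-only tools through, and routes state-changing tools to an approval callback. The tool names, the two sets, and the `gate_tool_call` function are assumptions made for this example and do not come from any specific framework.

```python
from typing import Callable

# Hypothetical allowlists for one agent; real deployments would load these from policy.
READ_ONLY_TOOLS = {"search_kb", "read_ticket"}
GATED_TOOLS = {"update_record", "send_email"}  # state-changing, so approval is required


def gate_tool_call(tool: str, approver: Callable[[str], bool]) -> bool:
    """Decide whether a tool call may proceed.

    `approver` stands in for a human review step or a policy engine (assumption).
    """
    if tool in READ_ONLY_TOOLS:
        return True            # allowlisted and low impact
    if tool in GATED_TOOLS:
        return approver(tool)  # allowlisted, but gated behind explicit approval
    return False               # not on any allowlist: deny by default


def always_reject(tool: str) -> bool:
    """Example approver that rejects every gated action."""
    return False


allowed = gate_tool_call("search_kb", always_reject)
blocked = gate_tool_call("send_email", always_reject)
```

Because the gate sits outside the model, a manipulated conversation can change what the agent asks for but not what the gate lets through.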
SailPoint’s AI Agents: The New Attack Surface report found that 80% of organisations report AI agents have already performed actions beyond their intended scope, and 33% report access to inappropriate or sensitive data. That is a governance signal, not just a product issue. Defenders should use NIST AI Risk Management Framework practices to assess exposure, then map agent actions to the MITRE ATLAS adversarial AI threat matrix and CSA MAESTRO agentic AI threat modeling framework. Organisations typically encounter agentic hijacking only after an agent has already sent data, changed a record, or triggered an unexpected workflow, by which point the problem can no longer be deferred.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | A1 | Covers prompt and tool abuse that steers agents into unsafe actions. |
| OWASP Non-Human Identity Top 10 | NHI-07 | Addresses abuse of delegated NHI permissions and runtime session trust. |
| NIST AI RMF | Not specified | Frames AI risks that emerge from misuse, control failure, and unsafe operation. |
Monitor agent behavior continuously and document escalation, containment, and rollback steps.