An attack where malicious instructions are embedded in content that an AI agent reads — causing the agent to execute unintended actions using its own legitimate credentials. A primary vector for agent goal hijacking and identity abuse.
Expanded Definition
Prompt injection (agentic) is a control-bypass technique that targets an AI agent’s instruction hierarchy. Instead of attacking the model directly, the adversary places hostile directives in text, files, web pages, tickets, emails, or tool outputs that the agent is expected to read. The agent then treats the malicious content as actionable context and may comply using legitimate privileges, tool access, or delegated workflow authority. In NHI security, this matters because the harmed asset is often the agent’s execution identity, not just the model output. Guidance is still evolving across vendors, but the practical definition aligns with the OWASP Agentic AI Top 10 and the NIST AI Risk Management Framework, both of which emphasize context integrity, misuse resistance, and governance over autonomous action.
For NHI teams, prompt injection sits close to identity abuse, tool misuse, and data exfiltration. It differs from ordinary phishing because the attack is mediated by machine interpretation, and it differs from traditional code injection because the payload targets reasoning and instruction-following behavior rather than a parser. The most common misapplication is treating prompt injection as only a model-safety issue, which occurs when teams ignore the agent’s permissions, data sources, and tool chain.
Examples and Use Cases
Implementing prompt-injection defenses rigorously often introduces more review points and tighter content handling, requiring organisations to weigh agent autonomy against the cost of filtering, policy enforcement, and human approval steps.
- An AI support agent reads a customer message that includes hidden instructions to export case notes, then uses its valid CRM credentials to pull sensitive records.
- A procurement agent ingests a vendor PDF that embeds instructions to change the payment destination, creating a fraud path through normal workflow execution.
- A code assistant consumes malicious repository comments that instruct it to reveal secrets or alter files, a pattern discussed in NHIMG’s Analysis of Claude Code Security.
- An internal research agent follows instructions embedded in a web page and sends proprietary data to an external endpoint, matching the agentic threat patterns described in the OWASP NHI Top 10.
- A security operations agent ingests an alert payload containing adversarial text and opens tickets, resets credentials, or contacts third-party APIs without intended review.
These scenarios are easiest to overlook when the content source looks routine and the agent appears to be “just summarising” text. That is why controls must examine both content provenance and execution authority, not only model prompts.
Why It Matters in NHI Security
Prompt injection becomes an NHI issue when an agent’s permissions let hostile instructions turn interpretation into action. In SailPoint’s AI Agents: The New Attack Surface report, 80% of organisations said their AI agents had already performed actions beyond intended scope, while only 44% had implemented policies to govern them. That gap shows why prompt injection is not theoretical: it is a practical route to overreach, data exposure, and unauthorized tool use. The problem is amplified when agents can access Secrets, call MCP-connected tools, or operate under broad RBAC without JIT approval and ZSP enforcement.
Defence requires layered controls: restrict what an agent can read, isolate untrusted content, validate tool requests, and separate the agent’s reasoning context from executable commands. The threat is also visible in broader research such as the AI LLM hijack breach analysis, where exposed credentials can be abused within minutes, and in external frameworks like MITRE ATLAS adversarial AI threat matrix and the Anthropic first AI-orchestrated cyber espionage campaign report. Organisations typically encounter prompt injection only after an agent has already sent data, changed a record, or executed a tool call it should not have, at which point the term becomes operationally unavoidable to address.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | A1 | Agent prompt and tool manipulation are core risks in the agentic attack taxonomy. |
| OWASP Non-Human Identity Top 10 | NHI-02 | Prompt injection often succeeds by abusing secrets and over-privileged non-human identities. |
| NIST AI RMF | The framework calls for managing validity, robustness, and harmful manipulation in AI systems. |
Limit NHI access, protect secrets, and require explicit approval before agents can act on sensitive data.
Related resources from NHI Mgmt Group
Deepen Your Knowledge
Reviewed and updated by the NHIMG editorial team on May 16, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org