Prompt injection taxonomy shows the gap between intent and technique

By NHI Mgmt Group Editorial TeamPublished 2026-05-13Domain: Agentic AI & NHIsSource: Lasso Security

TL;DR: Prompt injection is best standardised by separating attacker intent from execution technique, according to Lasso Security, which maps text-based attacks across instruction override, role-playing, context, formatting, cross-lingual, social engineering, encoding, payload splitting, and instruction smuggling. That distinction matters because AI-enabled workflows now need governance for how prompts are manipulated, not just what the model is asked to do.

At a glance

What this is: This is a prompt injection taxonomy that separates attack intent from technique and shows how text-based manipulation techniques evade LLM safeguards.

Why it matters: It matters because security teams governing AI agents, AI-assisted workflows, and adjacent NHI controls need to classify malicious model interaction patterns before they can detect, contain, or audit them.

👉 Read Lasso Security's full prompt injection taxonomy and technique breakdown

Context

Prompt injection is a manipulation problem, not just a content-filter problem. The attacker is not only asking an LLM to do something harmful, but also shaping the prompt so the model interprets the request as legitimate, ambiguous, or contextually authorised. For identity teams, that shifts the control question from input sanitisation alone to how runtime instructions, permissions, and model-mediated actions are governed.

The article is centered on text-based techniques, which are especially relevant when LLMs sit inside applications, copilots, and agentic workflows that already have access to tools or data. That is where prompt handling becomes an identity concern, because the model may be the decision layer that turns text into action. In those environments, AI agent governance and NHI controls start to overlap.

Key questions

Q: How should security teams classify prompt injection attempts in AI workflows?

A: Security teams should classify prompt injection by both attacker intent and the technique used to achieve it. That means separating goals such as jailbreak or system prompt leakage from methods such as role-playing, formatting abuse, cross-lingual manipulation, or instruction smuggling. This approach improves detection, routing, and incident triage because the same harmful objective can appear in many different forms.

Q: Why do prompt injection attacks create governance risk for AI agents?

A: Prompt injection creates governance risk because the model often sits in the control path between text input and tool execution. If attackers can change what the model treats as authoritative, they can influence access decisions, data exposure, or downstream actions without compromising a traditional account. That makes prompt provenance and instruction hierarchy part of AI identity governance.

Q: What do organisations get wrong about filtering malicious prompts?

A: Many organisations focus on obvious harmful wording and miss manipulative structure. Attackers can hide intent through obfuscation, mixed languages, encoding, spacing, or embedded instructions inside content the model is asked to process. Effective defence requires normalisation and context-aware inspection, not just keyword blocking.

Q: How can teams reduce the impact of instruction smuggling in LLM pipelines?

A: Teams should treat retrieved content as untrusted data, even when it comes from HTML pages, documents, or chat history. Hidden instructions in comments, metadata, or non-rendered elements can still influence model behaviour if the pipeline passes them through unchanged. Sanitising content before inference reduces the chance that the model confuses data with commands.

Technical breakdown

Intent vs technique in prompt injection

Prompt injection is easiest to understand when intent and technique are separated. Intent is the attacker’s objective, such as system prompt leakage or jailbreak. Technique is the method used to increase the chance of success, such as obfuscation, role-playing, or context manipulation. That distinction matters because the same technique can support benign or malicious use, while the same malicious objective can be pursued through many different text patterns. Once defenders treat the prompt itself as an attack surface, detection has to look for manipulation patterns, not just prohibited keywords.

Practical implication: classify prompt attacks by objective and method so detection, logging, and response rules can follow the real abuse pattern.

Instruction override and context exploitation

Instruction override techniques try to displace the model’s governing instructions by claiming that prior rules no longer apply, while context exploitation tries to reshape the conversation so the model believes the attacker has authority or a better source of truth. These are different failure modes. One attacks precedence, the other attacks interpretation. In identity terms, both are attempts to confuse which instructions are authoritative inside the session. That is why prompt chains need provenance, instruction hierarchy, and careful separation between trusted system content and untrusted user-supplied context.

Practical implication: keep system instructions isolated from user content and preserve provenance for any prompt elements that can influence tool use or refusal behaviour.

Formatting, cross-lingual, and encoding-based evasion

Many prompt injection techniques do not rely on obviously malicious wording. They use spacing, Unicode, mixed scripts, leet speak, language switching, translation framing, or reversible encodings to make the same malicious intent harder for filters to recognise. The important technical point is that the payload remains semantically dangerous even when the surface form changes. That means defenders need normalisation, decoding-aware inspection, and multilingual controls that operate before the model interprets the content. Otherwise, the model sees one thing while the filter sees another.

Practical implication: normalise, decode, and inspect prompts before inference, especially when user input can include multilingual or encoded text.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach — attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Prompt injection is an identity problem because the model becomes the decision layer that interprets authority. Once the LLM is embedded in workflows that can retrieve data, call tools, or shape downstream actions, the attack is no longer just abusive text. The governing question becomes which instructions the system is permitted to treat as authoritative, and when that authority can be spoofed through language alone. Practitioners should treat prompt provenance as part of access control.

Technique-based classification is the missing bridge between content security and AI governance. Security teams already know how to reason about obfuscation, social engineering, and payload smuggling in other domains, but prompt injection pulls those ideas into the model layer. A taxonomy that separates intent from technique creates a common language for policy, detection, and incident response. The practical result is that AI security programmes can stop treating every prompt issue as the same class of failure.

Instruction smuggling is a named failure mode that traditional prompt filters miss. The article’s taxonomy shows that the harmful directive may be embedded in content the model is asked to process, not in a direct command. That breaks the assumption that “data” and “instructions” are naturally separable once content reaches the model. The implication is that retrieval, summarisation, and HTML-parsing pipelines need governance over embedded directives, not just moderation of visible prompts.

Prompt injection expands NHI governance from credential control to context control. The core risk is not only who or what has a token, but what the model is allowed to treat as a trusted instruction source during runtime. That widens the identity perimeter around AI-enabled systems and makes prompt handling part of the trust chain. Practitioners should align AI workflow governance with NHI controls on runtime authority and tool execution.

Universal language attacks sharpen the need for multilingual model governance. The taxonomy shows that attackers can preserve meaning while changing script, language, or encoding, which means language-based safety checks cannot rely on surface form alone. This is a programme-level issue for any enterprise deploying AI across regions or multilingual user populations. Security teams should assume that prompt abuse can cross language boundaries as easily as it crosses technical ones.

From our research:
98% of companies plan to deploy even more AI agents within the next 12 months, despite documented rogue behaviour in 80% of current deployments, according to AI Agents: The New Attack Surface report.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
Prompt governance will matter more as agent estates grow, which is why practitioners should also review OWASP Agentic AI Top 10 alongside their AI control design.

What this signals

Prompt taxonomies will only be useful if they are embedded into control design, not left as documentation. As AI systems become normalised in business workflows, the operational burden shifts toward prompt provenance, instruction hierarchy, and multilingual inspection. Teams that already treat untrusted input as a control problem should extend that mindset to LLM interfaces and tool-calling paths.

With 98% of companies planning to deploy more AI agents within the next 12 months, according to AI Agents: The New Attack Surface report, the prompt layer is becoming a durable part of enterprise attack surface management. That makes AI workflow governance a standing programme concern, not a one-off review.

For organisations that need a control framework to anchor this work, the OWASP Agentic AI Top 10 and NIST guidance on AI risk management provide a useful starting point for policy, testing, and exception handling.

For practitioners

Define prompt injection categories in policy Map local detection and response rules to intent, technique, and payload style so analysts can distinguish jailbreak attempts, system prompt leakage, and instruction smuggling during triage.
Isolate trusted system instructions from user input Keep governing prompts, tool instructions, and refusal rules separate from untrusted text so attackers cannot use context manipulation to overwrite authoritative session state.
Normalise multilingual and encoded content before inference Apply Unicode normalisation, script checks, decoding, and translation-aware inspection to reduce evasion through homoglyphs, mixed scripts, leet speak, or base encodings.
Treat HTML and other retrieved content as untrusted data Strip or sandbox hidden instructions in comments, metadata, and non-rendered elements before the model processes retrieved pages, summaries, or documents.

Key takeaways

Prompt injection is better managed as an authority and interpretation problem than as a simple content-filter problem.
Separating attacker intent from technique gives security teams a practical taxonomy for detection, triage, and policy design.
Enterprise AI programmes need prompt provenance, normalisation, and embedded-instruction controls before agent estates expand further.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 address the attack and risk surface, while NIST AI RMF and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		Covers prompt injection, tool misuse, and agent hijacking.
NIST AI RMF		AI risk governance is needed for LLM workflows that can act on manipulated prompts.
NIST CSF 2.0	PR.DS-6	Data integrity controls matter when prompts and retrieved content can be altered.

Use agentic AI controls to validate prompts, constrain tool use, and separate user text from governing instructions.

Key terms

Prompt Injection: A prompt injection is an attempt to manipulate an LLM into ignoring, reinterpreting, or bypassing its intended instructions. In practice, it exploits how the model resolves authority between system prompts, user input, and retrieved content, turning language into an attack path.
System Prompt: The system prompt is the core instruction set that defines an LLM’s behaviour, boundaries, and response style during a session. When attackers influence or override it, they are not merely changing text. They are trying to change the model’s governing authority.
Instruction Smuggling: Instruction smuggling is the practice of hiding malicious directives inside content the model is asked to process as data. The dangerous part is the boundary violation, where comments, metadata, markup, or embedded text are treated as instructions rather than untrusted input.
Context Exploitation: Context exploitation is a prompt attack method that reshapes the conversation so the model believes false authority, false capabilities, or false history. For defenders, it is a reminder that context is part of the trust surface, not just background text.

Deepen your knowledge

Prompt injection taxonomy and AI workflow governance are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building controls for LLM-driven tools, copilots, or agents, it is a useful place to start.

This post draws on content published by Lasso Security: A Standardization Guide to Prompt Injection, text-based techniques vs intent. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-05-13.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org