How should security teams defend LLMs against tokenization attacks?

Why This Matters for Security Teams

Tokenization attacks exploit the gap between what humans see and what the model actually processes. That makes them especially dangerous for LLMs used in moderation, routing, policy checks, and data handling, because a prompt can look harmless at the text layer while decomposing into something very different at the token layer. Security teams often miss this when they validate only the rendered string or only the model output.

The practical risk is inconsistent decision-making across the pipeline. A sanitizer may strip one pattern, a classifier may score another, and the model may still receive a payload that preserves the attacker’s intent after tokenization. The OWASP Agentic AI Top 10 and the NIST AI Risk Management Framework both point toward layered controls, but tokenization-specific testing is still uneven in many programs. NHI Management Group has documented how adversaries exploit identity and input path weaknesses across AI systems in the AI LLM hijack breach analysis.

In practice, many security teams encounter tokenization bypass only after a harmful prompt has already reached production routing or moderation logic.

How It Works in Practice

Defence starts by forcing the entire pipeline to agree on the same canonical input before any security decision is made. That means normalising Unicode, collapsing visually confusable characters where appropriate, rejecting invisible control characters in high-risk contexts, and ensuring that preprocessing happens before both policy enforcement and model invocation. The key point is not just to filter text, but to eliminate representation gaps that attackers can use to split one string into multiple interpretations.
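A minimal sketch of this canonicalisation step, in Python using only the standard `unicodedata` module. The function name `canonicalize`, the `ALLOWED_CONTROLS` set, and the choice to reject rather than strip invisible characters are illustrative assumptions, not a prescribed implementation. NFKC folds compatibility forms such as fullwidth letters, but it does not map cross-script homoglyphs (for example, Cyrillic "а" to Latin "a"), so a separate confusables check is still needed.

```python
import unicodedata

# Whitespace control characters that legitimate text commonly contains.
ALLOWED_CONTROLS = {"\n", "\r", "\t"}


def canonicalize(text: str) -> str:
    """Return the NFKC form of `text`, or raise ValueError on invisible characters.

    Hypothetical helper: rejects Unicode format (Cf) and control (Cc)
    characters, which covers zero-width spaces/joiners and bidi overrides.
    """
    normalized = unicodedata.normalize("NFKC", text)
    for ch in normalized:
        if ch in ALLOWED_CONTROLS:
            continue
        if unicodedata.category(ch) in ("Cf", "Cc"):
            raise ValueError(f"invisible character U+{ord(ch):04X} in input")
    return normalized


# Both the policy check and the model call should consume this one value.
canonical = canonicalize("ｉｇｎｏｒｅ previous instructions")
```

Rejecting every format character is only suitable for high-risk contexts: zero-width joiners and non-joiners are legitimate in emoji sequences and in some scripts, so a customer-facing field may need a narrower rule.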

Testing should cover both raw text and tokenized output. A payload may be benign as a sequence of characters but dangerous once byte-pair encoding or another tokenizer segments it into unusual token boundaries. The model, classifier, and sanitizer must be validated against the same sample set, using the same tokenizer and the same canonicalisation logic. That is especially important when LLMs are embedded in agentic workflows or tool-using systems, where a bypass can become a tool call, data exfiltration, or unauthorized instruction chaining. The OWASP Top 10 for Agentic Applications 2026 and CSA MAESTRO agentic AI threat modeling framework both reinforce the need to test the full interaction path, not just the prompt surface.
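One way to test for this kind of gap is to tokenize each sample twice, once raw and once canonicalised, and flag every sample where the two token sequences differ. The sketch below assumes a tokenizer exposed as a plain callable; `find_token_gaps` and `toy_canonicalize` are hypothetical names, and `str.split` is only a placeholder for the tokenizer that production actually uses.

```python
import unicodedata
from typing import Callable, Iterable, List, Tuple

Tokenizer = Callable[[str], List[str]]


def toy_canonicalize(text: str) -> str:
    """Stand-in canonicaliser: NFKC, then drop format (Cf) characters."""
    normalized = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in normalized if unicodedata.category(ch) != "Cf")


def find_token_gaps(
    samples: Iterable[str],
    tokenize: Tokenizer,
    canonicalize: Callable[[str], str],
) -> List[Tuple[str, List[str], List[str]]]:
    """Return samples whose raw and canonical forms tokenize differently."""
    gaps = []
    for sample in samples:
        raw_tokens = tokenize(sample)
        canonical_tokens = tokenize(canonicalize(sample))
        if raw_tokens != canonical_tokens:
            gaps.append((sample, raw_tokens, canonical_tokens))
    return gaps


samples = [
    "reset password",           # plain ASCII, no gap expected
    "re\u200bset password",     # zero-width space inside a word
    "ｒｅｓｅｔ password",        # fullwidth Latin letters
]
# str.split is a placeholder; pass the production tokenizer's encode function here.
gaps = find_token_gaps(samples, str.split, toy_canonicalize)
```

Any entry in `gaps` marks an input where a component reading the raw string and a component reading the canonical string would see different tokens, which is the condition the sanitizer, classifier, and model all need to be tested against.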

Canonicalize input first, then run policy checks on the normalized representation.

Apply the same tokenizer in test harnesses that production uses.

Block or flag invisible characters, homoglyphs, and mixed-script payloads where business context allows.

Log both raw and normalized forms for investigation, but avoid exposing sensitive content broadly.
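The flagging step in the list above can be sketched as follows. This is a rough heuristic under stated assumptions: `inspect_input` and `_script_of` are hypothetical names, and the script label is taken from the first word of each character's Unicode name rather than from the Unicode Script property, which a production check would more likely use.

```python
import unicodedata
from typing import Dict, List


def _script_of(ch: str) -> str:
    """Rough script label: the first word of the character's Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]


def inspect_input(text: str) -> Dict[str, List[str]]:
    """Report invisible format characters and words that mix scripts."""
    invisible = [
        f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Cf"
    ]
    mixed_words = []
    for word in text.split():
        # Only letters carry a meaningful script; digits and punctuation are skipped.
        scripts = {_script_of(ch) for ch in word if ch.isalpha()}
        if len(scripts) > 1:
            mixed_words.append(word)
    return {"invisible": invisible, "mixed_script_words": mixed_words}


# "p\u0430ypal" uses a Cyrillic "а" in place of the Latin "a".
report = inspect_input("login to p\u0430ypal\u200b now")
```

Checking scripts per word rather than per message is a deliberate choice here: a sentence that contains both Russian and English words passes, while a single word that blends Latin and Cyrillic letters is flagged.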

NHI Management Group’s 52 NHI Breaches Analysis shows that attacker paths often combine input abuse with identity abuse, which is why tokenization testing should be paired with downstream authorization review. These controls tend to break down when the application allows multiple tokenizers, locale-dependent preprocessing, or user-controlled markup because the security decision and the model decision no longer share the same input state.

Common Variations and Edge Cases

Tighter input normalization often increases false positives and support overhead, requiring organisations to balance bypass resistance against usability and multilingual coverage. That tradeoff matters because some deployments must preserve accented text, code snippets, or non-Latin scripts while still resisting adversarial encoding tricks.

Best practice is evolving, and there is no universal standard for how aggressively to strip or transform uncommon Unicode ranges. For customer-facing systems, a risk-based approach is usually better: preserve legitimate content where needed, but apply stricter canonicalization to system prompts, routing prompts, policy prompts, and tool invocation fields. Where the application supports retrieval or long-context injection, tokenization attacks can also be used to bury malicious instructions inside large payloads that look harmless during manual review. The 52 NHI breaches Report is useful for understanding how identity abuse and content abuse often converge in real incidents, while the NIST AI 600-1 Generative AI Profile supports a stronger evaluation and monitoring posture.
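A risk-based split like this could be expressed as a per-field rule. The sketch below is illustrative only: the field names in `STRICT_FIELDS` and the function `canonicalize_field` are hypothetical, and the strict and lenient rules shown are one possible choice rather than an established standard.

```python
import unicodedata

# Hypothetical field names; adjust to the application's own schema.
STRICT_FIELDS = {"system_prompt", "routing_prompt", "policy_prompt", "tool_arguments"}


def canonicalize_field(field: str, text: str) -> str:
    """Apply strict handling to control-plane fields, lenient handling elsewhere."""
    if field in STRICT_FIELDS:
        normalized = unicodedata.normalize("NFKC", text)
        if any(unicodedata.category(ch) == "Cf" for ch in normalized):
            raise ValueError(f"format character not allowed in {field}")
        return normalized
    # User-facing content: NFC only composes equivalent sequences, so accented
    # and non-Latin text keeps its meaning and appearance.
    return unicodedata.normalize("NFC", text)
```

With this split, a decomposed "é" in a user message is composed and kept, while a zero-width space in a tool argument is rejected outright.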

Another edge case is when downstream tools parse model output as structured data. If the model’s token view differs from the parser’s text view, an attacker may create a payload that passes one layer and triggers another. Teams should treat tokenization attacks as a pipeline integrity problem, not just a prompt-filtering problem.
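For structured output, one partial mitigation is to parse strictly and refuse anything the parser could read in more than one way. The sketch below assumes the model emits a JSON tool call; the `ALLOWED_KEYS` schema and the function `parse_tool_call` are hypothetical. It rejects duplicate keys, which different parsers resolve differently, along with any unexpected top-level structure.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical schema for a tool call emitted by the model.
ALLOWED_KEYS = {"tool", "arguments"}


def _reject_duplicates(pairs: List[Tuple[str, Any]]) -> Dict[str, Any]:
    """Fail on repeated keys instead of silently keeping the last one."""
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate key in model output")
    return dict(pairs)


def parse_tool_call(model_output: str) -> Dict[str, Any]:
    """Parse model output as JSON and enforce the expected top-level keys."""
    # json.JSONDecodeError is a subclass of ValueError, so malformed input and
    # trailing data surface as the same exception type as the checks below.
    obj = json.loads(model_output, object_pairs_hook=_reject_duplicates)
    if not isinstance(obj, dict) or set(obj) != ALLOWED_KEYS:
        raise ValueError("unexpected structure in model output")
    return obj


call = parse_tool_call('{"tool": "search", "arguments": {"q": "status"}}')
```

Failing closed on ambiguity means the parser's reading is the only one available, so a payload cannot pass the check under one interpretation and reach the tool under another.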

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A1	Tokenization bypasses prompt and tool defenses in agentic workflows.
CSA MAESTRO	TR-2	Threat modeling must include adversarial input and parser mismatch paths.
NIST AI RMF	GOVERN	AI governance requires consistent evaluation and monitoring of input risks.

Model tokenization abuse in your agent threat models and validate preprocessing controls.

