Subscribe to the Non-Human & AI Identity Journal
Home FAQ Threats, Abuse & Incident Response Why do multilingual prompts increase the risk of…
Threats, Abuse & Incident Response

Why do multilingual prompts increase the risk of AI data leakage?

← Back to all FAQ
By NHI Mgmt Group Editorial Team Updated July 5, 2026 Domain: Threats, Abuse & Incident Response

Multilingual prompts increase leakage risk because safety controls often recognise restricted requests more reliably in English than in translated or transliterated forms. Attackers can use that gap to elicit sensitive data, especially where the model has access to internal documents or connected tools. Consistent enforcement across languages is essential when AI handles confidential information.

Why This Matters for Security Teams

Multilingual prompts matter because safety controls are usually trained and tuned around the languages they see most often, then assumed to generalise. That assumption breaks when a user rephrases a restricted request in another language, transliterates it, or mixes languages inside a single prompt. The risk is not only policy evasion. It is also accidental disclosure when a model can retrieve internal text, summarise confidential content, or act through connected tools.

This is especially relevant for organisations handling sensitive prompts, customer data, or internal knowledge bases. NHI Management Group has repeatedly highlighted how inconsistent identity and secrets handling creates exposure in real environments, including in Ultimate Guide to NHIs — Key Challenges and Risks and the Guide to the Secret Sprawl Challenge. When language coverage is uneven, an attacker does not need advanced model exploitation, only the ability to ask the same question in a form the guardrail fails to recognise. In practice, many security teams discover this only after a multilingual prompt has already bypassed review and exposed content that was never meant to leave the system.

How It Works in Practice

The failure usually starts in the control layer, not the model. A prompt filter, classifier, or policy engine may be strongest in English, while its confidence drops in translated or mixed-language inputs. The model then receives a request that appears benign to the filter but still conveys the same intent. This is why multilingual leakage is often a policy-evasion problem first, and a model problem second.

In systems with retrieval, plugins, or file access, the impact grows quickly. A translated prompt can instruct the model to summarise documents, extract names, or reveal internal instructions hidden in a knowledge base. If the agent has tool access, it may also chain actions across languages, moving from chat to search to document retrieval to export. That is one reason the current guidance from NIST AI risk management work and the NIST Cybersecurity Framework 2.0 emphasises continuous governance rather than one-time content filtering.

Practitioners reduce risk by treating language as a security variable:

  • Apply multilingual safety evaluation, not just English-only prompt tests.
  • Normalise or detect language before policy decisions, while preserving meaning.
  • Use retrieval and tool permissions that are separate from chat safety checks.
  • Classify and redact sensitive outputs before they are returned to the user.
  • Log prompts in a way that supports review across scripts, transliterations, and code-mixed text.

NHIMG research on The 52 NHI breaches Report and the broader 2024 ESG Report: Managing Non-Human Identities shows why this matters operationally: once an identity or control gap exists, attackers look for the easiest path through it. These controls tend to break down when an LLM is connected to internal search, translation, or document workflows because the system can expose the same sensitive object through multiple language paths.

Common Variations and Edge Cases

Tighter multilingual filtering often increases false positives and operational overhead, requiring organisations to balance leakage prevention against usability. That tradeoff is real, especially for global teams and customer-facing assistants.

Best practice is evolving, and there is no universal standard for this yet. Some organisations rely on translation before moderation, while others moderate in the original language and again after normalisation. Both approaches can miss nuance, slang, or culturally specific phrasing. Mixed-language prompts are especially difficult because the risky clause may appear only in one segment while the rest of the message looks harmless.

The hardest edge cases usually involve:

  • code-switching between languages in one request
  • transliterated words that bypass keyword-based filters
  • low-resource languages with weaker moderation coverage
  • embedded instructions inside quoted text, images, or retrieved documents
  • systems that expose internal knowledge through summarisation rather than direct retrieval

Current guidance suggests testing safety controls with real multilingual abuse cases, not only translated English prompts. The practical standard is consistency: the same policy outcome should apply regardless of language, script, or phrasing. That becomes harder in environments where the model has broad tool access, because a single missed classification can turn a chat response into a data-exfiltration path.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.

FrameworkControl / ReferenceRelevance
OWASP Agentic AI Top 10A01Multilingual prompts can bypass agent safety checks and leak data through tool use.
CSA MAESTROGOV-2Governance must cover prompt risk, retrieval, and tool access across languages.
NIST AI RMFGOVERNAI governance should address inconsistent safety behavior across languages and contexts.

Test agent guardrails across languages and block risky actions at runtime, not just in English.

NHIMG Editorial Note
Reviewed and updated by the NHIMG editorial team on July 5, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org