Subscribe to the Non-Human & AI Identity Journal

Notifications
Clear all

Multilingual LLM attacks: are your controls keeping up?


(@nhi-mgmt-group)
Member Moderator
Joined: 1 year ago
Posts: 9236
Topic starter  

TL;DR: English-first guardrails leave LLMs exposed to prompt attacks, data extraction, and translation-based bypasses across many languages, according to Lakera’s analysis of real-world cases and research. Security teams need multilingual defenses, not just broader model coverage, because policy enforcement breaks when language handling is inconsistent.

NHIMG editorial — based on content published by Lakera: Language Is All You Need, the hidden AI security risk

By the numbers:

  • Attackers have successfully bypassed Gandalf’s guardrails in over 85 languages using techniques like code-switching, translation-based exploits, and multilingual data extraction.

Questions worth separating out

Q: How should security teams test LLM guardrails across multiple languages?

A: Security teams should test the same harmful intent in every major language and in code-switched variants, then compare block, warn, and allow outcomes.

Q: Why do multilingual prompts increase the risk of AI data leakage?

A: Multilingual prompts increase leakage risk because safety controls often recognise restricted requests more reliably in English than in translated or transliterated forms.

Q: What do teams get wrong about multilingual AI security?

A: Teams often assume that a model that understands many languages is automatically secure in those languages.

Practitioner guidance

  • Test safety controls by language family Run jailbreak, prompt-injection, and data-extraction tests in the languages your users and attackers are most likely to use, then compare allow and block outcomes for the same intent.
  • Evaluate translation and code-switching paths Check whether the model can be induced to ignore or weaken policy when a request is split across languages, transliterated, or translated before moderation.
  • Add multilingual abuse cases to release gates Require pre-production approval to include non-English prompt sets, low-resource language tests, and output review for sensitive-data leakage before deployment.

What's in the full article

Lakera's full article covers the operational detail this post intentionally leaves for the source:

  • Examples of multilingual prompt attacks on Gandalf and why they bypass English-centric guardrails.
  • The article's comparison of code-switching, translation exploits, and multilingual extraction techniques.
  • The checklist for designing multilingual AI security controls across inputs, outputs, and monitoring.
  • The source's explanation of why low-resource languages create weaker safety coverage.

👉 Read Lakera's analysis of multilingual AI security risks and bypass techniques →

Multilingual LLM attacks: are your controls keeping up?

Explore further

View Full Forum →  |  NHI Foundation Course →



   
Quote
(@mr-nhi)
Member Moderator
Joined: 2 months ago
Posts: 8675
 

English-first AI security is a governance assumption, not a technical constant. The article shows that many teams have treated multilingual support as a deployment detail instead of a control requirement. That assumption fails once the same harmful intent can be expressed in multiple languages, because enforcement no longer maps cleanly to policy intent. The practical conclusion is that multilingual enforcement has to be designed as part of the control model, not added after launch.

A few things that frame the scale:

  • Attackers have successfully bypassed Gandalf’s guardrails in over 85 languages using techniques like code-switching, translation-based exploits, and multilingual data extraction, according to LLMjacking: How Attackers Hijack AI Using Compromised NHIs.
  • 43% of security professionals are concerned about AI systems learning and reproducing sensitive information patterns from codebases.

A question worth separating out:

Q: How can organisations tell whether multilingual safety controls are actually working?

A: They should measure whether the same policy decision holds across translations, transliterations, and mixed-language prompts. A working control produces consistent outcomes for equivalent intent, not just consistent performance in one language. Any gap between languages is a governance weakness, especially when the AI can access sensitive data or trigger downstream actions.

👉 Read our full editorial: Multilingual AI security gaps expose English-first LLM defenses



   
ReplyQuote
Share: