Subscribe to the Non-Human & AI Identity Journal

Notifications
Clear all

Adaptive LLM defenses: what security teams need to change


(@nhi-mgmt-group)
Member Moderator
Joined: 1 year ago
Posts: 9271
Topic starter  

TL;DR: Static LLM defenses fail as attackers adapt, and security must be measured against both attack resistance and user utility, according to Lakera’s Gandalf the Red analysis. The practical lesson is that LLM governance now needs adaptive controls, layered defenses, and narrower application scope rather than one fixed prompt-time safeguard.

NHIMG editorial — based on content published by Lakera: Gandalf the Red: Rethinking LLM Security with Adaptive Defenses

Questions worth separating out

Q: How should security teams evaluate LLM defenses in production?

A: They should evaluate both attack resistance and user utility.

Q: Why do static prompt defenses fail against AI attackers?

A: Static defenses fail because attackers learn from each refusal and refine their prompts accordingly.

Q: When does narrowing an LLM’s scope help security?

A: Narrowing scope helps when the application only needs a limited set of tasks and data domains.

Practitioner guidance

  • Define the security-utility boundary for each AI use case Set a clear threshold for how much response quality, task completion, or latency you will accept before a defense is considered too restrictive for production.
  • Test controls against adaptive attacker sessions Run evaluations that let the same adversary adapt over multiple prompts, because single-turn red teaming will miss the learning loop that breaks static defenses.
  • Layer prompt scope, session controls, and capability limits Use more than one safeguard so that a weakness in one layer does not leave the model exposed to the same attack path.

What's in the full report

Lakera's full research article covers the operational detail this post intentionally leaves for the source:

  • The D-SEC threat model and how it formalises attacker adaptation alongside utility loss.
  • The session-completion and attacker-failure metrics used to compare defence strategies.
  • The examples of prompt restriction and defence layering that show how different controls change usability.
  • The Gandalf gameplay findings that illustrate how iterative red teaming exposes weaknesses static tests miss.

👉 Read Lakera's research on adaptive defenses for LLM security and utility →

Adaptive LLM defenses: what security teams need to change?

Explore further

View Full Forum →  |  NHI Foundation Course →



   
Quote
(@mr-nhi)
Member Moderator
Joined: 2 months ago
Posts: 8712
 

Adaptive LLM security is now a governance problem, not just a prompt-engineering problem. Static controls assume the threat is stable enough to be measured once and enforced indefinitely. That assumption fails when attackers learn from model feedback and change tactics mid-session. The implication is that governance teams must treat model security as an ongoing control loop, not a one-time configuration.

A few things that frame the scale:

  • 80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%), according to AI Agents: The New Attack Surface report.
  • Another finding from the same research shows that only 52% of companies can track and audit the data their AI agents access, leaving 48% with a compliance and investigation blind spot.

A question worth separating out:

Q: How can teams tell whether an LLM defence is too strict?

A: A defence is too strict when it starts rejecting benign requests, shortening useful answers, or preventing core tasks from being completed. Those are signs that the model’s utility has been reduced past the point where the security gain is worth the operational cost.

👉 Read our full editorial: Adaptive defenses are now central to LLM security and utility



   
ReplyQuote
Share: