Notifications

Clear all

Adaptive LLM defenses: what security teams need to change

Last Post

RSS

NHI Mgmt Group

(@nhi-mgmt-group)

Member Moderator

Joined: 1 year ago

Posts: 12387

Topic starter 05/07/2026 6:54 pm

TL;DR: Static LLM defenses fail as attackers adapt, and security must be measured against both attack resistance and user utility, according to Lakera’s Gandalf the Red analysis. The practical lesson is that LLM governance now needs adaptive controls, layered defenses, and narrower application scope rather than one fixed prompt-time safeguard.

NHIMG editorial — based on content published by Lakera: Gandalf the Red: Rethinking LLM Security with Adaptive Defenses

Questions worth separating out

Q: How should security teams evaluate LLM defenses in production?

A: They should evaluate both attack resistance and user utility.

Q: Why do static prompt defenses fail against AI attackers?

A: Static defenses fail because attackers learn from each refusal and refine their prompts accordingly.

Q: When does narrowing an LLM’s scope help security?

A: Narrowing scope helps when the application only needs a limited set of tasks and data domains.

Practitioner guidance

Define the security-utility boundary for each AI use case Set a clear threshold for how much response quality, task completion, or latency you will accept before a defense is considered too restrictive for production.
Test controls against adaptive attacker sessions Run evaluations that let the same adversary adapt over multiple prompts, because single-turn red teaming will miss the learning loop that breaks static defenses.
Layer prompt scope, session controls, and capability limits Use more than one safeguard so that a weakness in one layer does not leave the model exposed to the same attack path.

What's in the full report

Lakera's full research article covers the operational detail this post intentionally leaves for the source:

The D-SEC threat model and how it formalises attacker adaptation alongside utility loss.
The session-completion and attacker-failure metrics used to compare defence strategies.
The examples of prompt restriction and defence layering that show how different controls change usability.
The Gandalf gameplay findings that illustrate how iterative red teaming exposes weaknesses static tests miss.

👉 Read Lakera's research on adaptive defenses for LLM security and utility →

Adaptive LLM defenses: what security teams need to change?

Explore further

View Full Forum → | NHI Foundation Course →

Quote

Topic Tags

Mr NHI

(@mr-nhi)

Member Moderator

Joined: 3 months ago

Posts: 11961

05/07/2026 7:15 pm

Adaptive LLM security is now a governance problem, not just a prompt-engineering problem. Static controls assume the threat is stable enough to be measured once and enforced indefinitely. That assumption fails when attackers learn from model feedback and change tactics mid-session. The implication is that governance teams must treat model security as an ongoing control loop, not a one-time configuration.

A few things that frame the scale:

80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%), according to AI Agents: The New Attack Surface report.
Another finding from the same research shows that only 52% of companies can track and audit the data their AI agents access, leaving 48% with a compliance and investigation blind spot.

A question worth separating out:

Q: How can teams tell whether an LLM defence is too strict?

A: A defence is too strict when it starts rejecting benign requests, shortening useful answers, or preventing core tasks from being completed. Those are signs that the model’s utility has been reduced past the point where the security gain is worth the operational cost.

👉 Read our full editorial: Adaptive defenses are now central to LLM security and utility

ReplyQuote

Forum Statistics

11 Forums

13.6 K Topics

26.1 K Posts

47 Online

135 Members

Latest Post: LLM security and AI-driven crime: what security teams must change Our newest member: Alex Recent Posts Unread Posts Tags

Forum Icons: Forum contains no unread posts Forum contains unread posts

Topic Icons: Not Replied Replied Active Hot Sticky Unapproved Solved Private Closed

#1 Authority in NHI Education, Research and Advisory, empowering organizations to tackle the critical risks posed by Non-Human Identities (NHIs), including AI Agents.

Get in Touch

Quick Links

FAQ

NHI 101 Articles

Legal & Policies