TL;DR: AI systems shift behavior with context, prompts, model updates, and tool use, so static tests miss the failures that emerge only during interaction, according to Lakera. Security now has to govern behavior across the full AI application lifecycle, not just model outputs.
NHIMG editorial — based on content published by Lakera: AI red teaming and the art of stress-testing non-deterministic systems
Questions worth separating out
Q: How should security teams red team non-deterministic AI systems?
A: They should test AI systems continuously across design, pre-release, and post-deployment phases, because behaviour can drift after model updates or tool changes.
Q: Why do static tests fail for AI red teaming?
A: Static tests fail because AI behaviour is shaped by context, phrasing, and evolving model states, so a one-time benchmark cannot capture emergent failures.
Q: What breaks when AI systems can call tools autonomously?
A: The boundary between bad output and bad action breaks down.
Practitioner guidance
- Shift red teaming to continuous evaluation Run adversarial tests during design, pre-release regression, and post-deployment drift monitoring so the control follows the system as it changes.
- Map the full AI application attack surface Document the foundation model, system prompt, retrieval sources, external APIs, and action endpoints together so testing covers the full path from input to impact.
- Test for semantic bypass, not just bad strings Create scenarios where benign-looking language carries malicious instruction, then verify how the system handles the hidden intent across downstream actions.
What's in the full article
Lakera's full article covers the operational detail this post intentionally leaves for the source:
- The article expands on the design, regression, and drift-monitoring phases of continuous AI red teaming.
- It explains how Lakera frames prompt attacks as application-specific rather than generic jailbreak tests.
- It shows why tool-calling logic and system prompts must be tested together with model outputs.
- It closes with the practical case for automated adversarial evaluation across the AI lifecycle.
👉 Read Lakera's analysis of AI red teaming for non-deterministic systems →
AI red teaming and non-deterministic systems: are your controls keeping up?
Explore further
AI red teaming is now a governance discipline, not a security side exercise. The article shows that point-in-time validation fails when model behaviour shifts with phrasing, context, and updates. That means the control objective is no longer simple detection of bad prompts, but ongoing assurance that the application still behaves within its intended boundaries. Practitioners should treat red teaming as a permanent operating model, not a release checkpoint.
A few things that frame the scale:
- The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
- 43% of security professionals are concerned about AI systems learning and reproducing sensitive information patterns from codebases, which shows that AI risk is already being treated as an identity and data governance issue.
A question worth separating out:
Q: How do organisations know if AI red teaming is working?
A: They know it is working when testing finds new failure modes before production does, and when the same controls are revalidated after model updates or tool additions. A mature programme produces repeatable evidence that the system still behaves within the intended boundary as it changes.
👉 Read our full editorial: AI red teaming shows why deterministic security no longer fits