TL;DR: AI security teams are repeating the pre-AlexNet mistake by relying on static filters, manual test cases, and pattern matching against systems that evolve faster than handcrafted rules, according to Lakera. The real risk is that yesterday’s control model cannot generalise to dynamic AI behaviour, so governance must move from exception-chasing to adaptive assurance.
NHIMG editorial — based on content published by Lakera: What the AI Past Teaches Us About the Future of AI Security
Questions worth separating out
Q: How should security teams defend AI systems that change behaviour quickly?
A: Security teams should combine policy with adaptive detection, continuous testing, and behavioural monitoring.
Q: Why do static guardrails fail in AI security?
A: Static guardrails fail because they are built to recognise known patterns in a system that attackers can vary endlessly.
Q: What do organisations get wrong about AI security controls?
A: Organisations often assume that more rules automatically mean more security.
Practitioner guidance
- Replace static prompt rules with behavioural detection Measure whether your AI security controls can detect rephrased abuse, multi-step prompt variation, and context shifting instead of only matching known bad strings.
- Review where AI systems can affect tools and data Map every model or agent path that can trigger tool use, retrieve sensitive data, or influence downstream actions, then classify those paths as governance boundaries.
- Build feedback loops into security validation Use continuous red-team findings, false-positive review, and attack variation testing to keep security controls aligned with a changing AI attack surface.
What's in the full article
Lakera's full article covers the broader AI security argument and source commentary this post intentionally leaves at the strategic level:
- The article's detailed analogy between rule-based computer vision and modern AI security
- Mateo Rojas-Carulla's full argument on why static guardrails create a whack-a-mole cycle
- The closing commentary on why AI security needs to become AI-native rather than rule-first
👉 Read Lakera's analysis of why AI security needs AI-native defences →
AI security rule-based guardrails: what practitioners need to know?
Explore further
Static AI security rules are a temporary control, not a durable governance model. The article is right to frame prompt filters and manual test cases as useful but incomplete. They can absorb known patterns, but they do not hold up when inputs are reworded, recombined, or chained across sessions. The field should treat this as a boundary problem: controls that depend on memorising attacks will always lag attackers who can vary the expression of the same behaviour. Practitioners should assume that rule coverage will decay faster than teams can refresh it.
A few things that frame the scale:
- 85% of organisations lack full visibility into third-party vendors connected via OAuth apps, according to The State of Non-Human Identity Security.
- Only 1.5 out of 10 organisations are highly confident in their ability to secure NHIs, compared to nearly 1 in 4 for securing human identities, according to The State of Non-Human Identity Security.
A question worth separating out:
Q: How should identity teams think about AI systems that can take actions?
A: Identity teams should treat action-capable AI as part of the governance boundary, not just as a content generator. Once a model can influence tools, retrieve data, or trigger workflows, access control, monitoring, and review become shared concerns across IAM, PAM, and NHI programmes. That requires a single operating model for runtime behaviour.
👉 Read our full editorial: AI security is repeating the old rule-based failure pattern