AI security is repeating the old rule-based failure pattern

By NHI Mgmt Group Editorial TeamPublished 2025-08-26Domain: Agentic AI & NHIsSource: Lakera

TL;DR: AI security teams are repeating the pre-AlexNet mistake by relying on static filters, manual test cases, and pattern matching against systems that evolve faster than handcrafted rules, according to Lakera. The real risk is that yesterday’s control model cannot generalise to dynamic AI behaviour, so governance must move from exception-chasing to adaptive assurance.

At a glance

What this is: This is Lakera’s argument that static, rule-based AI security is breaking down because AI systems and attacks change too quickly for handcrafted controls to keep up.

Why it matters: It matters because IAM, PAM, and NHI teams are already being asked to govern increasingly dynamic AI behaviour with control models built for stable identities, predictable requests, and fixed policy logic.

👉 Read Lakera's analysis of why AI security needs AI-native defences

Context

The core problem is not that AI security lacks tools. It is that many teams are still trying to secure dynamic AI behaviour with rules that assume stable patterns, bounded inputs, and fixed attack paths. That mismatch creates a governance gap across AI security, identity controls, and operational risk management.

For IAM and NHI practitioners, the lesson is broader than prompt filtering. When systems can change behaviour rapidly, security control design has to account for drift, variation, and repeated bypass attempts rather than one-off bad inputs. That puts pressure on policy design, monitoring, and review processes across human, machine, and emerging AI identity programmes.

Key questions

Q: How should security teams defend AI systems that change behaviour quickly?

A: Security teams should combine policy with adaptive detection, continuous testing, and behavioural monitoring. Static filters still have a role, but they should not be the only line of defence because AI misuse can be rephrased, re-scoped, and repeated in ways that bypass fixed rules. The goal is to validate behaviour over time, not just block known bad inputs.

Q: Why do static guardrails fail in AI security?

A: Static guardrails fail because they are built to recognise known patterns in a system that attackers can vary endlessly. When the same intent can be expressed through different prompts or workflows, a fixed rule set quickly becomes incomplete. Effective defence needs feedback loops and detection that adapts as the attack surface changes.

Q: What do organisations get wrong about AI security controls?

A: Organisations often assume that more rules automatically mean more security. In reality, adding exceptions can create a false sense of coverage while the underlying system continues to evolve. The better test is whether controls can handle new variations, not whether they can block the examples already documented.

Q: How should identity teams think about AI systems that can take actions?

A: Identity teams should treat action-capable AI as part of the governance boundary, not just as a content generator. Once a model can influence tools, retrieve data, or trigger workflows, access control, monitoring, and review become shared concerns across IAM, PAM, and NHI programmes. That requires a single operating model for runtime behaviour.

Technical breakdown

Why static guardrails fail against adaptive AI behaviour

Static guardrails work when the system being defended behaves predictably. In AI security, that assumption weakens because attackers can rephrase prompts, shift context, or alter input structure without changing intent. Rule-based filters tend to encode known bad patterns, but AI systems and their abuse patterns evolve faster than those rules can be updated. The result is a brittle defence model that catches obvious cases and misses variations. Security teams should think of this as a classification problem that keeps moving, not a fixed checklist of forbidden strings.

Practical implication: monitor for behavioural variation and policy drift, not just known bad prompts.

What the pre-AlexNet lesson means for security architecture

The article uses the shift from hand-coded computer vision to learned models as an analogy for security. The underlying point is that complex, open-ended domains do not scale well under handcrafted logic alone. In security, that means static policy engines, manual test cases, and one-time red-team findings can improve baseline protection, but they do not create durable coverage. Defence has to become more adaptive if the attack surface is inherently creative and fast-moving. That makes detection quality, feedback loops, and model-aware validation more important than simple rule expansion.

Practical implication: build adaptive validation and feedback into AI security controls instead of adding more brittle exceptions.

AI-native security versus policy-only security

AI-native security is the idea that the defence layer must learn patterns of misuse, not merely enforce hard-coded prohibitions. That does not mean abandoning policy, but it does mean policy is only one input into defence. In practice, organisations need controls that can recognise intent, context, and behavioural anomalies across changing prompts and workflows. This is especially relevant where AI systems are connected to tools, data, or downstream execution paths, because the security problem is no longer just content moderation. It becomes runtime governance of increasingly dynamic action paths.

Practical implication: treat AI security as runtime governance, especially when models can influence tools, data, or execution.

NHI Mgmt Group analysis

Static AI security rules are a temporary control, not a durable governance model. The article is right to frame prompt filters and manual test cases as useful but incomplete. They can absorb known patterns, but they do not hold up when inputs are reworded, recombined, or chained across sessions. The field should treat this as a boundary problem: controls that depend on memorising attacks will always lag attackers who can vary the expression of the same behaviour. Practitioners should assume that rule coverage will decay faster than teams can refresh it.

AI security is moving from content control to behaviour control. The important shift is not just what text appears at the input layer, but what the system does under changing conditions. That is a different governance problem for IAM and NHI teams, because runtime decisions, tool access, and downstream actions become part of the security boundary. Once AI can move beyond passive generation, security posture has to account for actionability, not just output quality. Practitioners should design for observed behaviour, not declared intent.

Pattern matching alone cannot defend open-ended systems at enterprise scale. The article’s analogy to computer vision and language is useful because it shows why exception-driven security never catches up in dynamic domains. As AI systems become more powerful, the variance in attack techniques expands with them. That creates a long-term governance burden: controls that only recognise previously seen abuse will underperform in exactly the cases that matter most. Practitioners should treat AI security as an adaptive discipline, not a library of static signatures.

The named concept here is static guardrail debt: every fixed rule added to AI security increases confidence faster than it increases real coverage. That debt accumulates when teams believe more exceptions equal more control, even though the system being defended can keep changing shape. The implication for the field is that security assurance must be measured against behavioural breadth, not the size of the rule set. Practitioners should review whether their programmes are buying visibility or merely buying comfort.

For identity teams, the takeaway is that AI governance will increasingly overlap with NHI and lifecycle controls. Once AI systems can influence tools, data, and operational decisions, they start to behave like governed identities even when they are not formal users. That pushes IAM, PAM, and NHI teams toward shared operating models for access, monitoring, and review. Practitioners should prepare for the point where AI security cannot be separated from identity governance.

From our research:
85% of organisations lack full visibility into third-party vendors connected via OAuth apps, according to The State of Non-Human Identity Security.
Only 1.5 out of 10 organisations are highly confident in their ability to secure NHIs, compared to nearly 1 in 4 for securing human identities, according to The State of Non-Human Identity Security.
That confidence gap is a warning sign for AI governance programmes that now depend on the same access, lifecycle, and oversight assumptions, so practitioners should revisit NHI Lifecycle Management Guide before expanding runtime access.

What this signals

Static control sets will not keep pace with open-ended AI behaviour. Security teams should expect the centre of gravity to move from policy authoring toward behavioural assurance, especially where models can influence tools, data, or downstream execution. The more dynamic the system, the less useful one-time rule coverage becomes, and the more important continuous validation becomes. For practitioners, that means AI security has to be treated as a living control plane rather than a compliance checklist.

Identity governance will be pulled closer to AI security whether programme owners like it or not. When AI systems can take actions, access data, or shape workflows, access review, monitoring, and privilege boundaries stop being separate concerns. That makes lifecycle discipline more important, not less, because the programme must account for who or what can act at runtime. Teams should prepare for tighter coupling between IAM, NHI, and AI security ownership.

Adaptive defence will become the default expectation, not an advanced option. The organisations that keep relying on brittle filters will accumulate security debt as AI systems grow more capable and more exposed. The practical signal is simple: if your validation process only works for yesterday's attacks, it is already behind. Practitioners should invest in telemetry, retraining, and review loops that can absorb new abuse patterns without waiting for a control rewrite.

For practitioners

Replace static prompt rules with behavioural detection Measure whether your AI security controls can detect rephrased abuse, multi-step prompt variation, and context shifting instead of only matching known bad strings.
Review where AI systems can affect tools and data Map every model or agent path that can trigger tool use, retrieve sensitive data, or influence downstream actions, then classify those paths as governance boundaries.
Build feedback loops into security validation Use continuous red-team findings, false-positive review, and attack variation testing to keep security controls aligned with a changing AI attack surface.
Align AI security with identity governance Bring IAM, PAM, and NHI stakeholders into AI control design so access, monitoring, and review are handled as one operating model rather than separate teams.

Key takeaways

AI security fails when teams try to secure dynamic systems with fixed rules that only recognise known attack patterns.
The scale of the problem is structural, because rewording, chaining, and contextual variation let attackers bypass handcrafted guardrails repeatedly.
Practitioners should shift toward adaptive validation, behavioural monitoring, and shared identity governance for AI systems that can act.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 address the attack and risk surface, while NIST AI RMF and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		The article addresses dynamic AI behaviour and runtime misuse patterns.
NIST AI RMF		The piece argues for adaptive assurance over static rule sets.
NIST CSF 2.0	PR.AC-4	Access governance matters when AI systems can influence tools and data.

Assess whether AI security controls can detect behaviour changes, not just known bad prompts.

Key terms

Static Guardrail Debt: The growing gap between a fixed set of AI security rules and the changing ways those rules are bypassed. It appears when teams keep adding exceptions or filters instead of improving behavioural assurance, so confidence rises faster than actual coverage in dynamic systems.
Behavioural Assurance: A security approach that evaluates what an AI system actually does under changing inputs, contexts, and workflows. It goes beyond pattern matching by checking whether the system remains within acceptable bounds as prompts, tool calls, and execution paths vary.
Runtime Governance: The practice of controlling access, actions, and oversight while a system is operating, not just before deployment. For AI systems, this includes monitoring tool use, data access, and downstream effects so that security decisions reflect live behaviour rather than static assumptions.

What's in the full article

Lakera's full article covers the broader AI security argument and source commentary this post intentionally leaves at the strategic level:

The article's detailed analogy between rule-based computer vision and modern AI security
Mateo Rojas-Carulla's full argument on why static guardrails create a whack-a-mole cycle
The closing commentary on why AI security needs to become AI-native rather than rule-first

👉 Lakera's full article expands the argument with the original analogies and closing commentary

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or programme maturity, it is worth exploring.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-08-26.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org