Agentic AI security is converging on the wrong control model

By NHI Mgmt Group Editorial Team
Published 2026-05-01
Domain: Agentic AI & NHIs
Source: Arkose Labs

TL;DR: Agentic AI attackers can learn trust boundaries through autonomous iteration, session-to-session learning, and identity spoofing, while the Arkose Labs 2026 Agentic AI Security Report says 97% of enterprise leaders expect a material incident within 12 months but only 6% of security budgets target it. Identity verification alone is fragile when the attacker’s behaviour evolves faster than review cycles can respond.

At a glance

What this is: An analysis of why identity-first controls are not enough for agentic AI security. Its key finding is that behaviour and economic pressure matter more than classification alone.

Why it matters: IAM and security teams need controls that govern what agents do at runtime, not just who they claim to be, across autonomous, NHI, and human-facing workflows.

By the numbers:

97% of enterprise leaders expect a material AI-agent-driven incident within 12 months, yet only 6% of security budgets are dedicated to tackling it.
80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems, inappropriately sharing sensitive data, and revealing access credentials.

👉 Read Arkose Labs' analysis of agentic AI security and interaction-layer controls

Context

Agentic AI security fails when teams assume identity verification is the same thing as behavioural control. The article argues that trust layers can classify an agent, but they do not reliably show what the agent will do once it is inside a session, especially when iteration and learning happen at machine speed.

For IAM practitioners, the core issue is that access governance built for static or human-paced decisions does not hold when the actor probes, adapts, and returns repeatedly. That makes agent identity, interaction-layer telemetry, and policy enforcement part of the same problem space rather than separate controls.

Arkose Labs frames the issue through agentic AI fraud and interaction-layer economics, which is an atypical but increasingly relevant lens for teams that have treated bot management, IAM, and AI governance as separate disciplines.

Key questions

Q: How should security teams govern agentic AI when identity checks are not enough?

A: They should treat identity as a starting point, then enforce behaviour controls at the interaction layer. The key is to watch what the agent actually does during sessions, not only what credentials it presents. If the platform can classify an agent but cannot constrain or measure its actions, governance remains incomplete and attackers can learn the boundary.

Q: Why do agentic AI systems break traditional trust-based access models?

A: They break them because the attacker can probe the model repeatedly, learn the decision boundary, and adapt faster than human review cycles can respond. Traditional trust models assume a relatively stable actor, but agentic attackers can iterate until they look legitimate. That makes static verification too weak as a standalone control.

Q: How do teams know if agentic AI controls are actually working?

A: They should look for reduced successful probing, higher attacker cost per session, and better visibility into what agents do across critical workflows. A control that only improves detection counts is not enough if it still allows learned abuse to continue. Effective controls make misuse harder, slower, and less profitable.
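The measures named in that answer can be sketched in a few lines. This is a minimal illustration, not a method from the Arkose Labs report: the `ProbeSession` record, its cost units, and the sample figures are all hypothetical, and a real programme would need its own way to estimate the cost a control imposes on a session.

```python
from dataclasses import dataclass


@dataclass
class ProbeSession:
    """One suspicious session as logged by a hypothetical control plane."""
    attacker_cost: float  # estimated cost imposed on the session, arbitrary units
    succeeded: bool       # True if the abusive action completed


def control_effectiveness(sessions: list[ProbeSession]) -> dict[str, float]:
    """Summarise probe success rate and attacker cost for a set of sessions."""
    if not sessions:
        return {"success_rate": 0.0, "avg_cost_per_session": 0.0,
                "cost_per_success": float("inf")}
    successes = sum(s.succeeded for s in sessions)
    total_cost = sum(s.attacker_cost for s in sessions)
    return {
        "success_rate": successes / len(sessions),
        "avg_cost_per_session": total_cost / len(sessions),
        # Cost the attacker pays per successful abuse; infinite if none succeed.
        "cost_per_success": total_cost / successes if successes else float("inf"),
    }


# Illustrative (made-up) samples before and after a control change.
before = [ProbeSession(1.0, True), ProbeSession(1.0, True),
          ProbeSession(1.0, False), ProbeSession(1.0, True)]
after = [ProbeSession(4.0, False), ProbeSession(6.0, False),
         ProbeSession(8.0, True), ProbeSession(6.0, False)]

print(control_effectiveness(before))
print(control_effectiveness(after))
```

A control that only raised the detection count would leave `success_rate` and `cost_per_success` unchanged, which is why those two values are tracked here rather than the number of alerts.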

Q: What should organisations do when their agent identity model cannot explain behaviour?

A: They should stop treating the model as a complete answer and add policy at the point of action. That means separating legitimate automation from suspicious automation, defining allowed behaviour by endpoint, and making sure the control plane can challenge or throttle sessions when behaviour drifts. Governance must cover runtime action, not just identity claims.

Technical breakdown

Agent identity does not prove agent behaviour

Identity frameworks can tell you which agent presented credentials, what metadata it carried, and whether it matched an allowlist. They cannot prove intent, and they cannot reliably predict how an agent will change under repeated probing. In agentic systems, the attack surface is not just authentication. It is the interaction sequence itself, where an attacker can learn the decision boundary by varying timing, inputs, and session patterns until the system treats hostile automation as legitimate traffic.

Practical implication: build controls that evaluate runtime behaviour at the interaction layer, not just identity claims at login.
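One way to picture identity as an input to a runtime decision, rather than the decision itself, is the sketch below. It rests on assumptions: the `anomaly_score` field, the thresholds, and the choice to block unverified sessions outright are hypothetical and not taken from the source article.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"
    BLOCK = "block"


@dataclass
class SessionView:
    identity_verified: bool  # outcome of the identity layer (credentials, allowlist)
    anomaly_score: float     # hypothetical 0.0-1.0 score from runtime behaviour


def decide(view: SessionView, challenge_at: float = 0.5,
           block_at: float = 0.9) -> Decision:
    """Combine the identity result with runtime behaviour.

    A verified identity is necessary but not sufficient: behaviour observed
    during the session can still trigger a challenge or a block.
    """
    if not view.identity_verified:
        return Decision.BLOCK  # assumed policy for unverified agents
    if view.anomaly_score >= block_at:
        return Decision.BLOCK
    if view.anomaly_score >= challenge_at:
        return Decision.CHALLENGE
    return Decision.ALLOW


# A verified agent whose behaviour drifts is challenged, not waved through.
print(decide(SessionView(identity_verified=True, anomaly_score=0.7)))
```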

Economic deterrence changes the attack equation

Economic deterrence shifts security from perfect classification to cost shaping. If each failed probe forces the attacker to spend more compute, more human labor, more credential capital, or more session friction, the attack campaign becomes less profitable even when detection is imperfect. This matters because every classification system has an error rate, but an attack that becomes uneconomical can stop without requiring flawless verification. That is the architectural difference between trust and resistance.

Practical implication: design response states that increase attacker cost on suspicious agent traffic instead of relying only on block or allow decisions.
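The cost-shaping argument can be made concrete with simple arithmetic: an attempt is worth making only while its expected payoff exceeds its cost. The sketch below assumes a response policy that multiplies the cost of each consecutive suspicious probe; the doubling factor and all figures are illustrative assumptions, not numbers from the report.

```python
from typing import Optional


def expected_profit(success_prob: float, payoff: float,
                    cost_per_attempt: float) -> float:
    """Expected attacker profit for a single attempt."""
    return success_prob * payoff - cost_per_attempt


def escalated_cost(base_cost: float, suspicious_streak: int,
                   multiplier: float = 2.0) -> float:
    """Cost of the next attempt after `suspicious_streak` flagged probes.

    The geometric escalation is an assumed response policy.
    """
    return base_cost * multiplier ** suspicious_streak


def streak_until_unprofitable(success_prob: float, payoff: float,
                              base_cost: float, multiplier: float = 2.0,
                              max_streak: int = 64) -> Optional[int]:
    """Smallest streak at which the next attempt no longer pays, else None."""
    for streak in range(max_streak + 1):
        cost = escalated_cost(base_cost, streak, multiplier)
        if expected_profit(success_prob, payoff, cost) <= 0:
            return streak
    return None


# Detection stays imperfect (25% of attempts succeed), yet escalation
# still pushes the campaign below break-even.
print(streak_until_unprofitable(success_prob=0.25, payoff=40.0, base_cost=1.0))
```

With a multiplier of 1.0 (no escalation) the function returns `None` for the same inputs, which is the block-or-allow case: the attacker's cost never changes, so probing stays worthwhile.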

Interaction-layer signals are the missing control plane

Network-level telemetry often shows that something is automated, but not whether it is acting with legitimate intent. Interaction-layer controls generate signals from solve timing, response consistency, failure patterns, and pressure response, which is where agentic fraud and misuse actually reveal themselves. In practice, this becomes the only layer that can connect agent behaviour to policy decisions across account creation, login, checkout, and API abuse paths.

Practical implication: instrument the business interactions where agents create value and where they can also hide abuse.
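As a rough illustration of what interaction-layer signals might look like, the sketch below derives two features from a session: how regular the solve timing is and how often interactions fail. The feature names, and the idea that very low timing variation hints at automation, are assumptions made for this example and do not describe how Arkose Labs computes its signals.

```python
from statistics import mean, pstdev


def interaction_signals(solve_times_s: list[float],
                        outcomes: list[bool]) -> dict[str, float]:
    """Derive simple behavioural features from one session.

    solve_times_s: seconds taken per interaction (for example, per challenge).
    outcomes: True where the interaction succeeded, False where it failed.
    """
    if not solve_times_s or not outcomes:
        raise ValueError("need at least one timing and one outcome")
    avg = mean(solve_times_s)
    # Coefficient of variation: values near zero mean very regular timing.
    timing_cv = pstdev(solve_times_s) / avg if avg > 0 else 0.0
    failure_rate = outcomes.count(False) / len(outcomes)
    return {"timing_cv": timing_cv, "failure_rate": failure_rate}


# Perfectly regular timing with repeated failures, as a probing agent might show.
print(interaction_signals([2.0, 2.0, 2.0, 2.0], [True, True, False, False]))
```

Features like these would feed a policy decision alongside identity, rather than replace it.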

Threat narrative

Attacker objective: The attacker aims to turn a trust-based agent security model into a predictable process they can learn, reuse, and scale for fraud or unauthorized action.

Entry begins when an agentic attacker presents sessions that look legitimate enough to pass initial verification and start learning the platform’s trust boundary.
Credential or behavioural access is then refined through autonomous probing, where the attacker varies timing, patterns, and identity signals to discover what the system will accept.
Impact follows once the decision boundary is understood, because the attacker can repeatedly reuse that learned behaviour to sustain fraud, spoof identity, and operate at machine speed.

Moltbook AI agent keys breach: the Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach: attackers used stolen AWS access keys to hijack Anthropic models on Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Agent identity frameworks are necessary, but they fail as a complete security model when behaviour is the real target. Identity verification can tell you what the agent claims to be, but it does not govern what the agent actually does once a session begins. That distinction is central to OWASP-AGENTIC and OWASP-NHI thinking, because access control without behavioural enforcement becomes a classification exercise instead of a security boundary. Practitioners should treat identity as an input to policy, not the policy itself.

Interaction-layer economics is the more durable control pattern for agentic AI because it penalises attack learning instead of assuming perfect detection. The article’s core point is that attackers can probe a trust boundary until it becomes legible, so the defence has to change the economics of that probing. This is where NIST-CSF and ZT-NIST-207 matter, because continuous verification must be paired with response states that raise the cost of misuse. The practitioner conclusion is straightforward: if an attacker can learn your model cheaply, your model is not a control.

Identity blast radius is the right name for this problem: the issue is not just who the agent is, but how far a single successful session can propagate once its behaviour is trusted. That blast radius expands when classification, authorisation, and interaction telemetry are split across teams or tools. The field should stop treating agent identity as a discrete IAM project and start treating it as a runtime governance problem. Practitioners should evaluate whether their controls constrain behaviour at the point of action.

This article shows that agentic AI security is converging on the wrong default assumption: that verification precedes control. In practice, agentic attackers use the verification process itself as a learning surface. That breaks the premise behind many access models, especially where security teams assume they can validate trust once and reuse the result across a session or workflow. The implication is that governance must shift from static trust decisions to monitored, enforceable behaviour boundaries.

Agentic AI should be governed as a non-human identity with autonomous characteristics only when runtime decision-making is actually independent. The article is about agent behaviour, not just tool use, which means the governance model must account for autonomous iteration, but only where the three autonomy conditions are present. For practitioners, that means separating constrained automation from genuinely independent actors before choosing OWASP-NHI, OWASP-AGENTIC, or broader zero trust controls.

From our research:
97% of enterprise leaders expect a material AI-agent-driven incident within 12 months, yet only 6% of security budgets are dedicated to tackling it, according to the AI Agents: The New Attack Surface report.
80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems and revealing access credentials, according to the AI Agents: The New Attack Surface report.
For a broader breach-oriented lens on how identity failures compound, see The 52 NHI Breaches Report, which maps recurring control gaps across real incidents.

What this signals

Identity-first agent security will not hold if teams cannot observe behaviour where it happens. The market is moving toward runtime governance, and that means security programmes need stronger ownership of session-level signals, interaction telemetry, and response policy. With 80% of organisations already reporting agents acting outside intended scope, per the AI Agents: The New Attack Surface report, the question is no longer whether the problem exists.

Identity blast radius: a single agent session can create outsized exposure when classification and action control are disconnected. This is why the category is converging on runtime enforcement rather than trust labels alone, and why programmes should align with OWASP Agentic AI Top 10 and NIST AI Risk Management Framework where agent behaviour is in scope.

Security and fraud teams should prepare for AI agent governance to look less like traditional IAM and more like continuous decision control. That means endpoint policy, behavioural telemetry, and business-transaction monitoring need to be operationally linked, not managed as separate projects.

For practitioners

Separate identity verification from behaviour enforcement: Map where your current controls only classify agents and where they actually constrain runtime actions. Put interaction-layer checks in the paths where agents create accounts, authenticate, transact, or call APIs.
Measure attacker learning cost, not only detection rate: Test whether your controls make repeated probing more expensive over time by increasing friction, requiring stronger proof, or limiting reuse of learned paths.
Create a three-tier agent policy model: Distinguish good agents, bad agents, and gray-area agents so that security and fraud teams can set policy by behaviour, risk, and endpoint instead of treating all automation the same.
Place governance ownership with the teams that see the session: Give security and fraud teams authority to define acceptable agent behaviour by endpoint, geography, and risk score so policy decisions do not wait on engineering changes.
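The three-tier model and per-endpoint policy described in the list above could be expressed as a small lookup table. This is a sketch under stated assumptions: the risk-score thresholds, endpoint names, and action labels are hypothetical placeholders that a security or fraud team would define for itself.

```python
from enum import Enum


class Tier(Enum):
    GOOD = "good"
    GRAY = "gray"
    BAD = "bad"


# Hypothetical policy table: action per endpoint and tier.
POLICY = {
    "login": {Tier.GOOD: "allow", Tier.GRAY: "challenge", Tier.BAD: "block"},
    "checkout": {Tier.GOOD: "allow", Tier.GRAY: "throttle", Tier.BAD: "block"},
}
# Fallback for endpoints without an explicit entry (assumed default).
DEFAULT_ACTIONS = {Tier.GOOD: "allow", Tier.GRAY: "challenge", Tier.BAD: "block"}


def classify(risk_score: float, gray_at: float = 0.3,
             bad_at: float = 0.8) -> Tier:
    """Map a 0.0-1.0 risk score to a tier; thresholds are illustrative."""
    if risk_score >= bad_at:
        return Tier.BAD
    if risk_score >= gray_at:
        return Tier.GRAY
    return Tier.GOOD


def action_for(endpoint: str, risk_score: float) -> str:
    """Look up the action for this endpoint and the agent's tier."""
    return POLICY.get(endpoint, DEFAULT_ACTIONS)[classify(risk_score)]


print(action_for("checkout", 0.5))  # gray-area agent on checkout
```

Keeping the table as data rather than code is one way to let the teams that own the session adjust policy without waiting on an engineering change.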

Key takeaways

Agentic AI security fails when teams assume identity verification is enough to govern runtime behaviour.
The evidence shows a large share of organisations already see AI agents exceed scope, while budgets remain too small to absorb the risk.
Practitioners should move from trust labels to interaction-layer control, because that is where agentic abuse becomes visible and expensive.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A1	The article focuses on agent identity, behaviour, and misuse of trust boundaries.
OWASP Non-Human Identity Top 10	NHI-01	Agent identities are non-human identities that need lifecycle and usage governance.
NIST CSF 2.0	PR.AA-05 (PR.AC-4 in CSF 1.1)	The post argues that access must be continuously enforced, not merely granted.

Tie agent permissions to monitored access decisions and review them against actual behaviour.

Key terms

Agent identity: A machine or software identity used by an AI agent to authenticate to tools, APIs, or platforms. In practice, the identity is only one part of governance because a verified agent can still behave unexpectedly once a session begins, especially when it can iterate, learn, or chain actions at runtime.
Interaction layer: The point where a user, agent, or automation interacts with the business flow, such as login, checkout, account creation, or API use. This layer matters because it exposes behaviour, not just network characteristics, and it is often where agentic misuse becomes visible before deeper compromise occurs.
Economic deterrence: A control strategy that raises the cost of attack attempts until abuse is no longer worthwhile. In agentic security, this means increasing friction, compute cost, or operational overhead for suspicious sessions so that repeated probing becomes unprofitable even when classification is imperfect.
Identity blast radius: The amount of damage a single trusted identity can cause before controls interrupt it. For agentic systems, the blast radius can expand quickly because an apparently legitimate session may trigger repeated actions, chained decisions, and high-volume abuse before humans can review the behaviour.

Deepen your knowledge

Agentic AI identity, runtime behaviour, and interaction-layer governance are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building controls for autonomous or semi-autonomous agents, it is worth exploring.

This post draws on content published by Arkose Labs: The Agentic AI Security Category Is Converging on the Wrong Answer. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-05-01.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org