AI agent evaluations versus attacks expose a governance gap

By NHI Mgmt Group Editorial TeamPublished 2025-12-04Domain: Agentic AI & NHIsSource: ZioSec

TL;DR: AI agent evaluations can show expected behaviour, but live attacks reveal how tools, prompts, and runtime access combine to create failure paths that tests miss, according to ZioSec. The real issue is that agent governance is still being treated like static software testing when it needs identity-aware attack validation.

At a glance

What this is: This is a ZioSec blog post arguing that AI agent evaluations are not a substitute for attacking agents, because runtime abuse paths reveal weaknesses that benchmark-style checks can miss.

Why it matters: It matters because IAM, PAM, and NHI programmes now need to assess agent identity, privilege, and tool use under adversarial conditions, not just inspect intended behaviour in isolation.

👉 Read ZioSec's analysis of AI agent evaluations versus attacks

Context

AI agent evaluations measure whether a model or workflow behaves as expected under controlled conditions. AI agent attacks test how that same system fails when an adversary manipulates prompts, tools, or runtime access paths, which is closer to how identity abuse unfolds in the real world. For practitioners, the key question is not whether the agent can pass a test, but whether its identity, permissions, and execution path can survive hostile input.

That distinction matters for NHI and emerging agentic AI governance because an agent is not just a model. Once the system can select tools and act with delegated access, its identity posture becomes part of the attack surface. Security teams should treat evaluation as evidence of intended behaviour, while attack simulation is evidence of control failure under pressure.

Key questions

Q: How should security teams test AI agents beyond standard evaluations?

A: Security teams should combine evaluations with adversarial attack testing that manipulates prompts, tool calls, and runtime context. The goal is to see whether the agent can be redirected into unsafe actions, not just whether it produces correct outputs. Tests should include delegated credentials, downstream access, and revocation behaviour so the result reflects real identity risk, not only model quality.

Q: Why do AI agent attacks reveal more risk than evaluations alone?

A: Because attacks simulate hostile conditions that evaluations usually exclude. An evaluation can show that an agent behaves well in a controlled benchmark, but it does not prove that the same agent will resist prompt injection, tool abuse, or context manipulation. Live attacks expose whether access, identity, and execution boundaries hold when the system is actively being steered.

Q: What do security teams get wrong about AI agent governance?

A: They often separate model testing from identity governance, even though the two failure modes are linked. An agent is only safe if its runtime access, tool reach, and delegated authority are constrained under attack. If those controls are not tested together, the programme may certify an agent that can still perform harmful actions once deployed.

Q: How do organisations know if agent access controls are actually working?

A: They know only when controls are tested against hostile behaviour, not when the agent merely passes a benchmark. Look for evidence that tool access is limited, sensitive actions require explicit boundaries, and revocation works quickly when behaviour changes. If attack testing can still drive unauthorised actions, the controls are not effective enough.

Technical breakdown

Why AI agent evaluations miss identity abuse paths

Evaluations usually check whether an agent follows instructions, avoids unsafe outputs, or stays within a benchmarked task boundary. They rarely model adversarial runtime conditions such as prompt injection, tool abuse, or delegated access misuse. That leaves a gap between approved behaviour and attackable behaviour. In identity terms, the evaluation proves that the agent can do the task, but not that its identity and privileges are safe when the environment becomes hostile.

Practical implication: teams need adversarial testing that includes identity and tool access paths, not just model-quality scores.

How attacks differ from tests in agentic systems

A live attack changes the context the agent sees, the data it trusts, or the tools it selects. That means the adversary is not evaluating output quality, but trying to redirect execution, expand scope, or induce unauthorised action. In agentic systems, runtime decisions matter as much as the prompt itself, because the agent may call external systems, retrieve secrets, or chain actions after the initial request. Attack testing exposes these control failures directly.

Practical implication: map every tool call, token, and downstream action to an accountable identity before exposing agents to production data.

Why agent security needs identity-aware validation

AI agents blur the line between application testing and identity governance because their behaviour depends on both model output and delegated access. Traditional security testing assumes stable application logic and fixed trust boundaries. Agentic systems can change their own execution path based on context, which means the meaningful security question is whether access is constrained at runtime, not merely whether the agent is accurate in a lab setting.

Practical implication: validate privilege boundaries, session scope, and revocation paths as part of every agent security review.

NHI Mgmt Group analysis

Evaluations prove intent, attacks prove exposure. AI agent evaluations are useful for measuring expected behaviour, but they do not prove that the agent is safe once adversaries can manipulate prompts, tools, or context. That distinction matters because identity security fails at the point where runtime trust is abused, not where a benchmark is passed. Practitioners should treat live attack testing as the control that exposes whether the agent’s access model is actually defensible.

Agent governance fails when identity and application testing are separated. Security programmes that test model quality in one lane and IAM in another miss the combined failure mode. An agent can appear well controlled in evaluation and still misuse delegated access, retrieve sensitive data, or trigger unauthorised actions under attack. The implication is that governance has to be evaluated as a single runtime system, not as two disconnected disciplines.

Runtime privilege is the named concept practitioners need to track. AI agents create runtime privilege because their effective authority depends on what they can do after deployment, not just what was approved at build time. That authority can change with context, tools, and delegated credentials. The practical conclusion is that security teams should review agent access as a live identity problem, not a static application checklist.

The control gap is not lack of testing, it is lack of hostile validation. Many programmes already run evaluations, but few run them against a realistic attack model that includes prompt manipulation, tool chaining, and data-access abuse. That leaves a false sense of confidence about governance readiness. Practitioners should assume that any agent not tested adversarially is still operating with unproven trust.

Agentic AI will force IAM and security engineering to converge. The more an agent can act, the less useful it is to separate safety evaluation from identity control. Access design, policy enforcement, and red-team testing will increasingly need to be reviewed together. Teams that keep those functions siloed will miss the most important failure mode: trusted automation that can be steered at runtime.

From our research:
The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
Only 44% of developers are reported to follow security best practices for secrets management, exposing a significant developer behaviour gap, according to The State of Secrets in AppSec.
For a broader view of identity failure modes, see 52 NHI Breaches Analysis, which tracks how exposed credentials become operational compromise.

What this signals

Runtime privilege: as AI agents gain more delegated access, the central governance problem shifts from output quality to access containment. Evaluations can confirm expected behaviour, but they do not prove that an agent will stay within its intended trust boundary when a prompt, tool, or data source is manipulated.

The practical signal for security teams is that agent identity must be reviewed alongside IAM and PAM, not after them. Once an agent can act with delegated authority, the programme should be measuring how quickly that authority can be constrained, revoked, and audited under attack conditions.

With the average estimated time to remediate a leaked secret at 27 days according to The State of Secrets in AppSec, delayed remediation is already a governance problem in machine identity programmes. Agent security will amplify that weakness unless teams test access paths as aggressively as they test model behaviour.

For practitioners

Run adversarial tests on agent identity paths Test how prompts, tool requests, and external data sources change agent behaviour under attack. Include the delegated credentials, session context, and downstream systems the agent can touch, not just model outputs.
Inventory every tool and credential an agent can reach Map each AI agent to the exact APIs, databases, and secrets it can access at runtime. Record who owns the entitlement, how it is approved, and what revocation looks like if the agent is compromised.
Treat evaluations as one control, not the control Use evaluations to measure intended behaviour, then pair them with live attack campaigns that probe for privilege misuse, prompt injection, and unintended action chaining. This closes the gap between design-time approval and runtime exposure.
Align AI governance with IAM and PAM reviews Include agent identities in access reviews, privilege escalation checks, and offboarding workflows. If the agent can act independently, its access lifecycle should be governed like any other high-risk non-human identity.

Key takeaways

AI agent evaluations are not a substitute for attack testing, because attacks expose how runtime identity and tool access fail under pressure.
The governance gap is structural: separating model validation from IAM review leaves agent privilege and execution paths insufficiently tested.
Practitioners should treat agent access as a live non-human identity problem and validate hostile behaviour before deployment.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 address the attack and risk surface, while NIST AI RMF and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A1	Agent attacks often exploit prompt and tool misuse in runtime.
NIST AI RMF		AI governance needs accountability for runtime agent behaviour.
NIST CSF 2.0	PR.AC-4	Delegated access and least privilege are central to agent identity risk.

Tie agent deployment to governance, measurement, and ongoing monitoring of intended behaviour.

Key terms

AI Agent Evaluation: A structured assessment of whether an AI agent behaves as intended under controlled conditions. In practice, it checks output quality, policy adherence, and task performance, but it does not by itself prove resilience against adversarial prompts, tool abuse, or runtime identity misuse.
Runtime Privilege: The effective authority an AI agent has while it is executing, including what tools, data, and actions it can reach in the moment. For agentic systems, runtime privilege can matter more than provisioning-time intent because context, delegation, and action chaining can expand or distort access.
Adversarial Agent Testing: Security testing that deliberately tries to steer an AI agent into unsafe, unauthorized, or out-of-scope actions. It differs from standard evaluation because it assumes a hostile environment and examines whether identity controls, tool boundaries, and revocation paths still hold under attack.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or governance in your organisation, it is worth exploring.

This post draws on content published by ZioSec: AI Agents: Evaluations Versus Attacks. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-12-04.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org