AI agent security testing reveals the limits of traditional controls

By NHI Mgmt Group Editorial TeamPublished 2025-12-08Domain: Agentic AI & NHIsSource: ZioSec

TL;DR: AI agents create a security problem that traditional tooling cannot fully cover because the attack surface shifts into prompt manipulation, data poisoning, tool misuse, and weak observability, according to ZioSec, with 28 million AI-driven cyberattacks projected in 2025 and 87% of organizations targeted in the last year. Autonomous agent governance now depends on testing reasoning paths, tool permissions, and logging, not just code and infrastructure.

At a glance

What this is: This is a practitioner-focused analysis of why proactive red teaming is becoming necessary for AI agents, with the key finding that their autonomy expands the attack surface beyond conventional software controls.

Why it matters: It matters because IAM, PAM, and security architects have to govern agent behaviour as well as access, or existing controls will miss the actual failure points in autonomous workflows.

By the numbers:

As global AI-driven cyberattacks are projected to surpass 28 million incidents in 2025, the urgency to address these unique risks is undeniable.
In the last year, 87% of organizations have been targeted by an AI cyberattack.
Only 3% of organizations had proper AI access control systems in 2025.

👉 Read ZioSec's analysis of proactive security testing for AI agents

Context

AI agent security testing is the practice of actively probing an agent’s reasoning, tool use, and data handling before an attacker does. The central problem is that autonomous agents can make decisions, call tools, and act on instructions that were never written as deterministic code paths, which makes the security boundary much harder to define than in traditional software.

For identity and access teams, the issue is not just model safety. AI agents behave like non-human identities with delegated authority, so governance has to cover tool permissions, data access, logging, and lifecycle control at the same time. That is why red teaming for agents is increasingly an identity governance problem, not only an application testing exercise.

The article’s starting position is typical of current market thinking: builders are being pushed to secure agentic systems after deployment pressure has already created real exposure. That makes the analysis relevant to any programme that still treats agent access as a standard automation problem.

Key questions

Q: How should security teams test AI agents for prompt injection risk?

A: Use adversarial inputs that place conflicting instructions in prompts, retrieved documents, and memory. Then check whether the agent still follows system intent, refuses unsafe actions, and preserves tool boundaries. The goal is not just to block obvious attacks, but to prove the agent can resist context corruption during normal operations.

Q: Why do AI agents complicate least-privilege design?

A: AI agents complicate least privilege because they can choose tools and sequence actions at runtime, often in ways the original designer did not anticipate. A permission set that looks narrow on paper can still produce harmful outcomes if the agent can combine approved actions into an unintended workflow.

Q: What do organisations get wrong about AI agent logging?

A: They log API activity but fail to capture the full decision trace. Without prompts, retrieved context, tool selection, and outputs in one record, teams cannot explain why the agent acted or whether it was manipulated. That leaves incident response and audit work with incomplete evidence.

Q: Who should own AI agent governance in an enterprise?

A: Ownership should sit across security, IAM, application, and platform teams, because agent governance spans entitlement design, runtime behaviour, data access, and lifecycle control. If ownership is left only to developers, the programme usually misses offboarding, monitoring, and access review discipline.

Technical breakdown

Prompt injection as a control-plane bypass

Prompt injection is the practice of embedding malicious instructions in user input, retrieved content, or agent memory so the model follows attacker intent instead of system intent. For agents, the risk is not only unsafe text generation. The model may also route data, select tools, or chain actions based on corrupted instructions. This makes the prompt layer a control plane, not just an interface. Once the agent trusts injected context, downstream safeguards can be bypassed because the agent is acting within its own authorised session.

Practical implication: separate trusted system instructions from untrusted content and test whether the agent can be redirected by tainted inputs.

Tool misuse and least-privilege failure

AI agents become materially riskier when they can call email, database, cloud, or ticketing tools under broad permissions. The failure mode is not that a tool exists, but that the agent can use it in a sequence the designer did not anticipate. That means least privilege must be evaluated at the tool-action level, not only at the account level. If a compromise or misalignment occurs, the agent may still stay within approved authentication boundaries while performing harmful actions with legitimate credentials.

Practical implication: scope every tool to the smallest possible action set and test whether a single agent session can exceed intended business authority.

Logging, monitoring, and auditability gaps

Agents that plan, remember, and act independently create forensics problems if the organisation cannot reconstruct what they saw, decided, and executed. Traditional logs often capture API calls but miss the reasoning path, hidden context, retrieval inputs, and intermediate tool selections that explain agent behaviour. That leaves defenders with a partial view and weak incident reconstruction. In practice, the absence of agent-level telemetry turns suspicious activity into a guess rather than an investigation.

Practical implication: log prompts, retrieved context, tool calls, and outputs as a single trace so behaviour can be reviewed and challenged later.

Threat narrative

Attacker objective: The attacker wants to turn the agent’s own authority into a path for data theft, unsafe actions, or operational disruption without needing a classic code exploit.

Entry occurs when an attacker injects malicious instructions through prompt content, poisoned retrieval material, or another untrusted input channel.
Escalation follows when the agent accepts the malicious context and misuses authorised tools or data sources under legitimate credentials.
Impact occurs when the agent leaks sensitive data, executes harmful actions, or corrupts downstream workflows while remaining inside its nominal access envelope.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach — attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

AI agents turn access control into behaviour control. Traditional IAM assumes the subject will request, receive, and use access in predictable ways. That assumption fails when the actor can choose actions at runtime, combine tools dynamically, and continue operating without human approval. The implication is that governance can no longer stop at entitlement design, because runtime behaviour becomes the real security boundary.

Agentic input is now part of the trust boundary. Prompt content, retrieved documents, and memory are not just data sources, they are potential instruction channels. That changes the governance model from protecting a static application input to defending a live decision environment. Practitioners should treat the agent’s context as a sensitive control surface, not a neutral buffer.

Least privilege is harder to prove when the system can improvise. The article’s core lesson is not that agents need more controls, but that many controls were designed for fixed workflows and stable intent. When a system can select tools, sequence actions, and adapt in-session, the old idea of predefining sufficient privilege becomes fragile. Practitioners should reassess whether their access model can still express the actual operational boundary.

Break Your Own AI Agent is a testing mindset, not a product category. The strongest signal in the article is that proactive adversarial testing has to move into the build lifecycle before users depend on agent behaviour. That aligns with OWASP NHI and agentic AI risk thinking, where identity, privilege, and misuse are assessed together rather than in separate silos. Practitioners should treat agent red teaming as a governance requirement, not an optional hardening step.

Shadow AI expands the identity problem faster than security teams can enumerate it. Once teams accept that unmanaged agents can appear outside approved tooling, the programme has to cover discovery, ownership, and offboarding as well as runtime control. This is a lifecycle failure as much as a security one. Practitioners should assume unknown agent identities exist until proven otherwise.

From our research:
92% agree governing AI agents is critical to enterprise security, yet only 44% have implemented any policies to do so, according to AI Agents: The New Attack Surface report.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
For a wider control lens, see OWASP Agentic AI Top 10 for the runtime risks that red teams should be testing against.

What this signals

Agent discovery will become a governance priority before model choice does. Once organisations realise that unmanaged agents can operate with meaningful authority, the first gap is not policy sophistication but visibility. That is why discovery, ownership, and lifecycle control now matter as much as prompt hardening, especially when only 52% of companies can track and audit what their AI agents access.

Agent red teaming needs to be wired into IAM, not bolted onto AppSec. The control failures in this category are about delegated authority, runtime access, and abuse of legitimate tools, so the response cannot live entirely in model testing. Security teams should align their agent assurance work with the OWASP Top 10 for Agentic Applications 2026 and treat access traces as first-class evidence.

Shadow AI is the programme-level risk multiplier. Once 98% of companies plan to deploy even more AI agents within the next 12 months, unmanaged instances will grow faster than review cycles. Practitioners should expect the identity perimeter to expand into places where traditional asset inventories, access reviews, and recertification processes were never built to look.

For practitioners

Test prompt and retrieval boundaries Run adversarial cases that mix benign and malicious instructions in user prompts, RAG sources, and memory inputs. Measure whether the agent preserves system intent when context is contaminated.
Constrain tool permissions to business-safe actions Map every agent tool to a specific action class, then remove anything that would let one session read, write, or trigger actions outside its stated use case.
Build agent-level telemetry before rollout Capture prompts, retrieved context, tool calls, and outputs in one trace so investigators can reconstruct the decision path after an incident.
Red-team the agent lifecycle, not only the model Review discovery, ownership, approval, and offboarding for every deployed agent so unmanaged instances do not become shadow AI with persistent access.

Key takeaways

AI agents expand the identity problem from access assignment to runtime behaviour control.
Industry data shows a wide governance gap, with strong concern but limited policy adoption and auditability.
Security teams should test agent instruction handling, tool scope, telemetry, and lifecycle ownership before scale increases exposure.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		Addresses prompt injection, tool misuse, and agent behaviour risks described in the article.
OWASP Non-Human Identity Top 10	NHI-03	Covers non-human identity access and privilege patterns for agents using real tools and credentials.
NIST AI RMF		AI governance and accountability apply where autonomous agent behaviour affects security outcomes.

Test agent workflows against instruction hijack, tool abuse, and unsafe autonomous actions before rollout.

Key terms

AI Agent: A software entity that can choose actions, select tools, and time execution without needing a human to approve each step. In identity terms, it behaves like a non-human identity with runtime discretion, so governance must cover both entitlement scope and the behaviour that happens after access is granted.
Prompt Injection: A manipulation technique where attacker-controlled text steers an AI agent away from its intended instructions. For autonomous systems, the danger is not only bad output. The injected instruction can also redirect tools, expose data, or alter the agent's next action while staying inside legitimate session context.
Tool Misuse: The harmful use of an authorised tool by an AI agent, even when the underlying credentials are valid. This matters because the security failure is often not authentication failure, but the agent using approved access for an unapproved purpose, sequence, or destination.
Shadow AI: An AI agent or automated AI workflow operating outside approved inventory, governance, or security oversight. In practice, shadow AI creates blind spots in ownership, offboarding, logging, and access review, which means the organisation may not know which identities exist or what they can reach.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity lifecycle are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or governance in your organisation, it is worth exploring.

This post draws on content published by ZioSec: Break Your Own AI Agent, Part 1. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-12-08.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org