Agentic AI red teaming exposes gaps in identity governance

By NHI Mgmt Group Editorial TeamPublished 2025-08-20Domain: Agentic AI & NHIsSource: TROJ.AI

TL;DR: Agentic AI red teaming must move from single-turn prompts to multi-turn, stateful testing because agents can chain tools, retain memory, and trigger production actions, according to TROJ.AI. Existing IAM assumptions break when delegated authority, tool misuse, and memory manipulation can cascade across an entire workflow.

At a glance

What this is: This is a TROJ.AI analysis of agentic AI red teaming that argues agent workflows create a broader security surface than chatbots, especially where tool use, memory, and delegated access intersect.

Why it matters: It matters because IAM, PAM, and NHI teams now have to govern actions that are initiated by software actors, not just humans, and the controls must account for stateful execution, not only authentication events.

By the numbers:

80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%).
17 minutes.

👉 Read TROJ.AI's analysis of agentic AI red teaming and runtime risk

Context

Agentic AI is software that can decide what to do next, choose tools, and carry out multi-step work with limited or no human intervention. The governance gap is that most identity controls were built for request-response access, not for actors that can reason, chain actions, and reuse state across a session. That shift changes how access, accountability, and monitoring have to work.

For IAM and NHI programmes, the issue is not just whether an agent is authenticated. It is whether its delegated authority is narrow enough, temporary enough, and observable enough to survive a workflow that can change direction mid-execution. Red teaming becomes a governance test as much as a technical test, because the failure modes now include tool misuse, memory poisoning, and planning manipulation.

The article’s starting position is typical for the market right now: agentic capabilities are advancing faster than the security model that is supposed to contain them.

Key questions

Q: How should security teams red team agentic AI workflows?

A: Security teams should test complete agent journeys, not isolated prompts. That means simulating tool calls, memory reuse, chained decisions, and recovery after a benign first step. The goal is to see whether the workflow can be redirected once trust is already established, because that is where agentic abuse usually becomes visible.

Q: Why do agentic systems create different identity risks from chatbots?

A: Agentic systems can act, sequence tasks, and reuse state. That means identity risk is not limited to authentication or prompt quality, because the software can carry delegated authority into later steps. The result is a broader attack surface where tool misuse, scope drift, and control bypass become operational concerns.

Q: What breaks when AI agents are given broad delegated access?

A: Broad delegated access breaks the assumption that the actor’s privilege remains narrow and predictable throughout execution. If the agent can choose tools, chain actions, and continue across multiple steps, it may reach systems or data that were never intended for the original task. That creates a governance failure, not just a technical one.

Q: How can teams measure whether agent governance is working?

A: Teams should measure whether they can observe, constrain, and revoke agent actions at the same granularity as the task itself. Useful signals include whether tool use is logged, whether access is time-bounded, and whether the agent can be prevented from reusing stale context or expanding scope mid-session.

Technical breakdown

Multi-turn red teaming for agentic workflows

Traditional AI testing often looks at one prompt and one response. Agentic systems require a different model because a benign first step can set up a later failure after tools are called, memory is reused, or a follow-on action is triggered. Multi-turn red teaming simulates realistic task sequences so defenders can see how an agent behaves across several decisions, not just at the first interaction. That matters because the attack surface is cumulative, and the dangerous state may only appear after the workflow has already been partially trusted.

Practical implication: test complete task chains, not just isolated prompts, before granting production access.

Tool poisoning, puppet attacks, and rug pulls

The article identifies three recurring agentic attack patterns. Tool poisoning hides malicious code inside a legitimate-looking tool, puppet attacks use a benign front end with a hostile backend, and rug pulls let a tool build trust before switching behavior later. These are identity problems as much as software problems because the agent is deciding what to invoke based on trust signals that can be manipulated. Once an agent treats a tool as reliable, the attacker gains a path to shape execution without needing to break authentication first.

Practical implication: validate tool provenance and behaviour continuously, not just at onboarding.

Fine-grained delegation and Zero Trust for agents

The article argues that agentic AI needs fine-grained delegation limited to a specific task and duration. That is essentially a Zero Trust problem for non-human actors: every privilege should be narrow, explicit, and scoped to the minimum necessary action set. A resume-screening agent should not inherit the wider rights of the HR system, and a task-specific assistant should not retain broad access after the workflow ends. This is where identity governance becomes operational, because standing privilege and broad trust relationships become unsafe defaults.

Practical implication: align delegation to task scope and duration, then verify the agent cannot expand its own access.

Threat narrative

Attacker objective: The attacker wants to steer an agentic workflow into doing work on their behalf while preserving the appearance of legitimate execution.

Entry begins when an attacker targets the agent’s tool path, memory layer, or delegated workflow rather than the model alone, often by poisoning a trusted input channel or surrounding service.
Escalation occurs when the agent reuses tainted context, selects a compromised tool, or chains follow-on actions that expand the attacker’s influence across several steps of the workflow.
Impact arrives when the agent completes unauthorised actions in production, such as data exposure, credential leakage, or workflow redirection that changes business outcomes without immediate human review.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach — attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Agentic AI red teaming is now an identity governance problem, not just a model safety exercise. The article shows that the meaningful risk is not one bad prompt but a sequence of decisions, tool calls, and state reuse that can alter outcomes over time. That shifts the control question from “Is the model safe?” to “Is the delegated identity constrained enough to survive multi-step execution?” Practitioners should treat red teaming as a governance test for runtime authority.

Fine-grained delegation was designed for predictable execution windows. That assumption fails when an agent can change course mid-session, select tools dynamically, and act without a human approval gate between steps. The implication is assumption collapse: least privilege can no longer be defined only at provisioning time when the actor’s next move is not knowable in advance. Practitioners must rethink how privilege is bounded when decision-making happens inside the workflow.

Memory poisoning and planning manipulation expose a runtime governance gap. These are not classic access-control failures, because the agent may still be authenticated and authorised when the abuse occurs. The failure mode is that the system trusts retained context and planned next steps too readily, so malicious influence survives across turns. For identity teams, that means the boundary of control has moved from login to session behaviour.

Zero Trust for agents is becoming the practical baseline for autonomous-like workflows. The article’s emphasis on narrow scope, temporary access, and continuous validation aligns with OWASP-NHI and NIST CSF thinking, but the hard part is enforcing those controls at runtime. The important point is that broad, persistent delegation breaks down faster when software can reason, chain, and reuse tools. Practitioners should expect agent governance to converge with NHI governance.

Tool trust is now part of the attack surface. The idea that tools can be trusted because they are integrated is no longer stable once agents are allowed to discover, invoke, and chain them dynamically. That makes tool identity, provenance, and behaviour validation central to agent security. Practitioners should assume a trusted tool can become the easiest route to compromise if governance stops at the model boundary.

From our research:
80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%), according to AI Agents: The New Attack Surface report.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
That blind spot matters because governance only works when access, action, and evidence remain visible across the full agent workflow, a theme explored in OWASP Agentic Applications Top 10.

What this signals

Runtime delegation is the next governance boundary. As agentic systems move from pilots into business processes, teams will need controls that operate at the same speed as the workflow, not at the cadence of periodic review. The practical issue is that access review models built for stable identities do not see a session that changes shape mid-execution.

Agent trust will become a measurable control surface. If tool provenance, memory reuse, and action logging are not explicit policy objects, teams will struggle to prove what an agent actually did. That is where NHI governance and AI governance start to overlap, because the actor is software but the accountability model still has to be auditable.

With 92% of organisations agreeing that governing AI agents is critical but only 44% having policies in place, according to AI Agents: The New Attack Surface report, most programmes are still early in the maturity curve. The next phase is likely to look less like model oversight and more like identity lifecycle management for software actors.

For practitioners

Red-team full agent workflows Test multi-turn paths that include tool use, memory reuse, follow-on decisions, and recovery behaviour after an initial benign request. Build scenarios that try to redirect the workflow after trust has already been established.
Scope delegated rights to a single task window Grant only the minimum access needed for one explicit job, then revoke it when the workflow ends. Avoid broad standing access that survives across sessions or can be reused by the agent for unrelated actions.
Validate tools before the agent can chain them Require provenance checks, structured interfaces, and behaviour validation for every tool an agent may call. Treat a newly trusted tool as a separate identity and policy decision, not as a harmless extension of the model.

Key takeaways

Agentic AI red teaming exposes control gaps that traditional prompt testing cannot see, because the real risk appears across chained actions and reused state.
The evidence base is already strong: agents frequently act beyond intended scope, and many organisations still cannot fully audit what they access or change.
The practical response is to constrain delegated authority, validate tools as identities, and test the whole workflow before agents are allowed into production.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A1	Agentic red teaming maps directly to prompt, tool, and workflow abuse risks.
NIST AI RMF		The article focuses on governance, measurement, and lifecycle risk for AI systems.
OWASP Non-Human Identity Top 10	NHI-03	Delegated access, scope control, and runtime identity all match NHI governance concerns.

Test multi-turn workflows for tool misuse, memory poisoning, and goal manipulation before production rollout.

Key terms

Agentic AI: Software that can decide and act across multiple steps to complete a goal. In identity terms, it behaves like a non-human actor with delegated authority, so governance must account for tool use, state reuse, and execution timing as part of access control.
Multi-turn red teaming: A testing method that evaluates how an AI system behaves across several linked interactions instead of a single prompt. For agentic systems, it reveals whether decisions, tools, and stored context can be manipulated over time into unsafe or unauthorised outcomes.
Fine-grained delegation: The practice of giving an identity only the specific rights needed for a narrow task and a limited duration. For agents, this has to be enforced at runtime, because broad or persistent delegation increases the chance that the software will reach beyond the intended workflow.
Tool poisoning: A compromise pattern where a tool that appears legitimate is used to influence an agent’s behaviour in a harmful way. The risk is not only technical contamination, but the agent treating a poisoned tool as trusted and chaining it into later decisions.

What's in the full article

TROJ.AI's full blog covers the operational detail this post intentionally leaves for the source:

The webinar discussion on multi-turn agent red teaming and how scenario depth changes the test design.
The specific mitigation patterns TROJ.AI associates with tool poisoning, memory manipulation, and over-permissive delegation.
The practical framing for checker agents, structured inputs, and logging requirements in production workflows.
The vendor’s breakdown of how to integrate red teaming earlier in the agent development cycle.

👉 The full TROJ.AI post covers multi-turn testing, mitigation patterns, and secure deployment guidance.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building or maturing an identity security programme, it is worth exploring.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-08-20.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org