AI agents can choose tools and chain actions, so the risk is not just harmful output. The governance problem becomes delegated execution, where an agent can misuse allowed access, combine steps unexpectedly, or continue beyond the intent of the original request. That is why red teaming must cover tool use and action paths, not only prompts.
Why This Matters for Security Teams
AI agents do not just generate content; they can initiate actions, invoke tools, and keep pursuing a goal after the original prompt has ended. That changes red teaming from testing for unsafe text into testing for delegated execution. Security teams need to understand whether an agent can overreach through valid credentials, chain permitted steps into an unintended outcome, or expose secrets while still appearing to “work as designed.” Current guidance from the OWASP Agentic AI Top 10 and NIST AI Risk Management Framework makes clear that runtime behavior, not just model output, is the real risk surface. NHIMG research shows this is not theoretical: in SailPoint’s AI Agents: The New Attack Surface report, 80% of organisations said their AI agents had already performed actions beyond intended scope. In practice, many security teams encounter agent misuse only after tool access has already been granted and an action trail has been created, rather than through intentional red-team discovery.
How It Works in Practice
Ordinary AI applications are usually assessed like decision systems: what did the model say, and did it leak or hallucinate? Agentic systems need a different test plan because the model may choose between tools, sequence those tools, and adapt its path based on intermediate results. That means red teaming should examine the full action chain: prompt injection, tool selection, credential use, data retrieval, lateral movement, and post-task continuation. NHIMG’s OWASP NHI Top 10 and the CSA MAESTRO agentic AI threat modeling framework both point toward the same operational shift: treat the agent as a workload with an identity, not as a static application instance.
That leads to practical controls that red teams should challenge directly:
- Does the agent get just-in-time, task-scoped credentials, or does it hold long-lived secrets?
- Is authorisation evaluated at runtime based on intent, context, and target resource?
- Can the agent prove workload identity through mechanisms such as OIDC or SPIFFE/SPIRE rather than shared API keys?
- Are tool calls logged in a way that supports full reconstruction of action paths?
- Can policy stop a chain of individually allowed actions that becomes unsafe in aggregate?
This is where static RBAC often fails. A role can say what an agent may access in general, but it cannot always anticipate what the agent will try next, especially when the goal is ambiguous or the environment changes mid-task. Best practice is evolving toward policy-as-code, short-lived secrets, and intent-based authorisation evaluated at request time, not at deployment time. These controls tend to break down when agents operate across multiple SaaS tools and legacy systems because identity, logging, and policy enforcement are usually fragmented across boundaries.
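The idea of stopping a chain of individually allowed actions can be sketched as a small request-time policy check. This is a minimal illustration, not a reference implementation: the tool names (`vault.read`, `crm.read`, `http.post`), the `UNSAFE_SEQUENCES` rule, and the `ChainPolicy` class are all hypothetical and exist only to show the shape of the control.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolCall:
    tool: str      # hypothetical tool name, e.g. "crm.read"
    resource: str  # target resource the agent wants to touch
    task_id: str   # task the call claims to serve


# Assumed policy data: every tool here is allowed on its own, but reading a
# secret and then posting to an external endpoint in the same task is not.
ALLOWED_TOOLS = {"vault.read", "crm.read", "http.post"}
UNSAFE_SEQUENCES = [("vault.read", "http.post")]


@dataclass
class ChainPolicy:
    history: dict = field(default_factory=dict)    # task_id -> tools already allowed
    audit_log: list = field(default_factory=list)  # one entry per decision

    def authorize(self, call: ToolCall) -> bool:
        """Decide at request time, using what the task has already done."""
        prior = self.history.setdefault(call.task_id, [])
        allowed = call.tool in ALLOWED_TOOLS
        reason = "ok" if allowed else "tool not allowed"
        if allowed:
            for earlier, later in UNSAFE_SEQUENCES:
                if call.tool == later and earlier in prior:
                    allowed = False
                    reason = f"unsafe chain: {earlier} -> {later}"
                    break
        # Log every decision, allowed or denied, so the path can be rebuilt.
        self.audit_log.append(
            (call.task_id, call.tool, call.resource, allowed, reason)
        )
        if allowed:
            prior.append(call.tool)
        return allowed
```

A red team could drive a check like this with scripted call sequences and confirm that a denied chain is both blocked and recorded. A static role check would have approved each of those calls separately, which is the gap the sketch is meant to make visible.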
Common Variations and Edge Cases
Tighter control over agent actions often increases operational overhead, so organisations must balance safety against speed, cost, and developer friction. That tradeoff matters most in high-autonomy deployments where an agent can act continuously, such as code assistants, SOC copilots, or procurement workflows. In those environments, there is no universal standard for how much autonomy is acceptable, so teams should document risk thresholds rather than assume one policy fits all. The NIST AI Risk Management Framework and OWASP Top 10 for Agentic Applications 2026 are useful here because they both encourage governance around context, accountability, and misuse paths rather than relying on a single permission model.
Two edge cases deserve special attention. First, agents that can call external tools over MCP or similar connectors may be safe in isolation but unsafe when chained across systems, because each step inherits the authority of the previous one. Second, long-lived secrets create a different failure mode from human accounts: once exposed, they can be reused far beyond the original task window, which is exactly why JIT provisioning and rapid revocation matter. NHIMG’s AI LLM hijack breach and DeepSeek breach show how quickly exposed secrets and overbroad access can become real incidents. The practical takeaway is simple: red teaming for agents must test the identity, the policy decision, and the entire execution path, not just whether the model can be tricked into saying something harmful.
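To make the contrast with long-lived secrets concrete, the following is a minimal sketch of just-in-time, task-scoped credentials with expiry and revocation. The `CredentialBroker` class, its method names, and the 300-second default lifetime are assumptions made for illustration; a real deployment would rely on its identity provider or secrets manager rather than an in-memory store.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskCredential:
    token: str
    task_id: str
    scope: frozenset    # actions this credential may perform
    expires_at: float   # clock value after which the token is dead


class CredentialBroker:
    """Hypothetical in-memory broker that issues short-lived, scoped tokens."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock   # injectable so expiry can be tested
        self._active = {}     # token -> TaskCredential

    def issue(self, task_id, scope):
        cred = TaskCredential(
            token=secrets.token_urlsafe(32),
            task_id=task_id,
            scope=frozenset(scope),
            expires_at=self._clock() + self._ttl,
        )
        self._active[cred.token] = cred
        return cred

    def revoke_task(self, task_id):
        """Revoke every credential issued for a task; return how many."""
        stale = [t for t, c in self._active.items() if c.task_id == task_id]
        for token in stale:
            del self._active[token]
        return len(stale)

    def is_valid(self, token, action):
        cred = self._active.get(token)
        if cred is None:
            return False                  # unknown or already revoked
        if self._clock() >= cred.expires_at:
            del self._active[token]       # expired: drop it eagerly
            return False
        return action in cred.scope       # valid only inside its scope
```

A red-team exercise against a design like this would try to replay a token after the task ends, use it for an action outside its scope, and check how quickly `revoke_task` takes effect. Each of those is a test that a shared, long-lived API key would fail by construction.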
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | A1 | Agentic attack paths and tool misuse are the core red-team concern. |
| CSA MAESTRO | | MAESTRO frames threat modelling for autonomous agent workflows. |
| NIST AI RMF | | AI RMF supports governance for autonomous, goal-driven AI behavior. Define accountability, monitor behavior, and manage agent risk continuously. |
Reviewed and updated by the NHIMG editorial team on June 5, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org