Subscribe to the Non-Human & AI Identity Journal

Agentic Red Teaming

Agentic red teaming is the practice of testing AI systems through their real runtime paths, including tools, memory, UI rendering, and downstream workflows. It evaluates how an agent behaves in production, not just how a model responds to prompts, and it should surface actionable exploit chains, not isolated prompt failures.

Expanded Definition

Agentic red teaming goes beyond prompt-only testing and evaluates the full execution path an AI agent can take, including tool calls, memory retrieval, browser or UI actions, and downstream workflow effects. In NHI and agentic ai governance, that distinction matters because the real security boundary is often the agent’s authority, not the model’s text output.

Usage in the industry is still evolving. Some teams treat agentic red teaming as a security assessment, while others use it for safety testing, abuse simulation, or validation of guardrails. The most useful interpretation is operational: test how an agent behaves when it can act, persist state, and chain decisions across systems. That aligns closely with the risk framing in the OWASP Agentic AI Top 10 and the broader control approach in the NIST AI Risk Management Framework.

At NHI Management Group, this term is best understood as a way to expose exploit chains that only appear when identities, secrets, tools, and permissions are combined in a live workflow. The most common misapplication is treating it as a one-time prompt jailbreak exercise, which occurs when testers ignore tool permissions, session state, and downstream side effects.

Examples and Use Cases

Implementing agentic red teaming rigorously often introduces environment complexity and safety constraints, requiring organisations to weigh realistic attack simulation against the risk of triggering harmful actions in test or production-like systems.

  • Testing whether an AI support agent can be induced to retrieve restricted customer records through a benign-looking tool sequence.
  • Validating whether memory persistence lets an attacker plant instructions that later alter the agent’s behavior across sessions.
  • Checking whether a browser-enabled agent can be steered into approving a malicious external request or downloading an unsafe file.
  • Simulating privilege escalation paths that begin with a harmless prompt but end in unauthorized workflow execution, such as ticket creation, refunds, or data export.
  • Comparing behavior across hardened and non-hardened deployments to confirm whether policy, sandboxing, and approvals actually constrain execution.

These scenarios map directly to the attack surface described in AI Agents: The New Attack Surface report, where agent behavior beyond intended scope is already being observed in real organisations. They also complement implementation guidance in CSA MAESTRO agentic AI threat modeling framework, which emphasizes tooling, orchestration, and control boundaries rather than model output alone.

Agentic red teaming is especially valuable when an organisation needs to assess whether identity controls, such as scoped tokens and step-up approvals, survive real misuse paths inside a production workflow.

Why It Matters in NHI Security

Agentic red teaming is important because NHIs often provide the permissions, tokens, and service trust that make agent abuse possible. When an agent can call APIs, inherit sessions, or trigger backend workflows, a single compromise can cascade into data exposure, unauthorized transactions, or lateral movement. That is why NHI testing cannot stop at secret scanning or static permission reviews.

The need is not theoretical. NHIMG research shows that in one study, 80% of organisations reported AI agents had already performed actions beyond their intended scope, and only 52% could track and audit the data their AI agents access. Those gaps make it difficult to investigate abuse, attribute harmful actions, or prove that controls were effective. The related LLMjacking and Moltbook AI agent keys breach coverage shows how quickly exposed credentials and weak governance can turn into active compromise.

Practitioners should use red team findings to tighten least privilege, isolate tool access, instrument logging, and enforce human approval where needed. Organisationally, the term becomes unavoidable only after an agent has already accessed data, triggered a workflow, or revealed credentials outside its intended scope.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.

Framework Control / Reference Relevance
OWASP Agentic AI Top 10 A10 Covers agent-specific risks from tool use, orchestration, and hidden execution paths.
NIST AI RMF Frames AI risk testing around context, impacts, and trustworthy system behavior.
CSA MAESTRO Models agentic threats across orchestration, tools, and security boundaries.

Assess agent behavior in live workflows and document residual risk before deployment.