Why do traditional red team exercises miss so many AI security issues?

Why Traditional Red Team Exercises Miss AI Security Issues

Traditional red team exercises are built to test fixed systems, but AI workloads are not fixed. The behaviour of an LLM-based system shifts with prompt variation, retrieved context, tool access, and multi-step reasoning workflows that evolve during normal use. That means a one-time exercise may prove a point about a known prompt injection path while missing the larger control gap: continuous runtime governance across model, data, and agent action paths.

This is why current guidance increasingly treats agentic systems as a different class of risk. The attack surface is not only the model endpoint; it also includes memory, orchestration, retrieval layers, secrets, and downstream actions. NHIMG’s Top 10 NHI Issues research highlights that weak visibility and over-privilege are recurring failure modes in machine identities, and those same patterns show up quickly once an AI agent can call tools on its own. In practice, many security teams encounter AI abuse only after a tool chain or data path has already been exercised in production, rather than through intentional pre-launch testing.
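Over-privilege becomes measurable rather than anecdotal when a red team compares what an agent identity is granted with what it actually uses. The sketch below assumes a simple log format with `identity` and `permission` fields. The class, function, and permission names are hypothetical and are not taken from NHIMG's research or any product.

```python
from dataclasses import dataclass, field


@dataclass
class AgentIdentity:
    """Hypothetical record of one agent's machine identity and its grants."""
    name: str
    granted: set[str]
    used: set[str] = field(default_factory=set)


def record_usage(identity: AgentIdentity, log_entries: list[dict]) -> None:
    """Mark every permission this identity exercised in the logs as used."""
    for entry in log_entries:
        if entry.get("identity") == identity.name:
            identity.used.add(entry["permission"])


def unused_grants(identity: AgentIdentity) -> set[str]:
    """Granted but never exercised: candidates for removal before the agent
    can chain them into something unintended."""
    return identity.granted - identity.used


# Usage with illustrative log entries (the log schema is an assumption).
agent = AgentIdentity(
    "ticket-triage-agent",
    granted={"tickets:read", "tickets:write", "secrets:read", "storage:delete"},
)
logs = [
    {"identity": "ticket-triage-agent", "permission": "tickets:read"},
    {"identity": "ticket-triage-agent", "permission": "tickets:write"},
    {"identity": "other-agent", "permission": "secrets:read"},
]
record_usage(agent, logs)
print(sorted(unused_grants(agent)))  # ['secrets:read', 'storage:delete']
```

A check like this only covers the visibility half of the problem: it shows which grants are unused, not whether the logs themselves are complete.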

How Red Teaming Needs to Change for Agentic Systems

Agentic AI requires testing the behaviour of the system at runtime, not just the prompt surface. A useful exercise should emulate an autonomous workflow: planning, tool selection, retrieval, execution, error recovery, and escalation attempts. The best practice is evolving, but the current direction is clear. Security teams should combine adversarial prompting with policy evaluation, secret exposure checks, permission boundary tests, and logging review to see whether the agent can do damage even when the model response itself looks harmless.
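One way to test whether an agent can do damage "even when the model response itself looks harmless" is to score its recorded tool calls rather than its text. The following is a minimal sketch, assuming the harness can capture a trace of tool calls with their outputs and whether each call was logged. The `ToolCall` schema, tool names, and secret patterns are hypothetical, and adversarial prompting would supply the inputs that produce the trace.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only; a real exercise would use a maintained secret scanner.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]


@dataclass
class ToolCall:
    """One step of a recorded agent trace (hypothetical schema)."""
    step: int
    tool: str
    output: str
    logged: bool  # whether the platform's audit log captured this call


def evaluate_trace(trace: list[ToolCall], allowed_tools: set[str]) -> list[str]:
    """Return findings for boundary violations, secret exposure, and logging gaps."""
    findings = []
    for call in trace:
        if call.tool not in allowed_tools:
            findings.append(f"step {call.step}: '{call.tool}' is outside the permission boundary")
        if any(p.search(call.output) for p in SECRET_PATTERNS):
            findings.append(f"step {call.step}: secret material in output of '{call.tool}'")
        if not call.logged:
            findings.append(f"step {call.step}: '{call.tool}' call missing from audit log")
    return findings


# Usage: each response here looks benign; the trace tells a different story.
trace = [
    ToolCall(1, "search_docs", "Found 3 runbooks", logged=True),
    ToolCall(2, "read_config", "aws_key=AKIAABCDEFGHIJKLMNOP", logged=True),
    ToolCall(3, "send_email", "Sent summary to requester", logged=False),
]
findings = evaluate_trace(trace, allowed_tools={"search_docs", "read_config"})
for finding in findings:
    print(finding)
```

Evaluating the trace instead of the final answer is the design choice that matters here: step 3 would pass a review of model output but fails both the boundary and the logging checks.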

That is where intent-based controls and just-in-time access matter. Static RBAC can work for human users, but it is often too coarse for agents whose actions depend on task context. Instead, controls should evaluate what the agent is trying to do at request time, then issue short-lived access only for that step. That approach aligns with runtime policy patterns discussed in the Anthropic Project Glasswing work and the CSA MAESTRO agentic AI threat modeling framework, both of which reinforce that agent risk is a control and orchestration problem, not only a model quality problem.
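The intent-based, just-in-time pattern can be sketched in a few lines. The intent names, scope strings, policy table, and 300-second TTL are all illustrative assumptions, not values from Project Glasswing, MAESTRO, or any product. The sketch only shows that the decision is made per request, from the declared task, and that the resulting grant expires quickly.

```python
from __future__ import annotations

import secrets
import time
from dataclasses import dataclass

# Hypothetical policy: the scopes each declared task intent can justify.
INTENT_POLICY = {
    "summarise_ticket": {"tickets:read"},
    "close_ticket": {"tickets:read", "tickets:write"},
}
GRANT_TTL_SECONDS = 300  # illustrative: long enough for one step, not a session


@dataclass(frozen=True)
class Grant:
    token: str
    scopes: frozenset[str]
    expires_at: float


def request_access(intent: str, requested: set[str], now: float | None = None) -> Grant:
    """Evaluate the agent's declared intent at request time and issue a
    short-lived grant only for scopes that intent justifies."""
    allowed = INTENT_POLICY.get(intent)
    if allowed is None:
        raise PermissionError(f"unknown intent: {intent!r}")
    excess = requested - allowed
    if excess:
        raise PermissionError(f"intent {intent!r} does not justify {sorted(excess)}")
    issued_at = time.time() if now is None else now
    return Grant(
        token=secrets.token_urlsafe(16),
        scopes=frozenset(requested),
        expires_at=issued_at + GRANT_TTL_SECONDS,
    )


# Usage: a read-only task gets a read-only, expiring grant...
grant = request_access("summarise_ticket", {"tickets:read"}, now=1000.0)
print(sorted(grant.scopes), grant.expires_at)  # ['tickets:read'] 1300.0

# ...and the same task cannot talk its way into write access.
try:
    request_access("summarise_ticket", {"tickets:read", "tickets:write"})
except PermissionError as exc:
    print(exc)  # intent 'summarise_ticket' does not justify ['tickets:write']
```

Unlike a static role, the same agent identity receives different, narrower access depending on the step it is performing. A production policy would also need to verify that the declared intent matches the actual task, which this sketch does not attempt.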

Practical tests should also include workload identity and secret handling. If an agent uses long-lived API keys, the red team is testing a brittle credential, not the real system. If the agent authenticates with workload identity and short-lived tokens, then the test can measure whether access is correctly constrained, revoked, and audited. These controls tend to break down in loosely governed multi-tool environments where the agent can chain benign permissions into an unintended privileged action.
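The three properties named above (constrained, revoked, audited) can be turned into explicit pass/fail checks. `TokenBroker` is a hypothetical in-memory stand-in for whatever workload identity or token service an organisation actually runs. Against a real system, the same checks would call that service instead.

```python
import secrets


class TokenBroker:
    """Hypothetical stand-in for a workload identity / token service."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.tokens = {}      # token -> (scopes, expires_at)
        self.revoked = set()
        self.audit_log = []   # every authorization decision is recorded

    def issue(self, scopes: set[str], now: float) -> str:
        token = secrets.token_hex(16)
        self.tokens[token] = (frozenset(scopes), now + self.ttl)
        return token

    def revoke(self, token: str) -> None:
        self.revoked.add(token)

    def authorize(self, token: str, scope: str, now: float) -> bool:
        scopes, expires_at = self.tokens.get(token, (frozenset(), 0.0))
        allowed = token not in self.revoked and now < expires_at and scope in scopes
        self.audit_log.append(
            {"token": token[:8], "scope": scope, "allowed": allowed, "at": now}
        )
        return allowed


def run_credential_checks(broker: TokenBroker) -> dict[str, bool]:
    """Each check is True when the control behaves as intended."""
    start = len(broker.audit_log)
    token = broker.issue({"tickets:read"}, now=0.0)
    results = {
        "in_scope_allowed": broker.authorize(token, "tickets:read", now=1.0),
        "out_of_scope_denied": not broker.authorize(token, "secrets:read", now=1.0),
        "expired_denied": not broker.authorize(token, "tickets:read", now=broker.ttl + 1),
    }
    broker.revoke(token)
    results["revoked_denied"] = not broker.authorize(token, "tickets:read", now=2.0)
    # Four authorize() calls were made above, so four new audit entries are expected.
    results["all_attempts_audited"] = len(broker.audit_log) - start == 4
    return results


print(run_credential_checks(TokenBroker(ttl_seconds=300)))
```

Every value in the printed dictionary should be `True`. A `False` entry points at the specific control (scope, expiry, revocation, or audit) that the red team should investigate.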

Common Variations and Edge Cases

Tighter agent controls often increase operational overhead, requiring organisations to balance reduced blast radius against latency, integration effort, and developer friction. That tradeoff is real, especially in environments where agents must call many services in sequence or where human operators expect rapid experimentation. There is no universal standard for this yet, so teams should treat results from traditional red teaming as partial evidence, not a complete assurance signal.

Some AI systems are still narrow enough that conventional red teaming remains useful, especially for single-purpose copilots with no external tools and no persistent memory. But once the system retrieves internal data, writes to tickets, sends messages, or invokes cloud APIs, the exercise must expand to include permission boundaries and revocation paths. NHIMG’s DeepSeek breach coverage is a reminder that AI incidents are often about surrounding infrastructure and exposure pathways, not just model prompts. The same point is echoed in the Astrix Security & CSA findings on visibility gaps and over-privileged access, where weak control over machine identities undermines even well-intentioned testing. The hardest cases are hybrid environments where a human supervises an agent, but the agent still holds direct tool access and cached secrets.
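The scoping rule in this paragraph can be written down as a small checklist function so that the scope of an exercise is decided explicitly rather than by habit. This is a hypothetical sketch: the capability flags and track names are illustrative labels for the conditions described above, not categories from any framework.

```python
from dataclasses import dataclass


@dataclass
class SystemProfile:
    """Hypothetical capability profile of the AI system under test."""
    external_tools: bool = False          # writes tickets, sends messages, calls cloud APIs
    retrieves_internal_data: bool = False
    persistent_memory: bool = False
    cached_secrets: bool = False
    human_supervised: bool = False


def required_test_tracks(p: SystemProfile) -> list[str]:
    tracks = ["adversarial prompting"]  # conventional red teaming still applies
    if p.external_tools or p.retrieves_internal_data or p.persistent_memory:
        tracks += ["permission boundaries", "revocation paths"]
    if p.cached_secrets:
        tracks.append("secret handling")
    if p.human_supervised and (p.external_tools or p.cached_secrets):
        # Hybrid case: supervision does not remove the agent's direct access.
        tracks.append("direct agent access despite human oversight")
    return tracks


# A single-purpose copilot with no tools and no memory:
print(required_test_tracks(SystemProfile()))  # ['adversarial prompting']

# A supervised agent that still holds tool access and cached secrets:
print(required_test_tracks(
    SystemProfile(external_tools=True, cached_secrets=True, human_supervised=True)
))
```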

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A02	Covers prompt and tool abuse that traditional red teaming often misses.
CSA MAESTRO	T3	Frames agent risk as orchestration and threat-modeling, not just model testing.
NIST AI RMF		Supports governance of AI risks that appear only during live operation.

Test agent workflows at runtime, including tool use, retrieval, and escalation paths.

#1 Authority in NHI Education, Research and Advisory, empowering organizations to tackle the critical risks posed by Non-Human Identities (NHIs), including AI Agents.
