TL;DR: LLM security incidents are rising as models move into production, with one source estimate putting AI-related security incidents at 73% of enterprises in the last 12 months, while red teaming is shifting from single-prompt testing to multi-step chained attacks according to ZioSec. That shift means conventional evaluation programmes are no longer enough when agents can combine prompts, tools, and integrations across one attack path.
At a glance
What this is: This is a practitioner guide to LLM red teaming that argues simple prompt testing is no longer sufficient because deep chained attacks now drive higher-risk exposure.
Why it matters: It matters because IAM, PAM, and security teams increasingly have to govern AI systems as identity-bearing actors, not just applications with prompts.
By the numbers:
- 73% of enterprises experienced at least one AI-related security incident within 12 months.
- The global AI Red Teaming market size reached USD 1.12 billion globally in 2024 and is projected to reach USD 15.18 billion by 2033.
- Organizations that use AI and automation extensively for security experienced average breach costs of $3.84 million, while those that do not use AI saw costs surge to $5.72 million.
- LLM market projected to reach $82.1 billion by 2033.
👉 Read ZioSec's guide to LLM red teaming, attacks, and chained methods
Context
LLM red teaming is the controlled simulation of attacks against language models, their prompts, and their surrounding integrations. The governance gap is simple: traditional application testing assumes a stable request-response system, while LLM deployments can be steered, conditioned, or chained into unintended actions across multiple steps.
For IAM and security teams, the important shift is that LLMs increasingly behave like identity-adjacent control planes. Once a model can reach tools, data, or downstream workflows, the issue is not just output quality. It becomes a question of runtime authority, guardrails, and how much damage a malicious prompt chain can cause before detection.
Key questions
Q: How should security teams test LLMs for chained attack paths?
A: Security teams should test the full interaction chain, not just isolated jailbreak prompts. That means combining prompt injection, retrieval abuse, memory persistence, and tool-calling scenarios in one campaign so the team can see how hostile input compounds across a session. The goal is to find where state changes, not only where a response looks unsafe.
Q: Why do tool-connected LLMs create governance risk for IAM teams?
A: Tool-connected LLMs create governance risk because they can turn language into action across permissioned systems. Once the model can reach tickets, code, data, or infrastructure, the question becomes who authorized the action path and how much side effect that path can create before review. That is an identity and delegation issue, not only a model-safety issue.
Q: What breaks when organisations rely on single-prompt red teaming alone?
A: Single-prompt red teaming misses cumulative abuse. Many LLM failures emerge only after multiple turns, when an attacker uses one exchange to shape context, a second to refine instructions, and a third to trigger a tool or workflow. Without chain testing, teams see isolated behaviour but miss the real attack surface.
Q: How should teams decide when an LLM needs approval before acting?
A: Teams should require approval whenever the model can change state, move data, or trigger an external workflow that cannot be safely reversed. If the action would be sensitive when performed by a human operator or service account, the same standard should apply to the model. Approval gates should follow impact, not prompt length.
Technical breakdown
Evaluations, attacks, and deep chained methods in LLM red teaming
LLM red teaming splits into three distinct modes. Evaluations measure whether a model meets predefined safety, quality, or formatting expectations. Attacks attempt to break those expectations directly, often through jailbreaks or prompt injection. Deep chained methods go further by combining multiple steps, multiple prompts, and often external tools or linked systems to create an attack path that single-probe testing will miss. That distinction matters because chained abuse is less about one bad prompt and more about cumulative state change across a session. In practice, security teams need to understand where the model can retain context, invoke tools, or pass control downstream.
Practical implication: Test the full interaction chain, not only isolated prompts or one-off jailbreak cases.
Prompt injection and authorization bypass in agent-connected LLMs
Prompt injection is an input-manipulation technique where hostile content changes the model’s behaviour, usually by overriding instructions or smuggling new ones into context. In agent-connected systems, the risk grows when the model can call tools, query data, or trigger workflows after interpreting that injected content. Authorization bypass is the next step: the model acts outside the intended scope because the security model treats model output as trustworthy enough to execute. That creates an identity problem as much as an application problem. The model is not just generating text. It is participating in decisions that may carry permissioned side effects.
Practical implication: Treat tool-calling LLMs as governed execution paths and restrict what any single prompt can cause.
Why chained LLM exploits need control-plane thinking
Deep chained methods exploit the relationships between prompts, memory, tools, and downstream services. The attacker does not need a single perfect prompt if they can gradually build state, redirect the model, and use one successful step to set up the next. That is why control-plane thinking is essential: teams must map what actions the model can initiate, what state it can retain, and where human approval is still required. This is especially important in RAG and agent workflows, where one compromised step can alter later retrievals or actions. The architectural weakness is not only model output, but ungoverned interaction paths.
Practical implication: Define explicit approval boundaries for every model action that can change state or reach external systems.
Threat narrative
Attacker objective: The attacker wants to turn a seemingly safe LLM interaction into a trusted execution path that leaks data or performs unauthorized actions.
- Entry occurs when a malicious prompt, injected document, or manipulated context reaches the model through a chat, RAG, or agent workflow.
- Credential or authority abuse follows when the model accepts the hostile instruction and uses its connected tool or workflow permissions to act on it.
- Impact occurs when chained model actions leak data, trigger unintended commands, or bypass intended human review across connected systems.
Breaches seen in the wild
- Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
- AI LLM hijack breach — attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.
Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.
NHI Mgmt Group analysis
Deep chained LLM abuse exposes a control-plane problem, not just a content-safety problem. Once a model can influence tools, memory, or downstream workflows, the failure mode moves beyond unsafe text generation. The real issue is that decision paths become programmable through language, which means normal application testing underestimates the blast radius. Practitioners should treat model-connected workflows as governed execution surfaces, not chat interfaces.
Prompt injection works because organisations still trust model-adjacent context too much. The article’s core warning is that hostile instructions can arrive through prompts, retrieval content, or chained interactions and still be treated as legitimate context. That is a governance failure because the boundary between data and instruction is no longer stable. The implication is that teams must stop assuming context is neutral when the model can act on it.
Deep chained methods create an identity problem for AI systems even when no human account is directly compromised. The model can inherit authority from connected tools, then reuse that authority across multiple steps without the kind of fixed session pattern conventional review processes expect. That makes the control gap broader than red teaming alone. Practitioners should recognise that authorisation for LLMs must be measured by what they can do end to end, not by what a single prompt appears to request.
LLM red teaming is becoming a governance discipline for autonomous-style behaviour, not just a testing tactic. The article compares evaluation, offensive testing, and chained attacks as complementary methods, but the industry implication is that these methods now map to different maturity levels in AI governance. Static test suites catch known failure patterns. Chained-method testing exposes where security assumptions break once the system can coordinate across multiple actions. Security teams should align red teaming with control ownership, not treat it as a one-off QA exercise.
Deep chained attack paths sharpen the need for a named concept: context-chain privilege. This is the cumulative authority an LLM gains when multiple prompts, retrieval results, and tool actions are allowed to compound within one session. The concept matters because the risk is not any single permission in isolation. It is the way apparently small allowances combine into an exploitable execution path. Practitioners should measure the combined authority of the chain, not only the permissions of each step.
From our research:
- 91.6% of secrets remain valid five days after the targeted organisation is notified, showing a critical gap in remediation procedures, according to the Ultimate Guide to NHIs.
- Also from our research: Only 20% of organisations have formal processes for offboarding and revoking API keys, and even fewer have procedures for rotating them, according to the Ultimate Guide to NHIs.
- For the next step: Read the OWASP NHI Top 10 for the control patterns that matter when model-driven actions move from testing into production.
What this signals
Context-chain privilege: LLM governance now needs a term for the cumulative authority a model accrues across prompts, retrieval, and tool calls within one session. That is where the practical risk sits, because the model can become more capable as the interaction continues. Teams should plan controls around end-to-end action paths, not individual prompts.
The broader signal is that red teaming is becoming a control-validation exercise for model-connected identity surfaces. When a model can act through delegated access, the old distinction between application testing and access governance starts to blur, and that should change how security, IAM, and AI teams share ownership.
If your environment already struggles to revoke non-human credentials promptly, the margin for AI-connected workflows is even thinner. With 91.6% of secrets still valid five days after notification, according to the Ultimate Guide to NHIs, delayed remediation becomes a direct path from prompt abuse to sustained access.
For practitioners
- Map model-to-tool authority chains Inventory every place the model can retrieve data, invoke tools, or trigger downstream workflows. Mark where output becomes action, where human approval is required, and where a single prompt can initiate more than one side effect.
- Test for chained prompt injection Build red-team cases that combine multiple prompts, retrieval inputs, and context updates instead of only single-shot jailbreaks. Include scenarios where the model preserves hostile instructions across turns or uses them after a delayed tool call.
- Limit blast radius by design Separate read-only model interactions from state-changing workflows. Use strict tool allowlists, short-lived authorisations, and explicit approval gates before any external action that can affect records, tickets, code, or infrastructure.
- Treat agent-connected LLMs as governed identity surfaces Review whether the model is inheriting credentials, tokens, or delegated access that were designed for humans or background services. Re-certify those permissions against the actual actions the model can trigger today, not the original use case.
Key takeaways
- LLM red teaming has moved beyond single-prompt failure hunting because chained attacks can compound across context, tools, and sessions.
- The governance problem is not only unsafe output, but model-connected authority that can trigger real actions in downstream systems.
- Security teams should test the full interaction chain, then limit model authority with approval gates, tool allowlists, and explicit action boundaries.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | Agent-connected LLM attacks map directly to chained prompt and tool-abuse risks. | |
| OWASP Non-Human Identity Top 10 | NHI-03 | Model credentials and delegated access create classic non-human identity exposure. |
| NIST CSF 2.0 | PR.AC-4 | Delegated model access must be governed like any other access entitlement. |
Review non-human credentials used by LLM workflows and shorten privilege duration wherever possible.
Key terms
- Deep Chained Method: A deep chained method is a multi-step attack that uses several prompts, context updates, or tool actions to produce an outcome that one prompt alone would not achieve. In LLM security, the risk is cumulative behaviour across a session, not a single malicious instruction.
- Prompt Injection: Prompt injection is an attack in which hostile text changes how a model behaves by overriding or redirecting its instructions. It becomes more dangerous when the model can act on the injected content through tools, retrieval, or downstream workflows.
- Context-Chain Privilege: Context-chain privilege is the total effective authority an LLM gains when prompts, retrieved content, memory, and tool access compound within one interaction. The concept matters because small permissions can combine into a larger execution path that looks harmless at each individual step.
- Agent-Connected LLM: An agent-connected LLM is a model that can do more than generate text because it can reach tools, data sources, or other workflows. That connectivity turns model governance into a delegated access problem, where output quality and action authority both need control.
Deepen your knowledge
LLM red teaming, prompt injection, and delegated model authority are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building controls for agent-connected systems, it is a practical place to start.
This post draws on content published by ZioSec: LLM Red Teaming: Evaluations, Attacks, & Deep Chained Methods - Ziosec, Mindgard, Promptfoo Compared. Read the original.
Published by the NHIMG editorial team on 2026-02-12.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org