An adversarial sequence that pressures an AI system across several interactions instead of a single prompt. This matters because many agent failures emerge gradually, as context, memory, and tool use interact over time and steer the system away from its intended purpose.
Expanded Definition
A multi-turn attack is an adversarial sequence that evolves over several exchanges, using prior context, memory, and tool access to steer an AI system toward unsafe behavior. Unlike a single prompt injection, it relies on persistence and gradual shaping, which is why guidance in the OWASP NHI Top 10 and the MITRE ATLAS adversarial AI threat matrix both treat session-level behavior as security-relevant. In agentic systems, this pattern can manipulate memory retention, task decomposition, policy interpretation, and tool invocation without ever looking overtly malicious in any single turn.
Definitions vary across vendors on whether the attack requires intent to exfiltrate secrets, induce harmful tool use, or simply degrade alignment across a conversation. NHI Management Group treats the term broadly: any multi-step interaction that compounds risk across context windows, retained state, or delegated actions qualifies. This matters because the attack surface is not just the model response, but also the surrounding identity and access posture that enables the agent to remember, retrieve, and act. The most common misapplication is treating each user message as an isolated event, which occurs when defenders ignore cumulative manipulation across a session.
Examples and Use Cases
Implementing multi-turn defenses rigorously often introduces state-management overhead, requiring organisations to weigh smoother user experience against stricter monitoring, shorter memory retention, and more conservative tool permissions.
- An attacker starts with harmless operational questions, then gradually shifts the agent toward revealing internal workflow details or hidden instructions.
- A prolonged chat causes the agent to treat an earlier untrusted instruction as a standing objective, especially when memory or summaries are reused across turns.
- A workflow agent is nudged over several turns into approving an external action that looks legitimate only because each step is individually plausible.
- Researchers studying NHI compromise often map this pattern alongside the 52 NHI Breaches Analysis to show how identity abuse and conversational manipulation can reinforce each other.
- Incident response teams compare these sequences with CISA cyber threat advisories when adversaries combine social engineering, credential abuse, and AI orchestration in one campaign.
These cases are especially relevant when the agent has access to secrets, internal APIs, or delegated approvals, because the attack can unfold without a single obvious red flag.
Why It Matters in NHI Security
Multi-turn attacks become more dangerous when the system’s non-human identities are overprivileged, long-lived, or poorly monitored. NHIMG reports that 97% of NHIs carry excessive privileges, which means a conversation-level weakness can quickly turn into a material access event. When an AI agent can retain state, call tools, and reuse credentials, the attacker does not need to win in one prompt; they only need to keep nudging the system until it crosses a boundary. That is why the issue sits at the intersection of identity governance, prompt safety, and operational monitoring, not just model content filtering.
The practical lesson is that multi-turn attacks often reveal themselves only after logs show an unexpected approval, data retrieval, or external call sequence. By then, the compromise may already involve broader NHI exposure patterns, including leaked secrets and weak offboarding controls documented in NHIMG research. Organisations typically encounter the consequences only after an agent has already executed a risky action, at which point multi-turn attack analysis becomes operationally unavoidable to address.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and MITRE ATLAS address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | LLM-01 | Agentic apps are vulnerable when malicious instructions unfold across multiple turns. |
| MITRE ATLAS | ATLAS covers adversarial tactics that manipulate AI behavior over time. | |
| NIST AI RMF | AIRMF addresses evolving AI risks, including persistent manipulation across sessions. |
Track conversational state and block cumulative instruction drift before tool execution.
Related resources from NHI Mgmt Group
Deepen Your Knowledge
Reviewed and updated by the NHIMG editorial team on June 9, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org