Subscribe to the Non-Human & AI Identity Journal

How should security teams test LLMs for chained attack paths?

Security teams should test the full interaction chain, not just isolated jailbreak prompts. That means combining prompt injection, retrieval abuse, memory persistence, and tool-calling scenarios in one campaign so the team can see how hostile input compounds across a session. The goal is to find where state changes, not only where a response looks unsafe.

Why This Matters for Security Teams

Chained attack paths are the real failure mode in LLM security because attackers rarely need a single perfect jailbreak. They combine prompt injection, retrieval poisoning, memory abuse, and tool invocation until one unsafe step turns into a broader compromise. That is why test campaigns need to model the sequence, not just the prompt. Guidance from the OWASP Agentic AI Top 10 and NIST AI Risk Management Framework both point toward contextual, runtime evaluation rather than static content filtering.

For NHI Management Group, this is the same security pattern seen in broader non-human identity failures: the dangerous step is usually not the first credential or the first prompt, but the downstream trust chain. Research such as AI LLM hijack breach shows how compromised identities and exposed secrets can be operationalised quickly once an attacker reaches an execution path. Chained testing helps teams see whether the model can be induced to retrieve sensitive context, persist malicious instructions, or trigger an action that should have been blocked. In practice, many security teams discover the real issue only after the model has already crossed from unsafe text into unsafe state changes.

How It Works in Practice

Effective testing treats the LLM or agent as a stateful system with multiple trust boundaries. A useful campaign starts with benign-looking instructions, then introduces hostile content through one channel and checks whether the model later carries that intent into another channel, such as retrieval, memory, code execution, ticketing, or API calls. This is where agentic systems differ from simple chat interfaces: the risk is not just output quality, but whether the model can be driven to act across a chain of tools and permissions.

Teams should build scenarios that combine at least four elements:

  • Prompt injection that attempts to override system or developer instructions.
  • Retrieval abuse that plants poisoned context in documents, tickets, or knowledge bases.
  • Memory persistence that tests whether bad instructions survive across turns or sessions.
  • Tool-calling that checks whether the model can convert untrusted text into privileged actions.

Map each step to a visible state transition. For example, test whether an injected instruction is ignored in chat but later resurfaced in retrieval, or whether a harmless-seeming request causes the agent to query a sensitive source. The important question is not “did the answer look unsafe?” but “did the model change what it believes, stores, or does?” That aligns with the operational guidance in the OWASP NHI Top 10 and the CSA MAESTRO agentic AI threat modeling framework, both of which emphasise compound risk across workflows. The same mindset also appears in MITRE ATLAS adversarial AI threat matrix, where attacks are framed as staged techniques rather than isolated prompts. These controls tend to break down when the LLM is wired to multiple external systems with weak logging and no session-level policy enforcement, because the chain becomes invisible once the first tool call succeeds.

Common Variations and Edge Cases

Tighter chain testing often increases test volume and operational overhead, so security teams have to balance depth against the time required to reproduce complex sessions. Current guidance suggests prioritising the highest-risk paths first, especially where the model can reach secrets, production APIs, or write-capable tools. There is no universal standard for this yet, but best practice is evolving toward scenario-based testing that mirrors attacker workflow rather than single-turn adversarial prompts.

Edge cases matter. A chatbot with no tool access may still be vulnerable to memory poisoning or retrieval abuse, while an autonomous agent with broad permissions may fail only after several apparently harmless steps. Multi-agent systems add another wrinkle: one compromised agent can feed another poisoned context, making the chain longer and harder to detect. This is also where NHI controls intersect with agent testing, because the model may be operating through service accounts or short-lived credentials that need to be exercised during the test, not assumed safe by design. The 52 NHI Breaches Analysis and NIST AI 600-1 Generative AI Profile are useful references when teams need to connect model behavior to identity and governance failures. In the real world, the hardest cases are systems that look safe in isolated prompt tests but fail once a malicious instruction survives long enough to reach a tool with real authority.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.

Framework Control / Reference Relevance
OWASP Agentic AI Top 10 A1 Chained attacks exploit prompt injection and tool misuse across agent flows.
CSA MAESTRO T1 MAESTRO models agentic threats as staged, compound attack paths.
NIST AI RMF GOVERN AI RMF governance frames accountability for runtime AI risk testing.

Test full agent workflows for prompt injection, memory abuse, and unsafe tool execution.