Subscribe to the Non-Human & AI Identity Journal

How should security teams red team non-deterministic AI systems?

They should test AI systems continuously across design, pre-release, and post-deployment phases, because behaviour can drift after model updates or tool changes. The test cases need to include prompt variation, indirect injection, and realistic tool chains so the team measures what the system actually does, not only what it says.

Why This Matters for Security Teams

Red teaming non-deterministic AI systems is fundamentally different from testing conventional software because the same input can produce different outputs, tool calls, and side effects. Security teams need to measure behavioural risk, not just content safety, which means probing prompt injection, indirect injection through documents or web content, and unexpected tool chaining. NIST’s NIST AI 600-1 GenAI Profile and NIST Cybersecurity Framework 2.0 both support continuous risk management, which fits this problem better than one-time validation.

The practical failure mode is that teams often approve a model based on sandbox prompts, then deploy it into workflows where it can browse, retrieve, write, or execute. That creates a gap between what the system was tested to say and what it is operationally allowed to do. NHIMG’s Ultimate Guide to NHIs — Standards is useful here because agentic systems are only as safe as the identities, credentials, and tool permissions behind them. In practice, many security teams encounter abuse only after a prompt chain has already triggered external actions, rather than through intentional pre-release discovery.

How It Works in Practice

Effective red teaming for non-deterministic AI systems should be staged across design, pre-release, and post-deployment, with the scope expanding as the system gains tools and data access. Test cases should vary phrasing, intent, language, and ordering to expose brittle policy enforcement, then add indirect injection through emails, tickets, PDFs, knowledge bases, and web pages. The goal is to see whether the model can be steered into unsafe reasoning, unsafe disclosure, or unsafe action.

For agentic systems, the highest-value tests are usually end-to-end. That means exercising the full chain from prompt to planning to retrieval to tool use to output, while logging each decision point. Where possible, security teams should simulate adversarial context rather than isolated prompts. NIST’s NIST IR 8596 Cyber AI Profile is a useful reference for adversarial thinking, while the DeepSeek breach case reinforces why exposed secrets, embedded credentials, and uncontrolled data can turn model testing into a security incident.

  • Use prompt variation to test the same objective through multiple phrasings and languages.
  • Test indirect injection using documents, tickets, chats, and retrieval sources the model trusts.
  • Exercise realistic tool chains, including search, write, send, and approval workflows.
  • Measure actual actions taken, not only the text the model returns.
  • Retest after model updates, connector changes, prompt changes, and policy changes.

These controls tend to break down when the AI system is tightly integrated with live business tools but logging is incomplete, because the team cannot reconstruct which prompt, retrieval result, or tool output caused the unsafe action.

Common Variations and Edge Cases

Tighter red-team coverage often increases operational cost and can slow release cycles, requiring organisations to balance confidence against delivery speed. That tradeoff is especially sharp for systems with external memory, autonomous retries, or delegated tool use, where every added capability expands the attack surface.

Current guidance suggests different test depths by risk tier, but there is no universal standard for this yet. A simple chat assistant may only need prompt-injection and disclosure testing, while an agent with email, file, or payment access needs scenario-based abuse cases, permission escalation checks, and rollback verification. Teams should also retest when the underlying foundation model changes, because non-deterministic behaviour can shift even if the application code does not.

One common edge case is simulated versus production parity. If the red-team environment lacks real retrieval corpora, production prompts, or valid credentials, the results can look safer than the live system. Another is overfitting to jailbreak strings, which misses contextual abuse and workflow abuse. The better approach is to combine static policy checks with live adversarial exercises and to treat any new tool connector as a new security boundary. The State of Non-Human Identity Security shows why this matters operationally: identity and access gaps remain a major source of real-world exposure, so AI red teaming should always include the permissions behind the model as well as the model itself.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.

Framework Control / Reference Relevance
OWASP Agentic AI Top 10 LLM-03 Prompt injection and tool abuse are central to red teaming non-deterministic agents.
CSA MAESTRO A1 MAESTRO addresses autonomous agent risk across planning, tools, and execution.
NIST AI RMF AI RMF supports continuous measurement and risk management for nondeterministic systems.

Test prompt, retrieval, and tool paths together to expose unsafe agent behaviour before release.