AI systems can fail through interaction, retrieval, and probabilistic behaviour rather than only through code defects. A model may respond differently to hidden prompts, external content, or changing context, which makes static scanning insufficient. Security teams need adversarial testing because risk can emerge after deployment, not only during development.
Why This Matters for Security Teams
AI systems fail in ways traditional software testing often does not anticipate. A static code scan can catch a missing input validation check, but it will not reveal how a model behaves when a malicious prompt is embedded in retrieved content, or when an upstream tool returns unexpected data. That is why current guidance increasingly treats model behaviour as a security boundary, not just the application code around it.
This shifts testing away from only proving that code is syntactically safe and toward proving that the system remains trustworthy under adversarial interaction. The risk is amplified when AI systems can retrieve data, call tools, or chain actions across services. NIST’s NIST Cybersecurity Framework 2.0 is useful here because it frames risk management as an ongoing operational activity, not a one-time release gate. For AI-specific threat patterns, NHIMG’s DeepSeek breach analysis is a reminder that model exposure and downstream misuse can surface after deployment, not just during development.
In practice, many security teams encounter model abuse only after an agent has already retrieved data, shared secrets, or triggered an unintended tool action.
How It Works in Practice
Security testing for AI systems needs to exercise the full interaction path, not only the code path. That means red teaming prompts, testing retrieval-augmented generation flows, validating tool invocation boundaries, and checking whether the system leaks sensitive information when context changes. The goal is to discover how the system behaves when an attacker shapes inputs, sources, or execution order.
Practitioners should treat the model, retrieval layer, orchestration logic, and connected tools as one attack surface. A useful testing program usually includes:
- Prompt injection testing against direct prompts and retrieved documents
- Tool abuse testing, including overbroad function calls and unsafe defaults
- Data leakage checks for secrets, personal data, and policy-restricted content
- Permission testing for lateral movement across connected services
- Repeatability testing to see whether the same input produces inconsistent risk outcomes
Because AI behaviour is probabilistic, a single pass is not enough. Test cases should run with varying context, temperature, memory state, and tool availability. That is also why static scanning alone is insufficient: it can identify known bad patterns, but it cannot reliably predict emergent behaviour. OWASP’s Top 10 for Large Language Model Applications is a useful reference for categories such as prompt injection and insecure output handling, while the State of Non-Human Identity Security shows why connected identities and over-privileged access make these failures materially worse.
These controls tend to break down when the AI system has long-lived credentials and broad tool access because the test surface becomes too large to simulate exhaustively.
Common Variations and Edge Cases
Tighter AI security testing often increases operational overhead, requiring organisations to balance coverage against release speed and model churn. That tradeoff is real, especially for teams shipping many prompts, workflows, or agents at once.
There is no universal standard for this yet, but best practice is evolving toward risk-based testing. High-impact systems should receive deeper adversarial testing, while lower-risk internal assistants may rely on narrower checks. This is especially important where a model has access to secrets, customer data, or privileged actions, since failures can become security incidents rather than quality defects.
Edge cases also matter. Systems that look safe in a sandbox may fail once connected to live retrieval indexes, external APIs, or memory stores. Multi-agent workflows add another layer of complexity because one agent’s output can become another agent’s instruction. In those environments, testing should include context poisoning, unsafe delegation, and escalation through chained actions. OWASP’s LLM guidance and the DeepSeek breach case both reinforce the same point: AI testing must account for runtime behaviour, not just design intent.
Where models are updated frequently or use external tools with changing permissions, any fixed test suite will age quickly because the security properties change with the system itself.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | LLM-01 | Prompt injection and unsafe tool use are central to AI-specific testing. |
| CSA MAESTRO | MAESTRO-03 | Covers agentic workflow abuse and runtime control validation. |
| NIST AI RMF | AI risk management requires ongoing evaluation of model behaviour and impact. |
Test prompts, tools, and outputs together to catch adversarial behaviour before release.
Related resources from NHI Mgmt Group
Deepen Your Knowledge
Reviewed and updated by the NHIMG editorial team on July 5, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org