How should security teams test AI agents that can call tools and APIs?

Why This Matters for Security Teams

Testing AI agents is not the same as testing chatbots. A tool-using agent can transform a harmless prompt into an API call, a ticket update, a file move, or a credential lookup, so the security question is whether the agent can be induced to act outside intent. That is why current guidance from OWASP Agentic AI Top 10 and NIST AI Risk Management Framework focuses on execution pathways, not just model output quality.

This matters because autonomous systems can chain actions faster than a human reviewer can notice, and failures often show up as privilege misuse rather than obvious “unsafe” text. NHIMG research on OWASP NHI Top 10 and AI LLM hijack breach shows that agentic risk is usually about what the workload can reach, not what it says. In practice, many security teams discover this only after an agent has already touched a real tool, rather than through intentional test design.

How It Works in Practice

Effective testing starts by treating the agent as an autonomous workload with identity, permissions, and state. Security teams should build scenarios that force the agent through its actual tool interfaces, including MCP handlers, approval gates, and downstream APIs. Test cases should cover prompt injection, indirect instruction smuggling, malformed tool arguments, replayed requests, and state transitions where the agent is asked to re-plan after a failed call. The goal is to see whether the agent respects intent-based authorisation at runtime, not just whether it answers politely.

For agents that hold secrets or sign requests, the most useful control is short-lived access. JIT credentials and ephemeral secrets reduce the blast radius of a bad action, while workload identity gives the platform a cryptographic way to know what the agent is. That is why teams often pair policy evaluation with runtime identity assertions, drawing on models described in the CSA MAESTRO agentic AI threat modeling framework and the MITRE ATLAS adversarial AI threat matrix.

Validate that the agent can only call approved tools for the current task, not every tool it technically knows about.

Confirm approvals are evaluated per request, with context such as workload, destination, data class, and time window.

Verify secrets expire quickly and are revoked when the task ends or the agent deviates from scope.

Test failure paths where the agent retries, chains tools, or requests new permissions after a denial.

NHIMG coverage of the DeepSeek breach is a reminder that exposed secrets and broad access paths create immediate exposure, while the SailPoint AI Agents: The New Attack Surface report found 80% of organisations observed agents acting beyond intended scope. These controls tend to break down when agents are granted broad service credentials in legacy automation environments because the policy layer cannot keep up with autonomous retries and tool chaining.

Common Variations and Edge Cases

Tighter runtime control often increases test overhead, requiring organisations to balance deeper assurance against slower release cycles. That tradeoff is real, especially where agents operate across multiple services, human approval queues, or partially governed vendor tools. There is no universal standard for this yet, but best practice is evolving toward context-aware checks instead of static RBAC alone.

Edge cases include agents that act through shared service accounts, multi-agent workflows that hand off state, and environments where the tool call itself is the attack surface. In those cases, role-based access is too coarse because the same agent can be safe in one context and dangerous in another. Security teams should test whether policies adapt at request time and whether the agent can be constrained by purpose, data sensitivity, and task scope. For implementation patterns, OWASP Top 10 for Agentic Applications 2026 and NIST AI Risk Management Framework both support this shift toward runtime governance.

When agents are allowed to discover tools dynamically or inherit permissions from upstream jobs, testing should also include revocation timing, failure isolation, and recovery behaviour. The hardest failures are not model hallucinations but over-privileged automation that continues to act after it should have stopped.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A01	Agent tool abuse and unsafe execution paths map directly to agentic application risk.
CSA MAESTRO		MAESTRO guides threat modeling for autonomous agents, including tool and policy failures.
NIST AI RMF	GOVERN	AI RMF governance is relevant because agent testing needs accountable runtime oversight.

Assign ownership for agent behaviour and require governance checks on every privileged action.

#1 Authority in NHI Education, Research and Advisory, empowering organizations to tackle the critical risks posed by Non-Human Identities (NHIs), including AI Agents.

How should security teams test AI agents that can call tools and APIs?

Why This Matters for Security Teams

How It Works in Practice

Common Variations and Edge Cases

Standards & Framework Alignment

Related resources from NHI Mgmt Group