How should security teams govern AI agents that can choose tools at runtime?

Security teams should govern runtime agent choice as an access event, not as a simple application action. That means scoping permissions to the task, limiting token lifetime, logging every tool decision, and blocking the agent from reaching systems outside its approved context. Static roles alone are not enough when the execution path changes on each run.

Why This Matters for Security Teams

AI agents that choose tools at runtime are not just another application tier. They are autonomous decision-makers with the ability to request data, trigger workflows, and chain actions across systems. That makes governance a runtime access problem, not a static entitlement problem. Security teams that rely on RBAC alone miss the central risk: the agent’s next move is not fully knowable in advance, so the control must evaluate intent, context, and scope at the moment of use.

This is why current guidance increasingly points toward zero standing privilege, short-lived credentials, and policy decisions that follow the task rather than the account. The OWASP NHI Top 10 and OWASP Agentic AI Top 10 both reflect this shift, emphasising risks such as tool-use abuse, scope drift, and unintended action paths. In practice, many security teams discover the problem only after an agent has already touched an inappropriate system or exposed a secret, rather than during an intentional design review.

That gap is not theoretical. SailPoint reports that 80% of organisations have seen AI agents perform actions beyond their intended scope, including unauthorised system access, data sharing, and credential exposure, which makes runtime governance an operational necessity rather than a future concern.

How It Works in Practice

Governing tool-using agents starts with treating the agent as a workload identity, not a human user. The agent should authenticate with a cryptographic identity, and each request should then be evaluated against policy at request time, with that governance aligned to the NIST AI Risk Management Framework. In mature designs, the agent does not hold broad standing access. Instead, it receives just-in-time credentials for a single task, with automatic revocation when the task ends. That reduces the value of stolen tokens and limits lateral movement if the agent is compromised.
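The just-in-time pattern can be sketched in a few lines of Python. This is a minimal, in-memory illustration that assumes a hypothetical CredentialBroker class and a scope naming scheme such as "tickets:read"; a production system would hand issuance and revocation to a secrets manager or token service. It shows a credential that is scoped to one task, expires on its own, and is revoked as soon as the task block exits, even on error.

```python
import secrets
import time
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class TaskCredential:
    """Short-lived credential bound to one task (hypothetical structure)."""
    token: str
    task_id: str
    scopes: frozenset
    expires_at: float
    revoked: bool = False

    def is_valid(self) -> bool:
        return not self.revoked and time.time() < self.expires_at


class CredentialBroker:
    """Hypothetical in-memory broker; real systems would use a secrets manager."""

    def __init__(self, ttl_seconds: float = 300):
        self.ttl_seconds = ttl_seconds
        self._issued: dict[str, TaskCredential] = {}

    def issue(self, task_id: str, scopes) -> TaskCredential:
        cred = TaskCredential(
            token=secrets.token_urlsafe(32),  # random, unguessable token
            task_id=task_id,
            scopes=frozenset(scopes),
            expires_at=time.time() + self.ttl_seconds,
        )
        self._issued[cred.token] = cred
        return cred

    def revoke(self, token: str) -> None:
        cred = self._issued.get(token)
        if cred is not None:
            cred.revoked = True

    def check(self, token: str, scope: str) -> bool:
        """Valid only if the token exists, is unexpired, unrevoked, and holds the scope."""
        cred = self._issued.get(token)
        return cred is not None and cred.is_valid() and scope in cred.scopes

    @contextmanager
    def task_credential(self, task_id: str, scopes):
        cred = self.issue(task_id, scopes)
        try:
            yield cred
        finally:
            self.revoke(cred.token)  # revoke when the task ends, even on error


broker = CredentialBroker(ttl_seconds=120)
with broker.task_credential("task-42", {"tickets:read"}) as cred:
    print(broker.check(cred.token, "tickets:read"))   # True
    print(broker.check(cred.token, "tickets:write"))  # False
print(broker.check(cred.token, "tickets:read"))       # False after revocation
```

Tying revocation to a context manager means a forgotten cleanup step cannot leave a live credential behind, while the TTL remains a backstop if the process dies before revocation runs.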

Security teams should separate three layers:

Identity: prove which agent instance is acting, using workload identity patterns such as SPIFFE/SPIRE or OIDC-backed service tokens.
Authorisation: evaluate what the agent is trying to do right now, using intent-based or context-aware policy rather than fixed role membership.
Secrets: issue short-lived tokens or ephemeral API keys tied to task scope, not reusable long-lived credentials.
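For the identity layer, a SPIFFE ID takes the form spiffe://<trust-domain>/<path>. The sketch below assumes the ID has already been taken from a verified SVID (X.509 or JWT) by the workload API, so it only checks that the ID is well formed, belongs to the expected trust domain, and names an approved agent workload. The trust domain example.org and the agent paths are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical values for illustration only.
ALLOWED_TRUST_DOMAIN = "example.org"
APPROVED_AGENT_PATHS = {"/agents/ticket-triage", "/agents/report-builder"}


def parse_spiffe_id(spiffe_id: str) -> tuple[str, str]:
    """Split a SPIFFE ID into (trust_domain, path), rejecting malformed input."""
    parts = urlparse(spiffe_id)
    if parts.scheme != "spiffe":
        raise ValueError("not a SPIFFE ID")
    # SPIFFE IDs carry no user info or port in the trust domain.
    if not parts.netloc or "@" in parts.netloc or ":" in parts.netloc:
        raise ValueError("invalid trust domain")
    if parts.params or parts.query or parts.fragment:
        raise ValueError("SPIFFE IDs must not carry params, query, or fragment")
    return parts.netloc, parts.path


def is_approved_agent(spiffe_id: str) -> bool:
    """Identity layer only: is this a known agent workload in our trust domain?"""
    try:
        trust_domain, path = parse_spiffe_id(spiffe_id)
    except ValueError:
        return False
    return trust_domain == ALLOWED_TRUST_DOMAIN and path in APPROVED_AGENT_PATHS


print(is_approved_agent("spiffe://example.org/agents/ticket-triage"))  # True
print(is_approved_agent("spiffe://other.org/agents/ticket-triage"))    # False
```

Passing this check says only which agent is acting; whether it may act is left to the authorisation layer.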

At the control plane, policy-as-code should decide whether a tool call is permitted, whether the requested data class is in scope, and whether the current run context matches an approved objective. This is where guidance from the CSA MAESTRO agentic AI threat modeling framework is useful: it encourages teams to map tool access, escalation paths, and cross-system dependencies before an agent is allowed to act. For deeper operational patterns, NHIMG’s Analysis of Claude Code Security and Ultimate Guide to NHIs — Lifecycle Processes for Managing NHIs show why lifecycle controls matter for non-human identities that can spawn, mutate, and expire dynamically.
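Policy-as-code is usually written in a dedicated engine's language, such as Rego for Open Policy Agent. As a neutral stand-in, the Python sketch below expresses the three checks from the paragraph above as a default-deny function: is the run's objective approved, is the tool allowed for that objective, and is the requested data class in scope. The objectives, tool names, and data classes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolRequest:
    agent_id: str
    tool: str
    data_class: str
    objective: str


# Hypothetical policy: approved objective -> allowed tools and data classes.
POLICY = {
    "summarise-support-tickets": {
        "tools": {"ticket_search", "ticket_read"},
        "data_classes": {"internal"},
    },
    "generate-sales-report": {
        "tools": {"crm_query", "spreadsheet_write"},
        "data_classes": {"internal", "confidential"},
    },
}


def decide(request: ToolRequest) -> tuple[bool, str]:
    """Return (allowed, reason). Anything not explicitly permitted is denied."""
    rule = POLICY.get(request.objective)
    if rule is None:
        return False, "objective not approved"
    if request.tool not in rule["tools"]:
        return False, f"tool {request.tool!r} not permitted for this objective"
    if request.data_class not in rule["data_classes"]:
        return False, f"data class {request.data_class!r} out of scope"
    return True, "permitted"


print(decide(ToolRequest("triage-agent", "ticket_read", "internal", "summarise-support-tickets")))
# (True, 'permitted')
print(decide(ToolRequest("triage-agent", "crm_query", "internal", "summarise-support-tickets")))
# (False, "tool 'crm_query' not permitted for this objective")
```

Keying the rules on the objective rather than the agent's role is what lets the same agent receive different access on different runs.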

Telemetry must capture the decision, the tool, the context, the prompt or task objective, and an identifier for the credential used (not the secret itself), so investigators can reconstruct whether the agent acted within its approved intent. These controls tend to break down when agents are allowed to call legacy systems that cannot evaluate request-time policy or issue short-lived credentials.
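A decision record along these lines makes that reconstruction possible. This is a sketch that assumes JSON-lines output feeding an append-only log pipeline; the field names are hypothetical. It records a truncated SHA-256 fingerprint of the credential rather than the token, so the log can correlate activity without becoming a new place to leak secrets.

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass


@dataclass
class ToolDecisionEvent:  # hypothetical schema
    timestamp: float
    agent_id: str                # e.g. the verified workload identity
    run_id: str                  # correlates all decisions in one agent run
    objective: str               # the approved task objective
    tool: str
    context: dict                # request-time context the policy evaluated
    decision: str                # "allow" or "deny"
    reason: str
    credential_fingerprint: str  # never the raw token


def fingerprint(token: str) -> str:
    """Truncated SHA-256 of the credential, so logs can correlate without storing it."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:16]


def emit(event: ToolDecisionEvent) -> str:
    """Serialise one decision as a JSON line; a real system would ship it to a log pipeline."""
    line = json.dumps(asdict(event), sort_keys=True)
    print(line)
    return line


emit(ToolDecisionEvent(
    timestamp=time.time(),
    agent_id="spiffe://example.org/agents/ticket-triage",
    run_id="run-001",
    objective="summarise-support-tickets",
    tool="ticket_read",
    context={"ticket_id": "T-1234", "data_class": "internal"},
    decision="allow",
    reason="permitted",
    credential_fingerprint=fingerprint("example-token"),
))
```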

Common Variations and Edge Cases

Tighter runtime control often increases orchestration overhead, so organisations have to balance safety against latency, developer friction, and operational complexity. That tradeoff is real, especially in high-volume environments where agents call many tools per minute. There is no universal standard for this yet, but best practice is evolving toward layered controls rather than a single gate.
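One way to read "layered controls" in code is a chain of independent gates, each able to deny on its own, evaluated in order with the first denial reported. The gate names and request fields are illustrative assumptions, and the sliding-window rate limit is included only as one possible layer for high-volume agents, not as a reference design.

```python
import time
from collections import deque


def identity_gate(request: dict):
    if not request.get("identity_verified", False):
        return False, "identity not verified"
    return True, "ok"


def scope_gate(request: dict):
    if request["tool"] not in request.get("task_tools", ()):
        return False, "tool outside task scope"
    return True, "ok"


class RateLimitGate:
    """Hypothetical layer: at most max_calls per agent within window_seconds."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls: dict[str, deque] = {}

    def __call__(self, request: dict):
        now = time.monotonic()
        calls = self._calls.setdefault(request["agent_id"], deque())
        while calls and now - calls[0] > self.window_seconds:
            calls.popleft()  # forget calls that fell out of the window
        if len(calls) >= self.max_calls:
            return False, "rate limit exceeded"
        calls.append(now)
        return True, "ok"


def evaluate(request: dict, gates) -> tuple:
    """Every layer must allow; the first denial wins and is reported."""
    for gate in gates:
        allowed, reason = gate(request)
        if not allowed:
            return False, reason
    return True, "all layers passed"


gates = [identity_gate, scope_gate, RateLimitGate(max_calls=2, window_seconds=60)]
request = {"agent_id": "a1", "identity_verified": True, "tool": "search", "task_tools": {"search"}}
for _ in range(3):
    print(evaluate(request, gates))
# (True, 'all layers passed') twice, then (False, 'rate limit exceeded')
```

Ordering cheap checks first keeps latency down, and placing the rate limit last means requests already denied by an earlier layer do not consume quota.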

One common edge case is delegated tool access. If an agent can call another agent or service on behalf of a user, the first agent should not inherit unlimited downstream privileges. Each hop should be re-authorised, and each credential should remain task-bound. Another edge case is data-rich environments where an agent needs broad search access but narrow write access. In those cases, context-aware policy should distinguish between read, transform, and execute actions instead of applying one role to all tool calls.
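The delegation rule can be expressed as scope attenuation: a downstream grant may only contain scopes its parent already holds, stays bound to the same task, and is re-authorised at each hop. This hypothetical sketch also models scopes as (resource, action) pairs so that read, transform, and execute are granted separately. The reauthorise callback stands in for whatever policy engine is in use, and the two-hop depth limit is an arbitrary example value.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    READ = "read"
    TRANSFORM = "transform"
    EXECUTE = "execute"


@dataclass(frozen=True)
class Grant:
    task_id: str
    scopes: frozenset  # (resource, Action) pairs
    hop: int = 0


MAX_HOPS = 2  # hypothetical limit on delegation depth


def delegate(parent: Grant, requested, reauthorise) -> Grant:
    """Issue a downstream grant that can never exceed its parent's scope."""
    requested = frozenset(requested)
    if parent.hop >= MAX_HOPS:
        raise PermissionError("delegation depth exceeded")
    excess = requested - parent.scopes
    if excess:
        names = ", ".join(f"{res}:{act.value}" for res, act in excess)
        raise PermissionError(f"parent grant lacks: {names}")
    # Re-authorise every scope at this hop instead of inheriting it.
    approved = frozenset(s for s in requested if reauthorise(parent.task_id, s))
    return Grant(task_id=parent.task_id, scopes=approved, hop=parent.hop + 1)


def allows(grant: Grant, resource: str, action: Action) -> bool:
    return (resource, action) in grant.scopes


root = Grant("task-7", frozenset({("crm", Action.READ), ("crm", Action.TRANSFORM)}))
# Hypothetical re-authorisation: the downstream service may read but not transform.
child = delegate(root, {("crm", Action.READ), ("crm", Action.TRANSFORM)},
                 lambda task_id, scope: scope[1] is Action.READ)
print(allows(child, "crm", Action.READ))       # True
print(allows(child, "crm", Action.TRANSFORM))  # False
```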

Agents with long-running goals also need periodic re-evaluation. A task that was safe at startup may become unsafe once the context changes, especially if the agent discovers new tools or receives additional instructions mid-run. That is why runtime policy evaluation, short-lived secrets, and explicit revocation are more reliable than standing grants. NHIMG’s Top 10 NHI Issues and the AI LLM hijack breach material are useful reminders that compromise often begins with credential reuse, overbroad scope, or missing auditability. The practical rule is simple: if the agent can decide its own next step, the control model must decide its access before that step occurs.
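The practical rule can be sketched as a run loop that re-evaluates policy against the updated context before every step and revokes the task credential on the first denial. The step shape, the RunContext fields, and example_policy are hypothetical; the example policy denies any tool outside the approved set and any mid-run instructions, mirroring the two triggers described above.

```python
from dataclasses import dataclass, field


@dataclass
class RunContext:  # hypothetical run state
    objective: str
    approved_tools: frozenset
    tools_requested: set = field(default_factory=set)
    extra_instructions: int = 0


def run_agent(steps, context: RunContext, is_permitted, revoke):
    """Re-check policy before every step; revoke and halt on the first denial.

    steps: iterable of (tool_name, new_instruction_count) pairs (hypothetical shape).
    """
    executed = []
    for tool, new_instructions in steps:
        # Update the context with what the agent is about to do, then decide.
        context.tools_requested.add(tool)
        context.extra_instructions += new_instructions
        if not is_permitted(context):
            revoke()
            return executed, "revoked"
        executed.append(tool)  # a real agent would invoke the tool here
    return executed, "completed"


def example_policy(context: RunContext) -> bool:
    """Hypothetical rule: only approved tools, and no instructions injected mid-run."""
    return context.tools_requested <= context.approved_tools and context.extra_instructions == 0


revocations = []
ctx = RunContext("triage", frozenset({"search", "read"}))
print(run_agent([("search", 0), ("read", 0), ("shell", 0)], ctx, example_policy,
                lambda: revocations.append("task credential revoked")))
# (['search', 'read'], 'revoked')
```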

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while the NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A5	Tool-use abuse and runtime scope drift are central agentic risks.
CSA MAESTRO	MTR-4	MAESTRO addresses threat modeling for agent workflows and delegated tools.
NIST AI RMF	GOVERN	AI RMF governance covers accountability for autonomous agent decisions.

Assign ownership for agent actions and require reviewable policy for each runtime decision.