How should security teams govern AI agents that can invoke multiple tools in one session?

Security teams should govern AI agents as decision-making identities, not just tool users. That means defining tool access, context scope, and escalation limits together, then monitoring the full execution chain for unexpected combinations of actions. If those controls are split across teams or policies, the agent can move faster than review cycles and create impact before anyone intervenes.

Why This Matters for Security Teams

AI agents that can call multiple tools in one session are not simple application users. They are autonomous software identities that can chain actions, retain context, and amplify a small permission gap into a broad incident. Static RBAC is useful for humans, but it often fails for goal-driven agents because the next action depends on live context, not a fixed job description. Current guidance from the OWASP Agentic AI Top 10 and the CSA MAESTRO agentic AI threat modeling framework treats this as an execution-control problem as much as an access-control problem.

The risk is not just tool abuse. It is tool composition: an agent can search, retrieve, transform, and exfiltrate in a single run before a human review step ever appears. That is why NHI governance for agents must include workload identity, short-lived secrets, and policy decisions made at request time, not only perimeter approval. NIST frames this in risk terms through the NIST AI Risk Management Framework, while NHIMG research shows how often agent behaviour already exceeds scope in practice. In the SailPoint report AI Agents: The New Attack Surface, 80% of organisations said their agents had already performed actions beyond intended scope.

In practice, many security teams discover agent overreach only after the workflow has already combined actions in a way no reviewer expected.

How It Works in Practice

The most reliable pattern is to govern the agent as an identity plus an execution policy. The identity side should rely on workload identity, not shared service accounts, so the system can prove what the agent is through cryptographic credentials rather than assume who is behind the prompt. The policy side should evaluate every tool call against intent, context, and risk at runtime. That is the practical direction suggested by the OWASP Top 10 for Agentic Applications 2026 and the MITRE ATLAS adversarial AI threat matrix.

Issue JIT credentials per task, not persistent tokens that survive beyond the session.
Bind each credential to a specific context scope, such as one dataset, one repo, or one ticket.
Use policy-as-code to evaluate tool requests at execution time, including sequence and destination risk.
Separate read, write, and export permissions so the agent cannot chain low-risk actions into high-impact ones.
Log the full execution chain, including prompts, tool invocations, outputs, and downstream effects.
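The five controls above can be sketched as a small runtime policy check. This is a minimal illustration, not a reference implementation: the names TaskCredential, ToolCall, Session, and evaluate are hypothetical, the three verdicts ("allow", "deny", "escalate") are an assumed convention, and a real deployment would likely delegate these decisions to a dedicated policy engine and a workload identity provider.

```python
from dataclasses import dataclass, field
import time


@dataclass(frozen=True)
class TaskCredential:
    """Hypothetical JIT credential bound to one task and one context scope."""
    task_id: str
    scope: str              # e.g. "repo:payments" or "dataset:tickets"
    permissions: frozenset  # subset of {"read", "write", "export"}
    expires_at: float       # epoch seconds; short-lived by design


@dataclass(frozen=True)
class ToolCall:
    tool: str
    action: str             # "read", "write", or "export"
    resource: str
    destination: str = "internal"


@dataclass
class Session:
    credential: TaskCredential
    history: list = field(default_factory=list)  # the full execution chain


def evaluate(session, call, now=None):
    """Decide one tool call at execution time and log the decision."""
    now = time.time() if now is None else now
    cred = session.credential
    # Sequence risk: has this session already been allowed to read data?
    prior_read = any(
        prev.action == "read" and verdict == "allow"
        for prev, verdict, _ in session.history
    )
    if now >= cred.expires_at:
        verdict, reason = "deny", "credential expired"
    elif call.resource != cred.scope:
        verdict, reason = "deny", "resource outside credential scope"
    elif call.action not in cred.permissions:
        verdict, reason = "deny", "permission not granted for this task"
    elif call.action == "export" and call.destination != "internal" and prior_read:
        # Destination and sequence together trigger human review.
        verdict, reason = "escalate", "read followed by external export"
    else:
        verdict, reason = "allow", "within task policy"
    session.history.append((call, verdict, reason))
    return verdict, reason
```

In this sketch a denied or escalated call is still appended to the session history, so the execution chain can be reconstructed even when the agent is blocked.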

This is where intent-based authorisation matters. Instead of asking only whether the agent has a role, ask what it is trying to do, what data it touched, and whether the action is consistent with the declared task. NHIMG has documented the operational consequences of weak control boundaries in the OWASP NHI Top 10 and the broader credential exposure patterns in the AI LLM hijack breach. The operational takeaway is simple: short-lived secrets reduce blast radius, while real-time policy reduces surprise.
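One way to picture the intent check is as a comparison between the declared task and what the agent actually attempts. The sketch below is illustrative only: INTENT_PROFILES, the task names, the tool names, and the data labels are all hypothetical, and a production system would derive them from its own task catalogue and data classification scheme.

```python
# Hypothetical mapping from a declared task to the actions and data
# classifications that are consistent with it.
INTENT_PROFILES = {
    "triage_ticket": {
        "actions": {("ticketing", "read"), ("ticketing", "write"), ("kb", "read")},
        "data_labels": {"public", "internal"},
    },
    "summarise_repo": {
        "actions": {("git", "read")},
        "data_labels": {"internal"},
    },
}


def check_intent(declared_task, tool, action, data_labels_touched):
    """Return (is_consistent, reason) for one attempted action."""
    profile = INTENT_PROFILES.get(declared_task)
    if profile is None:
        return False, "unknown declared task"
    if (tool, action) not in profile["actions"]:
        return False, f"{tool}:{action} is not part of {declared_task}"
    # Data the agent has touched must stay within the task's classification.
    unexpected = set(data_labels_touched) - profile["data_labels"]
    if unexpected:
        return False, f"touched data outside intent: {sorted(unexpected)}"
    return True, "consistent with declared task"
```

A role check alone would pass an agent that holds both ticketing and git access. This comparison rejects a git read during a ticket triage task because it does not match the declared intent.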

These controls tend to break down when agents operate across many APIs with inconsistent logging, because the execution chain becomes impossible to reconstruct in time.

Common Variations and Edge Cases

Tighter control often increases friction, so organisations must balance speed against containment. There is no universal standard for this yet, especially for agents that need to move across SaaS tools, internal APIs, and code systems in one session. Best practice is evolving toward layered control: ZTA for network reach, PAM for escalation paths, and JIT secrets for each task, with human approval reserved for high-impact transitions rather than every step.

One common edge case is delegated automation. If the agent is acting on behalf of a user, RBAC alone may over-grant because it inherits the user’s broad entitlements without considering the agent’s autonomous behaviour. Another edge case is long-running agents that span multiple sessions. In those environments, static tokens become especially dangerous, and NHIMG’s reporting on secret exposure and agent key misuse, including the Moltbook AI agent keys breach, shows why rotation and revocation cannot be afterthoughts. For governance mapping, align the program with NIST Cybersecurity Framework 2.0 to anchor monitoring and response, then use Ultimate Guide to NHIs — Regulatory and Audit Perspectives for audit evidence expectations.
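The delegated-automation and token-lifetime points above can be illustrated with a short sketch. It assumes a simple model in which the agent receives only the intersection of the user's entitlements and the task's scope, and in which every token carries an expiry and can be revoked. TokenRegistry and effective_permissions are hypothetical names, and the in-memory dictionary stands in for whatever secrets manager an organisation actually uses.

```python
import secrets


def effective_permissions(user_entitlements, task_scope):
    """Grant the agent only what both the user and the task allow."""
    return set(user_entitlements) & set(task_scope)


class TokenRegistry:
    """Hypothetical in-memory store of short-lived, revocable tokens."""

    def __init__(self):
        self._tokens = {}  # token -> (permissions, expires_at)

    def issue(self, permissions, ttl_seconds, now):
        token = secrets.token_urlsafe(16)
        self._tokens[token] = (frozenset(permissions), now + ttl_seconds)
        return token

    def revoke(self, token):
        # Revocation takes effect immediately; unknown tokens are ignored.
        self._tokens.pop(token, None)

    def allows(self, token, permission, now):
        entry = self._tokens.get(token)
        if entry is None:
            return False
        permissions, expires_at = entry
        return now < expires_at and permission in permissions
```

For a long-running agent, issuing a fresh token per session with a short ttl_seconds means a leaked token stops working on its own, and revoke covers the case where it must stop working sooner.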

The hardest environments are multi-agent systems with shared memory and broad connector access, because one agent’s benign step can become another agent’s privilege escalation path.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		Agent tool chaining and runtime authorization are core OWASP agentic risks.
CSA MAESTRO		MAESTRO focuses on threat modeling autonomous agent workflows and tool use.
NIST AI RMF		AI RMF governs accountability and lifecycle risk for autonomous AI systems.

Assign ownership for agent behaviour and require runtime monitoring, logging, and escalation controls.
