AI agent tool-calling shifts identity control to the execution layer

By NHI Mgmt Group Editorial Team | Published 2025-12-03 | Domain: Agentic AI & NHIs | Source: WorkOS

TL;DR: AI agents calling external services need authentication flows that hide tokens from the model itself, and WorkOS's source analysis says Arcade's zero-token-exposure approach, OAuth 2.1 handling, and just-in-time authorisation are aimed at that gap. The core issue is that access review and least-privilege assumptions break down when the identity that decides what to call is separate from the identity that stores the credential.

At a glance

What this is: An analysis of AI agent tool-calling security. Its key finding is that credential isolation and execution-time authorisation are becoming the control plane for agent access.

Why it matters: IAM, PAM, and NHI programmes now have to govern not just who can log in, but which runtime is allowed to hold tokens, inject credentials, and execute tool calls on behalf of users.

By the numbers:

When AWS credentials are exposed publicly, attackers attempt access within an average of 17 minutes and as quickly as 9 minutes in some cases.

👉 Read WorkOS's analysis of Arcade for AI agent security and enterprise auth

Context

AI agent tool-calling is the problem of letting a runtime act on external services without turning the model into a secret holder. That matters for identity governance because the access decision, the credential, and the execution point are no longer the same place in the stack.

The security gap is not simply OAuth complexity. It is the assumption that a human-approved login flow can safely support an agent that may make dozens of independent API calls, each requiring scoped, auditable, and revocable access across multiple systems.

Key questions

Q: How should security teams govern AI agent tool calls without exposing credentials?

A: Security teams should place credentials in a separate execution layer, not in the model context, and bind every tool call to a distinct agent identity. The key controls are scoped consent, execution-time secret injection, and unified audit logging. That way the model can request actions without ever seeing the material that authorises them.

Q: Why do AI agents complicate least-privilege design?

A: AI agents complicate least-privilege design because the exact tool they will need is often decided at runtime, not at provisioning time. Static roles are therefore too coarse for many agent workflows. Teams need action-level scoping, short-lived authorisation, and revocation that applies to the execution layer, not just the user interface.

Q: What breaks when secret custody and model reasoning are in the same runtime?

A: When secret custody and model reasoning share the same runtime, the model becomes a potential path for secret exposure through prompts, logs, memory, or debug output. That undermines token hygiene and makes containment far harder after an incident. Separation is the control that keeps secrets out of the reasoning environment.

Q: Should organisations treat MCP as a security control or a transport standard?

A: Organisations should treat MCP as a transport standard unless the runtime is actually enforcing policy, scope, and audit. A plain MCP server that forwards requests still leaves the core identity problem unsolved. The control value comes only when the MCP layer mediates authorisation and credential release for each tool call.

Technical breakdown

Credential isolation for AI agent tool calls

Credential isolation means the AI model never receives OAuth tokens, API keys, or refresh secrets. Instead, a separate execution runtime stores credentials, validates policy, injects the right secret at call time, and returns only the result. That architecture reduces secret exposure in prompts, logs, and intermediate reasoning traces. It also changes the trust boundary: the agent can request an action, but it cannot directly hold the material needed to perform it. In practice, this is an NHI control plane problem, not a prompt engineering problem.

Practical implication: separate credential custody from the model runtime and require execution-layer mediation for every tool call.
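The separation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Arcade's or WorkOS's implementation: `CredentialBroker`, `ToolRequest`, and `list_channels` are hypothetical names, the secret store is an in-memory dict standing in for a vault, and the handler returns canned data instead of calling a real API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# A tool handler receives the model's arguments plus the injected secret.
ToolHandler = Callable[[dict, str], dict]


@dataclass(frozen=True)
class ToolRequest:
    """What the model may emit: an agent id, a tool name, and arguments. No secrets."""
    agent_id: str
    tool: str
    args: dict = field(default_factory=dict)


class CredentialBroker:
    """Hypothetical execution-layer broker that holds secrets on the agent's behalf."""

    def __init__(self) -> None:
        self._secrets: Dict[Tuple[str, str], str] = {}  # (agent_id, tool) -> token
        self._tools: Dict[str, ToolHandler] = {}

    def register_tool(self, name: str, handler: ToolHandler) -> None:
        self._tools[name] = handler

    def store_secret(self, agent_id: str, tool: str, token: str) -> None:
        self._secrets[(agent_id, tool)] = token

    def execute(self, request: ToolRequest) -> dict:
        handler: Optional[ToolHandler] = self._tools.get(request.tool)
        token: Optional[str] = self._secrets.get((request.agent_id, request.tool))
        if handler is None or token is None:
            return {"ok": False, "error": "not authorised"}
        # The secret is injected here, at call time, and never leaves this method.
        result = handler(request.args, token)
        # Defensive check (an assumption of this sketch): withhold any result
        # that echoes the credential back toward the model.
        if token in repr(result):
            return {"ok": False, "error": "result withheld: contained credential"}
        return {"ok": True, "result": result}


def list_channels(args: dict, token: str) -> dict:
    """Stand-in for a real API call; a real handler would send `token` as a credential."""
    return {"channels": ["#general", "#security"]}
```

The design choice worth noting is that there is no method for reading a secret back out of the broker: the only path that touches a token is `execute`, which returns the tool result and nothing else.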

OAuth 2.1 and just-in-time authorisation for agents

Agent tool-calling typically relies on OAuth 2.0 or OAuth 2.1 flows, but the important change is timing. A just-in-time model authorises access only when the agent actually needs a tool or scope, rather than pre-granting broad standing access. That reduces over-scoping, but it also creates a governance requirement: the approval event, the scope, and the execution outcome must all be bound together. For AI agents, token refresh and consent handling become lifecycle controls, not just authentication plumbing.

Practical implication: bind consent, scope, and execution record to the same audit trail so access can be reviewed at the action level.
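One way to picture that binding is a short-lived grant whose identifier is stamped onto every execution record. The Python sketch below is illustrative only and does not implement OAuth 2.1: `JitAuthorizer`, `Grant`, and `AuditRecord` are hypothetical names, the five-minute default lifetime is an arbitrary assumption, and the consent step is reduced to a single `approve` call.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Grant:
    grant_id: str
    user_id: str
    agent_id: str
    scope: str
    expires_at: float


@dataclass(frozen=True)
class AuditRecord:
    """One record linking the approval, the scope, the tool, and the outcome."""
    grant_id: Optional[str]
    user_id: Optional[str]
    agent_id: str
    scope: str
    tool: str
    outcome: str  # "executed" or "denied"


class JitAuthorizer:
    """Hypothetical just-in-time authoriser with short-lived, per-scope grants."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._grants: Dict[Tuple[str, str], Grant] = {}
        self.audit_log: List[AuditRecord] = []

    def approve(self, user_id: str, agent_id: str, scope: str) -> Grant:
        """Record a user's consent at the moment the agent needs the scope."""
        grant = Grant(str(uuid.uuid4()), user_id, agent_id, scope,
                      self._clock() + self._ttl)
        self._grants[(agent_id, scope)] = grant
        return grant

    def run(self, agent_id: str, tool: str, scope: str,
            action: Callable[[], object]) -> Optional[object]:
        """Execute `action` only under a live grant, and log either outcome."""
        grant = self._grants.get((agent_id, scope))
        live = grant is not None and grant.expires_at > self._clock()
        self.audit_log.append(AuditRecord(
            grant_id=grant.grant_id if grant else None,
            user_id=grant.user_id if grant else None,
            agent_id=agent_id, scope=scope, tool=tool,
            outcome="executed" if live else "denied",
        ))
        return action() if live else None
```

Because denied attempts are logged with the same record shape as successful ones, a reviewer can see both what the agent did and what it tried to do once a grant had lapsed.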

MCP-native tool execution and policy enforcement

Model Context Protocol, or MCP, standardises how AI systems discover and call tools, but protocol standardisation does not equal security. An MCP-native runtime can centralise policy, expose pre-built tools, and mediate tool requests before credentials are released. That is useful because the attack surface shifts from the model prompt to the tool registry, server policy, and credential broker. The real governance question is whether the MCP layer is acting as an identity control point or merely a transport layer for calls that were already approved elsewhere.

Practical implication: treat the MCP runtime as an enforcement point and test whether it can deny, scope, and log tool use independently of the agent.
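A rough sketch of what "deny, scope, and log independently of the agent" could look like follows. It does not use any MCP SDK and makes no claim about how a particular MCP server works: `PolicyGate` and the tool names `slack.read_messages` and `slack.post_message` are hypothetical, and the policy is a plain per-agent allow-list of action-level tool names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class PolicyGate:
    """Hypothetical mediation layer placed in front of a tool registry."""
    tools: Dict[str, Callable[[dict], object]] = field(default_factory=dict)
    allowed: Dict[str, Set[str]] = field(default_factory=dict)  # agent_id -> tool names
    log: List[Tuple[str, str, str]] = field(default_factory=list)

    def call(self, agent_id: str, tool: str, args: dict) -> object:
        # Decide before anything executes; the agent has no say in the outcome.
        if tool not in self.tools:
            decision = "deny:unknown_tool"
        elif tool not in self.allowed.get(agent_id, set()):
            decision = "deny:not_permitted"
        else:
            decision = "allow"
        # Every attempt is logged, including the ones that are refused.
        self.log.append((agent_id, tool, decision))
        if decision != "allow":
            raise PermissionError(decision)
        return self.tools[tool](args)


# Example wiring: the agent may read Slack messages but not post them.
gate = PolicyGate(
    tools={
        "slack.read_messages": lambda args: ["hello"],
        "slack.post_message": lambda args: "posted",
    },
    allowed={"agent-1": {"slack.read_messages"}},
)
```

Registering tools at the action level (`read_messages` versus `post_message`) rather than at the application level is what lets the gate express read-only access without relying on the agent to restrain itself.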

NHI Mgmt Group analysis

Zero-token exposure is a governance boundary, not a feature checkbox. The central problem in AI agent tool-calling is that credentials must be usable by runtime infrastructure but invisible to the model. That boundary matters because it prevents the agent from becoming a secret distribution surface through prompts, traces, or memory. For IAM and NHI teams, the question is whether the credential broker is the control plane or merely another place where access is cached.

Least privilege is harder to define when the actor is an agentic runtime. Traditional access models assume request intent is stable enough to scope at provisioning time, but agent workflows decide which tool to call only at execution time. That makes static entitlement design brittle and pushes governance toward action-level scoping. The implication is that provisioning logic alone no longer tells you what access will actually be used.

Tool-calling security exposes the split between application identity and delegated identity. WorkOS-style enterprise auth may still govern how a user enters the application, while Arcade-style controls govern what the agent may do outside it. Those are different control surfaces and they should not be collapsed into one policy conversation. Practitioners need to stop treating agent actions as a thin extension of user login and instead govern them as a separate execution identity.

MCP-native access broadens the attack surface from tokens to tool ecosystems. Once the runtime can connect Slack, Gmail, GitHub, Salesforce, and custom services, the security question becomes one of tool inventory, policy consistency, and revocation speed. That is the same identity sprawl problem seen in broader NHI programmes, only now the sprawl is mediated through agent behaviour. The practitioner conclusion is that tool registries need lifecycle governance, not just developer convenience.

Credential isolation for agents is becoming the new identity blast radius control. The more autonomous the workflow, the more the risk shifts from stolen passwords to over-broad delegated access and mis-scoped API calls. That makes runtime containment, not just SSO and MFA, the decisive governance layer for production AI agents. Teams should treat this as an NHI governance problem with agentic characteristics, not as a narrow integration pattern.

From our research:
Around 100,000 valid secrets were found in public Docker images, with ENV instructions alone accounting for 65% of all secret leaks in containers, according to The State of Secrets Sprawl 2025.
15% of commit authors have leaked at least one secret in their contribution history, according to The State of Secrets Sprawl 2025.
For teams that need a broader governance lens, Ultimate Guide to NHIs, Static vs Dynamic Secrets helps frame why runtime custody matters more than storage location alone.

What this signals

Zero-token exposure: the next governance debate will be whether the agent runtime itself is trusted enough to broker secrets safely. As agent adoption grows, identity teams will need to prove that credential custody, execution, and revocation are separated by design rather than by convention.

With 4.6% of all public GitHub repositories containing at least one hardcoded secret, per The State of Secrets Sprawl 2025, the broader message is clear: secret sprawl is already systemic, so agentic workflows cannot be allowed to inherit the same weakness.

That shifts programme planning toward runtime control points, especially where tool ecosystems span SaaS, developer platforms, and internal APIs. Teams that already use the Ultimate Guide to NHIs, Static vs Dynamic Secrets as a baseline should now extend the same logic to agent execution paths.

For practitioners

Define a separate agent execution identity: Create an identity and access model for the agent runtime that is distinct from the end user and from the application service account. Bind every tool call to that execution identity so approvals, scopes, and audit trails remain attributable.
Broker credentials outside the model boundary: Keep OAuth tokens, API keys, and refresh secrets in a dedicated runtime or vault layer that the model cannot read. Require the broker to inject credentials only at execution time and to deny any direct secret retrieval path.
Scope tools by action, not just by application: Review whether each agent tool can be limited to the exact action it needs, such as read-only Slack access instead of full workspace messaging. Where scopes are broader than the task, redesign the workflow before expanding deployment.
Log consent, scope, and execution together: Ensure the approval event, the granted scope, the tool invoked, and the response returned are captured in one auditable record. Without that linkage, you cannot distinguish legitimate delegated action from overreach in later review.
Test revocation at the tool layer: Verify that revoking a tool grant actually blocks execution immediately across the MCP runtime, cached sessions, and any downstream worker processes. If revocation only removes the UI permission, the control is incomplete.
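The revocation check in the last item lends itself to an automated test. The sketch below is a simplified model rather than a test harness for any real runtime: `GrantStore`, `Worker`, and `revocation_blocks_everywhere` are hypothetical names, and cache invalidation is modelled with a version counter that each worker compares before trusting its cached decision.

```python
from typing import Dict, Iterable, Set, Tuple


class GrantStore:
    """Hypothetical source of truth for tool grants."""

    def __init__(self) -> None:
        self._active: Set[Tuple[str, str]] = set()
        self.version = 0  # bumped on every change so caches can detect staleness

    def grant(self, agent_id: str, tool: str) -> None:
        self._active.add((agent_id, tool))
        self.version += 1

    def revoke(self, agent_id: str, tool: str) -> None:
        self._active.discard((agent_id, tool))
        self.version += 1

    def is_active(self, agent_id: str, tool: str) -> bool:
        return (agent_id, tool) in self._active


class Worker:
    """Downstream worker that caches decisions but revalidates on version change."""

    def __init__(self, store: GrantStore) -> None:
        self._store = store
        self._cache: Dict[Tuple[str, str], bool] = {}
        self._seen_version = store.version

    def can_execute(self, agent_id: str, tool: str) -> bool:
        if self._seen_version != self._store.version:
            self._cache.clear()  # drop stale decisions before answering
            self._seen_version = self._store.version
        key = (agent_id, tool)
        if key not in self._cache:
            self._cache[key] = self._store.is_active(agent_id, tool)
        return self._cache[key]


def revocation_blocks_everywhere(store: GrantStore, workers: Iterable[Worker],
                                 agent_id: str, tool: str) -> bool:
    """Warm every cache, revoke the grant, then confirm no worker still allows the call."""
    workers = list(workers)
    warmed = all(w.can_execute(agent_id, tool) for w in workers)
    store.revoke(agent_id, tool)
    return warmed and not any(w.can_execute(agent_id, tool) for w in workers)
```

Warming the caches before revoking matters: a test that revokes first never exercises the stale-cache path, which is where an incomplete revocation is most likely to hide.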

Key takeaways

AI agent tool-calling turns credential custody into an execution-layer governance problem, not just an auth integration detail.
The operational risk is not only token leakage but also over-scoped delegated access that grows with each new tool connection.
Practitioners should separate model reasoning from secret custody, then prove that consent, scope, and revocation are all enforced at runtime.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST Zero Trust (SP 800-207) sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		Agent tool-use and credential isolation map directly to agentic AI threat controls.
OWASP Non-Human Identity Top 10	NHI-02	Credential custody and secret exposure are central NHI control concerns.
NIST Zero Trust (SP 800-207)	PR.AC-4 (a NIST Cybersecurity Framework 1.1 subcategory)	Zero-trust access should verify each delegated call rather than trust the session.

Store secrets outside model context and verify that no tool path exposes them directly.

Key terms

Tool-calling authentication: Tool-calling authentication is the control pattern that lets an AI agent invoke external services without directly holding the credentials that authorise those calls. The runtime brokers identity, injects secrets at execution time, and records the action so access can be governed after the fact.
Credential isolation: Credential isolation means separating secret storage and secret use from the AI model that decides what to do. In practice, the model can request an action, but a separate runtime or vault enforces whether credentials are released, which reduces exposure in prompts, logs, and agent memory.
MCP runtime: An MCP runtime is the execution layer that mediates how an AI system discovers and calls tools through the Model Context Protocol. When governed well, it becomes an identity control point for tool access, policy checks, and audit logging rather than a simple transport service.
Agent execution identity: An agent execution identity is the identity assigned to the runtime that performs actions on behalf of a user or application. It is not the same as the human user or the model, and it must be scoped, logged, and revocable as its own non-human identity.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity lifecycle are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or NHI governance in your organisation, it is worth exploring.

This post draws on content published by WorkOS: Arcade for AI Agent Security, features, pricing, and alternatives. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-12-03.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org