TL;DR: Cloudflare’s Code Mode cuts token usage by 32% for a simple task and 81% for a 31-event batch workflow by having agents generate code from MCP server schemas instead of calling tools directly, according to WorkOS. The efficiency gain matters because it shifts MCP design toward hybrid execution models where code generation becomes part of the control surface, not just the model output.
At a glance
What this is: Cloudflare Code Mode replaces direct MCP tool calls with generated code, and the key finding is that this reduces token usage sharply for both simple and complex agent tasks.
Why it matters: IAM and NHI teams should care because agent execution style changes what needs to be governed, from tool permissions to runtime boundaries and auditability.
By the numbers:
- Code Mode used 32% fewer tokens for the simple single-event task.
- Code Mode used 81% fewer tokens for the complex 31-event task.
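As multipliers, the two reductions differ more than the percentages suggest. The conversion below uses only the two figures above. Here r is the reported reduction and T is the tokens consumed in each mode:

```latex
\frac{T_{\text{code}}}{T_{\text{tools}}} = 1 - r
\quad\Rightarrow\quad
r = 0.32:\ \frac{T_{\text{tools}}}{T_{\text{code}}} = \frac{1}{0.68} \approx 1.47,
\qquad
r = 0.81:\ \frac{T_{\text{tools}}}{T_{\text{code}}} = \frac{1}{0.19} \approx 5.26
```

So the single-event task saved about a third of its tokens. The 31-event task ran on roughly a fifth of the tokens that direct tool calling needed.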
👉 Read WorkOS's analysis of Cloudflare Code Mode and MCP efficiency
Context
MCP tool calling works when each action is narrow and discrete. It becomes expensive when an agent must repeat the same pattern across dozens of steps, because every call and every result passes back through the model's context. The governance question for identity teams is no longer only who can call a tool, but what kind of execution model an agent uses when it acts on those permissions. That matters for NHI governance, agentic AI control, and auditability across the identity stack.
Code generation changes the control boundary. Instead of a single tool call per action, the agent emits code that can loop, branch, and reuse runtime state inside a sandboxed Worker. That creates a different identity problem from direct tool invocation, because privilege, timing, and execution context are now bundled into one runtime path rather than exposed as separate calls.
Key questions
Q: How should security teams govern MCP agents that can switch between tool calls and generated code?
A: Security teams should treat tool calls and generated code as separate execution modes with different control requirements. Direct calls are easier to log and approve, while generated code can compress many actions into one runtime block. Governance should define when each mode is allowed, what permissions it may inherit, and what evidence must be retained for review.
Q: Why does code generation change the risk profile of MCP workflows?
A: Code generation changes the risk profile because it lets an agent loop, branch, and reuse state inside a sandbox instead of exposing every step as a discrete tool call. That improves efficiency, but it also hides composite behaviour behind one runtime boundary. The result is less granular visibility unless the runtime is explicitly governed.
Q: What breaks when MCP governance only models tool permissions?
A: When governance only models tool permissions, it misses the authority created by generated code. A sandboxed script can call multiple APIs, reuse runtime functions, and chain actions in ways that a single tool-call policy does not describe. Teams then lose clarity on what was executed, why it was allowed, and how to certify it.
Q: How do IAM teams decide when to permit agent-generated code in production?
A: IAM teams should permit agent-generated code only when the workflow needs repetition, branching, or runtime computation that direct tool calling cannot handle well. The decision should depend on task scope, logging quality, sandbox isolation, and whether the workflow can be certified after execution. If those controls are weak, direct tool calls remain the safer model.
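The criteria in this answer can be written as a simple gate. This is a minimal sketch. `WorkflowProfile`, its fields and the mode strings are hypothetical. In practice the inputs would come from your own logging, sandbox and certification assessments.

```python
# Hypothetical gate encoding the criteria above: permit generated code only
# when the task needs it AND every supporting control is in place.
from dataclasses import dataclass


@dataclass
class WorkflowProfile:
    """Hypothetical description of a candidate workflow."""
    needs_repetition_or_branching: bool  # loops, branches, runtime computation
    scoped_apis: bool                    # task scope is tightly bounded
    logs_code_and_calls: bool            # logging quality is sufficient
    sandbox_isolated: bool               # runtime isolation is in place
    certifiable: bool                    # run can be certified afterwards


def choose_execution_mode(w: WorkflowProfile) -> str:
    """Return 'generated_code' only when the need and all controls exist."""
    if not w.needs_repetition_or_branching:
        # No efficiency reason to leave the more visible path.
        return "direct_tool_call"
    controls = (w.scoped_apis, w.logs_code_and_calls, w.sandbox_isolated, w.certifiable)
    # Any weak control sends the workflow back to direct tool calls.
    return "generated_code" if all(controls) else "direct_tool_call"


if __name__ == "__main__":
    batch = WorkflowProfile(True, True, True, True, True)
    print(choose_execution_mode(batch))  # generated_code
```

The gate defaults to direct tool calls. That matches the answer's position that direct calls remain the safer model when controls are weak.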
Technical breakdown
MCP tool calling vs code generation
Standard MCP tool calling is a request-response pattern. The agent asks for one action, the MCP server returns a result, and the model decides the next step. Code Mode shifts that work into generated code based on the MCP schema, which lets the runtime use loops, conditionals, and native functions such as date handling. That is why the same workflow can complete with fewer model turns and fewer token-consuming exchanges. The architectural difference is not cosmetic. It changes where state lives, how repetition is handled, and how much of the workflow is visible to the model versus the sandboxed executor.
Practical implication: map which agent tasks still need direct tool calls and which can safely move into sandboxed code execution.
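The following toy Python sketch makes the round-trip difference concrete. The server class, the `create_event` tool and the turn counter are all hypothetical. The sketch counts model turns, not tokens. A plain function stands in for model-written code, so nothing here is a real sandbox or a real MCP client.

```python
# Toy model of round trips between a model and an MCP server.
# Hypothetical: FakeMcpServer, create_event, and the turn counting are
# illustrative only. This counts model turns, not tokens.
from dataclasses import dataclass, field


@dataclass
class FakeMcpServer:
    """Stand-in for an MCP server exposing one hypothetical tool."""
    events: list = field(default_factory=list)

    def create_event(self, title: str, day: int) -> dict:
        self.events.append({"title": title, "day": day})
        return {"id": len(self.events), "title": title}


def direct_tool_calling(server: FakeMcpServer, n_events: int) -> int:
    """Request-response: the model emits each call and reads each result."""
    model_turns = 0
    for day in range(1, n_events + 1):
        model_turns += 1  # one model turn per tool call (final answer omitted)
        server.create_event(f"standup-{day}", day)
    return model_turns


def generated_script(server: FakeMcpServer, n_events: int) -> None:
    """Stands in for code the model wrote once from the tool schema."""
    for day in range(1, n_events + 1):
        server.create_event(f"standup-{day}", day)


def code_mode(server: FakeMcpServer, n_events: int) -> int:
    """The model writes the script in one turn; the executor runs the loop."""
    model_turns = 1  # one turn to emit the script (final answer omitted)
    generated_script(server, n_events)
    return model_turns


if __name__ == "__main__":
    a, b = FakeMcpServer(), FakeMcpServer()
    print(direct_tool_calling(a, 31), code_mode(b, 31))  # 31 1
    print(len(a.events), len(b.events))  # 31 31
```

Both paths leave the server in the same state. Only the number of times the model has to mediate differs, and that is where the token savings come from.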
Sandboxed Workers as the execution boundary
Cloudflare’s design executes generated code in a sandboxed Worker rather than letting the model call MCP tools directly. That means the model writes the logic, but the Worker enforces runtime isolation and performs the actual server calls. In identity terms, the Worker becomes the execution boundary where permissions, network access, and logging need to be anchored. This matters because the risk surface changes from individual tool invocations to code that can combine multiple calls inside one session. The security question is no longer just whether a tool is allowed, but whether the generated code can do more inside the sandbox than the task really requires.
Practical implication: treat the sandbox as a governed runtime with explicit logging, network constraints, and execution review.
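The sketch below shows where these checks could live if generated code received only one mediated handle to the tools. That handle enforces an allowlist and a per-execution call budget, and it logs each outcome. It is illustrative only. `GovernedRuntime` and its exceptions are hypothetical, and a Python object is not an isolation boundary. In Cloudflare's design, the Worker sandbox provides the actual isolation.

```python
# Sketch of where sandbox checks could live. A Python object is NOT an
# isolation boundary; in Cloudflare's design the Worker sandbox provides that.
# GovernedRuntime, ToolNotAllowed, and CallBudgetExceeded are hypothetical.
from typing import Any, Callable, Dict, List, Set, Tuple


class ToolNotAllowed(Exception):
    """Raised when generated code requests a tool outside the allowlist."""


class CallBudgetExceeded(Exception):
    """Raised when generated code exceeds its per-execution call budget."""


class GovernedRuntime:
    """The only handle generated code receives for reaching MCP tools."""

    def __init__(self, tools: Dict[str, Callable[..., Any]],
                 allowlist: Set[str], max_calls: int):
        self._tools = tools
        self._allowlist = allowlist
        self._max_calls = max_calls
        self._calls_made = 0
        self.call_log: List[Tuple[str, str]] = []  # (tool name, outcome) for review

    def call(self, name: str, **kwargs: Any) -> Any:
        # Allowlist check first, so denied requests never consume budget.
        if name not in self._allowlist:
            self.call_log.append((name, "denied"))
            raise ToolNotAllowed(name)
        if self._calls_made >= self._max_calls:
            self.call_log.append((name, "over_budget"))
            raise CallBudgetExceeded(name)
        self._calls_made += 1
        result = self._tools[name](**kwargs)
        self.call_log.append((name, "ok"))
        return result


if __name__ == "__main__":
    rt = GovernedRuntime(
        tools={"create_event": lambda title: {"title": title},
               "delete_calendar": lambda: None},
        allowlist={"create_event"},
        max_calls=2,
    )
    print(rt.call("create_event", title="standup"))  # {'title': 'standup'}
```

The call budget is one simple way to cap how much a single script can do. Reviewers can compare it against the task's expected size.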
Why hybrid MCP execution models are emerging
The article points toward a hybrid pattern where agents use direct tool calls for simple tasks and generated code for complex ones. That is a practical response to token economics, but it also introduces governance complexity because the same assistant may act through two different execution paths. One path is visible as discrete calls. The other is a code layer that can compress many actions into one runtime block. For identity and platform teams, policy, observability, and approvals must therefore account for both the tool layer and the generated-code layer. Otherwise the cheaper but harder-to-audit code path will become the default in production.
Practical implication: define when an agent may switch from tool calling to code generation and what controls must follow that switch.
Breaches seen in the wild
- Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
- Salesloft OAuth token breach — hackers stole OAuth tokens to access Salesforce data via Salesloft.
Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.
NHI Mgmt Group analysis
Code-generated MCP execution creates an identity control boundary that tool calling never had. Direct tool invocation exposes each action as a separate authorisation event, but generated code collapses multiple actions into one sandboxed runtime. That shifts governance from per-call permissioning toward runtime boundary control, which is a different discipline entirely. The practitioner conclusion is that agent execution mode now belongs in identity design, not just in application architecture.
Token efficiency is becoming an identity design driver, not just an engineering metric. The 32% and 81% token reductions show why teams will be tempted to move complex workflows into generated code. That incentive can outrun governance if policy only models tool permissions and not execution shape. The practitioner conclusion is that cost optimisation now has direct implications for access control, logging, and reviewability.
Hybrid tool-plus-code patterns will force teams to define where MCP authority actually lives. A schema-defined tool call and a sandbox-executed code path may reach the same API, but they do not carry the same governance meaning. The field will need to distinguish between action permission, runtime permission, and code execution permission. The practitioner conclusion is that identity teams should stop treating all agent actions as the same kind of access.
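One way to keep action, runtime and code execution permission apart is to store them as separate grants. Under that model, the same API can get a different answer depending on the execution path. This is a minimal sketch, assuming a single agent identity. `AgentGrants`, `may_call` and the API names are hypothetical.

```python
# Sketch of three separate grants for one agent identity, following the
# action / runtime / code-execution distinction above. AgentGrants and the
# API names are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentGrants:
    action_apis: frozenset   # action permission: reachable as discrete tool calls
    runtime_apis: frozenset  # runtime permission: reachable from inside the sandbox
    code_execution: bool     # code execution permission: may run generated code


def may_call(grants: AgentGrants, api: str, via_generated_code: bool) -> bool:
    """Same API, different answer depending on the execution path."""
    if via_generated_code:
        return grants.code_execution and api in grants.runtime_apis
    return api in grants.action_apis


grants = AgentGrants(
    action_apis=frozenset({"calendar.create_event", "billing.refund"}),
    runtime_apis=frozenset({"calendar.create_event"}),
    code_execution=True,
)

if __name__ == "__main__":
    print(may_call(grants, "billing.refund", via_generated_code=False))  # True
    print(may_call(grants, "billing.refund", via_generated_code=True))   # False
```

In this example, refunds stay on the visible, per-call path even though the agent may run generated code for calendar work.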
Runtime governance gap: This article shows that the real gap is not just tool sprawl, but the absence of policy for agent-generated execution paths. When code generation becomes a first-class way to use MCP, the programme must govern what can run, where it can run, and how much authority it inherits. The practitioner conclusion is that runtime boundaries need to be explicit before scale makes them implicit.
Cloudflare's Code Mode demonstrates that MCP governance is shifting from discrete actions to composite execution. That shift does not remove the need for least privilege, but it changes how least privilege is expressed because a single generated script can exercise many permissions in sequence. The practitioner conclusion is that access governance must follow execution form, not just identity type.
From our research:
- 59.8% of organisations see value in a solution that simplifies non-human access management and introduces dynamic ephemeral credentials, according to The 2024 Non-Human Identity Security Report.
- 23.7% of organisations share secrets through insecure methods such as email or messaging applications, which shows how quickly non-human control failures become operational exposure.
- The next step is governance that distinguishes runtime execution from raw tool access, so teams can align policy with the actual identity behaviour they are running.
What this signals
Runtime execution is becoming the new governance unit for agentic systems. As more agent workflows move from discrete tool calls to generated code, identity programmes will need controls that follow execution form rather than only identity type. That means the next round of policy design should focus on sandbox boundaries, code review evidence, and revocation points that exist after the agent has already started acting.
The practical signal for teams is that token efficiency will keep pushing agent builders toward composite execution paths. If governance cannot observe and certify what happened inside the sandbox, the organisation will have a cost-saving pattern that is harder to audit than the tool-calling model it replaced.
With 59.8% of organisations seeing value in dynamic ephemeral credentials, per The 2024 Non-Human Identity Security Report, the market is already signalling demand for shorter-lived, task-scoped authority. For MCP programmes, that pressure should extend to code-execution permissions as well as API tokens.
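Extending ephemeral credentials to code execution could mean issuing one credential per sandboxed run. That credential would be bound to the task, limited in scope, short-lived and revocable mid-run. This is a minimal sketch. `TaskCredential`, `issue_for_code_execution`, the scope strings and the 300-second default TTL are hypothetical. A real deployment would rely on its identity provider's token service.

```python
# Sketch of a short-lived, task-scoped credential for one code execution.
# TaskCredential, issue_for_code_execution, the scope strings, and the
# 300-second default TTL are hypothetical; a real deployment would use its
# identity provider's token service.
import secrets
import time
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, Optional


@dataclass
class TaskCredential:
    task_id: str
    scopes: FrozenSet[str]
    expires_at: float
    token: str = field(default_factory=lambda: secrets.token_urlsafe(16))
    revoked: bool = False

    def revoke(self) -> None:
        """Revocation point that still works after the agent starts acting."""
        self.revoked = True

    def permits(self, scope: str, now: Optional[float] = None) -> bool:
        """Check revocation, expiry, and scope on every use."""
        now = time.time() if now is None else now
        return not self.revoked and now < self.expires_at and scope in self.scopes


def issue_for_code_execution(task_id: str, scopes: Iterable[str],
                             ttl_seconds: float = 300.0,
                             now: Optional[float] = None) -> TaskCredential:
    now = time.time() if now is None else now
    return TaskCredential(task_id, frozenset(scopes), now + ttl_seconds)


if __name__ == "__main__":
    cred = issue_for_code_execution("task-1", {"calendar:write"}, ttl_seconds=60)
    print(cred.permits("calendar:write"))  # True until expiry or revocation
```

The sandbox would check `permits` before each outbound call. Expiry and revocation therefore take effect even inside a single generated script.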
For practitioners
- Classify agent execution modes separately: Inventory which MCP workflows use direct tool calls and which use generated code inside a sandbox. Assign different approval, logging, and review requirements to each path so the same agent does not inherit one policy by default.
- Restrict code-generation authority to bounded tasks: Allow generated code only for workflows where loops, conditionals, or repeated calls are necessary and where the allowed APIs are tightly scoped. Keep simple actions on direct tool calls so execution stays visible and easier to certify.
- Instrument the sandbox as a governed runtime: Capture execution IDs, code payloads, outbound calls, and completion status from the Worker layer. The audit trail should show what logic ran, what resources it touched, and whether the session stayed within the intended task boundary.
- Review privilege inheritance across both paths: Check whether permissions granted to the MCP tool layer are being reused implicitly by generated code. The policy model should state which permissions apply only to direct calls and which may be inherited inside the sandbox.
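The instrumentation point above could produce one record per sandboxed execution. This is a minimal sketch, assuming the sandbox layer can see each outbound call and whether it was allowed. `SandboxAuditRecord` and its field names are hypothetical. The fields follow the list: execution ID, code payload, outbound calls and completion status.

```python
# Sketch of a per-execution audit record emitted from the sandbox layer.
# Field names are hypothetical; the fields mirror the list above: execution
# ID, code payload, outbound calls, and completion status.
import hashlib
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class SandboxAuditRecord:
    code_payload: str
    execution_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    outbound_calls: List[Dict[str, object]] = field(default_factory=list)
    completion_status: str = "running"

    def record_call(self, target: str, allowed: bool) -> None:
        self.outbound_calls.append({"target": target, "allowed": allowed})

    def finish(self, status: str) -> None:
        self.completion_status = status

    def within_boundary(self) -> bool:
        """True if no outbound call fell outside the task's allowlist."""
        return all(call["allowed"] for call in self.outbound_calls)

    def to_json(self) -> str:
        doc = asdict(self)
        # Hash the payload so reviewers can match it to the retained code.
        doc["code_sha256"] = hashlib.sha256(self.code_payload.encode("utf-8")).hexdigest()
        doc["within_boundary"] = self.within_boundary()
        return json.dumps(doc, sort_keys=True)


if __name__ == "__main__":
    rec = SandboxAuditRecord(code_payload="for d in days: create_event(d)")
    rec.record_call("calendar.create_event", allowed=True)
    rec.finish("completed")
    print(rec.to_json())
```

Storing both the payload and its hash lets a reviewer confirm that the code being certified is the code that ran.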
Key takeaways
- Code Mode changes MCP governance by moving authority from discrete tool calls into sandboxed code execution.
- The token savings are real, but they also increase the need for runtime boundaries, logging, and policy separation.
- IAM teams should govern execution mode explicitly, or optimisation pressure will outpace control design.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 sets the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | | Agent-generated code and tool use fit agentic AI runtime risk and authorization boundaries. |
| OWASP Non-Human Identity Top 10 | NHI-01 | MCP agents act as non-human identities with scoped access to tools and data. |
| NIST CSF 2.0 | PR.AA-05 | Least-privilege access management applies to both tool calls and sandboxed code execution. |
Align agent permissions to least privilege and verify access boundaries during routine access reviews.
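A routine access review could compare an agent's standing grants with the permissions its generated scripts actually exercised. This is a minimal sketch, assuming the per-run permission sets can be reconstructed from sandbox audit logs. `review_access` and its inputs are hypothetical.

```python
# Sketch of an access-review check: compare an agent's standing grants with
# the permissions its generated scripts actually exercised, as reconstructed
# from sandbox audit logs. Function name and inputs are hypothetical.
from typing import Dict, List, Set


def review_access(granted: Set[str], exercised_by_run: List[Set[str]]) -> Dict[str, object]:
    exercised: Set[str] = set().union(*exercised_by_run)
    return {
        "unused_grants": sorted(granted - exercised),  # candidates to remove
        "out_of_scope": sorted(exercised - granted),   # boundary failures to investigate
        # How many permissions a single script exercised at most.
        "max_per_run": max((len(run) for run in exercised_by_run), default=0),
    }


if __name__ == "__main__":
    report = review_access(
        {"calendar:read", "calendar:write", "billing:refund"},
        [{"calendar:write"}, {"calendar:read", "calendar:write"}],
    )
    print(report)
```

`max_per_run` reflects the point that a single generated script can exercise many permissions in sequence. A spike in that number is worth reviewing even when every individual permission was granted.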
Key terms
- MCP tool calling: A request-response pattern where an agent invokes a defined tool, receives a result, and decides the next step. In practice, this makes each action visible as a separate interaction, which helps with logging but can become costly when a task requires many repeated calls.
- Generated code execution: A runtime model where the agent writes code from a tool schema and the environment executes it inside a sandbox. This compresses repeated actions into one execution block, which improves efficiency but requires stronger governance around what the code can do once it starts running.
- Sandboxed Worker: An isolated execution environment used to run generated code with bounded permissions and controlled network access. For identity teams, the sandbox is the place where agent authority becomes operational, so it must be treated as a governed runtime, not just an implementation detail.
- Runtime boundary: The point at which an identity's permissions, execution context, and observable behaviour are contained for policy purposes. In agentic systems, runtime boundaries matter because they determine whether the organisation can explain, audit, and limit what the agent did after execution begins.
Deepen your knowledge
MCP execution models and non-human access governance are covered in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are defining policy for agent-driven workflows, it is a strong fit for your programme.
This post draws on content published by WorkOS, "Cloudflare: Code Mode Cuts Token Usage by 81%". Read the original.
Published by the NHIMG editorial team on 2025-12-11.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org