
How should organisations control AI costs in agentic environments?

Organisations should control AI costs by combining metering, attribution, and enforcement across the full request path. That means tracking model calls, tool usage, and data movement, then applying caps or routing rules when consumption exceeds policy. Without identity-linked attribution, cost control remains reactive and finance cannot trust the numbers.

Why This Matters for Security Teams

AI cost control in agentic environments is not just a finance problem. Autonomous agents can generate large volumes of model calls, chain tool invocations, and move data across services faster than traditional budget controls can react. That makes spend governance a security control as much as an accounting control, especially when agents have broad execution authority and can trigger downstream charges without a human in the loop.

Current guidance suggests treating cost as an identity-linked telemetry problem: the system must know which agent acted, what it tried to do, and which resources it touched. Without that linkage, teams cannot distinguish legitimate workload growth from misuse, runaway prompts, or compromised automation. The risk is amplified by documented agent overreach in the AI Agents: The New Attack Surface report, where 80% of organisations said their agents had already acted beyond intended scope.

That is why cost controls should be enforced at the same layer as access and policy, not left to monthly invoice review. Best practice is evolving, but the operational pattern is clear: attribute every significant request path, meter usage in real time, and constrain behaviour before spend becomes a breach or outage signal. In practice, many security teams discover runaway agent spend only after the invoice lands, rather than through intentional policy enforcement.

How It Works in Practice

Effective cost governance starts with a shared control plane for identity, policy, and metering. Each agent, workflow, and tool call should be tied to a workload identity, then evaluated at runtime before expensive actions are allowed. That means the organisation can apply different limits for summarisation, code execution, retrieval, or external API calls, instead of using one blunt budget for all activity. The strongest implementations combine policy-as-code with per-request telemetry so that the system can cap, deny, reroute, or degrade gracefully when thresholds are crossed.
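
The runtime evaluation described above can be sketched as a small policy function. This is a minimal illustration, not a reference implementation: the `Request` fields, the `ACTION_LIMITS` table, and every threshold value are hypothetical and would come from the organisation's own policy store.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DEGRADE = "degrade"  # e.g. fallback model or smaller context window
    DENY = "deny"


@dataclass(frozen=True)
class Request:
    agent_id: str          # workload identity of the calling agent
    action: str            # e.g. "summarise", "code_exec", "retrieval"
    estimated_cost: float  # estimated cost of this single request


# Hypothetical per-action limits as (soft_limit, hard_limit) per request.
ACTION_LIMITS = {
    "summarise": (0.05, 0.20),
    "retrieval": (0.02, 0.10),
    "code_exec": (0.10, 0.50),
    "external_api": (0.25, 1.00),
}


def evaluate(request: Request) -> Decision:
    """Decide before the expensive action runs, not after the invoice."""
    limits = ACTION_LIMITS.get(request.action)
    # Unattributed calls and unknown action types are denied by default.
    if limits is None or not request.agent_id:
        return Decision.DENY
    soft, hard = limits
    if request.estimated_cost > hard:
        return Decision.DENY
    if request.estimated_cost > soft:
        return Decision.DEGRADE
    return Decision.ALLOW
```

Keeping limits per action type, rather than one global number, is what lets a cheap summarisation call and an expensive external API call be governed differently.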

A practical operating model usually includes:

  • Per-agent budgets that reset on a fixed window or per task.
  • Short-lived credentials and scoped tool tokens so an agent cannot keep spending after the job ends.
  • Usage attribution for model tokens, retrieval volume, tool calls, and data egress.
  • Policy thresholds that trigger fallback models, smaller context windows, or human approval.
  • Chargeback or showback mapped to the owning team, service, or business unit.
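
Two items from the list above, fixed-window budgets with a fallback threshold and showback by owner, can be sketched as follows. The class name `AgentBudget`, the `fallback_ratio` parameter, and the string outcomes are assumptions made for illustration. Time is passed in explicitly so the behaviour is easy to test.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AgentBudget:
    limit: float                  # maximum spend per window
    window_seconds: float         # length of the fixed window
    fallback_ratio: float = 0.8   # fraction of limit that triggers fallback
    window_start: float = 0.0
    spent: float = 0.0

    def charge(self, cost: float, now: float) -> str:
        # Reset the counter once the current window has elapsed.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.spent = 0.0
        # Refuse any charge that would push spend past the hard limit.
        if self.spent + cost > self.limit:
            return "deny"
        self.spent += cost
        # Near the limit, signal the caller to use a cheaper path.
        if self.spent >= self.fallback_ratio * self.limit:
            return "fallback"
        return "allow"


def showback(records):
    """Sum cost per owner from (owner, cost) usage records."""
    totals = defaultdict(float)
    for owner, cost in records:
        totals[owner] += cost
    return dict(totals)
```

A per-task budget would follow the same shape, with the reset tied to task completion instead of elapsed time.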

Standards-based guidance supports this direction. The NIST AI Risk Management Framework emphasises governance and measurement, while the OWASP Agentic AI Top 10 and the CSA MAESTRO agentic AI threat modeling framework both reinforce runtime controls over autonomous behaviour. NHIMG research on the Ultimate Guide to NHIs — Standards aligns cost governance with non-human identity discipline: if the system cannot attribute a call, it cannot govern the spend behind it.

These controls tend to break down when agents are allowed to spawn nested sub-agents or call external tools through opaque third-party connectors, because attribution then fragments across multiple billing and identity domains.
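
One way to limit that fragmentation, at least inside the organisation's own runtime, is to have every sub-agent inherit the owner of the agent that spawned it and keep a traceable lineage. The sketch below assumes a hypothetical `AgentContext` type. It does not address third-party connectors, which would need their own identity mapping.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentContext:
    agent_id: str
    owner: str                              # team or service that pays
    parent: Optional[AgentContext] = None   # None for a top-level agent

    def spawn(self, child_id: str) -> AgentContext:
        # A sub-agent inherits the owner, so its spend rolls up correctly.
        return AgentContext(agent_id=child_id, owner=self.owner, parent=self)

    def lineage(self) -> list[str]:
        # Walk up to the root, then return the chain root-first.
        chain = []
        node: Optional[AgentContext] = self
        while node is not None:
            chain.append(node.agent_id)
            node = node.parent
        return list(reversed(chain))
```

Attaching the lineage to each metered call gives the session-level trace needed to explain which top-level task caused a nested charge.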

Common Variations and Edge Cases

Tighter cost controls often increase operational overhead, requiring organisations to balance cost predictability against developer velocity and agent autonomy. That tradeoff becomes visible in environments with many short-lived tasks, where aggressive metering can create noise or block legitimate bursty workloads. Current guidance suggests using tiered limits rather than one universal cap, especially for production agents that support customer workflows.
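
Tiered limits can be expressed as plain configuration plus a small check. The tier names, hourly limits, and burst multipliers below are invented for illustration. The point is only that a customer-facing agent and an experimental one need not share a cap.

```python
# Hypothetical tiers: each has its own hourly limit and burst allowance.
TIERS = {
    "experimental": {"hourly_limit": 5.0, "burst_multiplier": 1.0},
    "internal": {"hourly_limit": 20.0, "burst_multiplier": 1.5},
    "customer_facing": {"hourly_limit": 100.0, "burst_multiplier": 2.0},
}


def within_limit(tier: str, spent_this_hour: float, burst_active: bool = False) -> bool:
    """Return True if spend is inside the tier's (optionally bursted) limit."""
    cfg = TIERS[tier]
    multiplier = cfg["burst_multiplier"] if burst_active else 1.0
    return spent_this_hour <= cfg["hourly_limit"] * multiplier
```

Who may set `burst_active`, and for how long, is itself a policy decision and is deliberately left out of this sketch.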

There is no universal standard for this yet, but several edge cases are consistent. Shared agents used by multiple teams need team-level attribution plus session-level tracing, otherwise chargeback becomes politically contested. Long-context workflows can produce sudden token spikes even when the agent is behaving correctly, so token-volume thresholds should be paired with semantic checks rather than applied as raw token counts alone. For high-risk paths, the safest pattern is to route expensive actions through approval gates and reserve premium models for only the steps that truly need them.
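
The approval-gate and model-reservation pattern can be sketched as a routing function. The step kinds, the `PREMIUM_STEPS` set, and the model labels are placeholders, not names of real models or of any specific product.

```python
# Hypothetical set of step kinds that justify the premium model.
PREMIUM_STEPS = {"final_answer", "code_generation"}


def route_step(step_kind: str, high_risk: bool, approved: bool = False) -> str:
    """Pick a path for one workflow step.

    High-risk steps wait for human approval. Only steps listed in
    PREMIUM_STEPS use the premium model. Everything else goes to the
    cheaper default.
    """
    if high_risk and not approved:
        return "needs_approval"
    if step_kind in PREMIUM_STEPS:
        return "premium-model"
    return "standard-model"
```

For example, `route_step("retrieval", high_risk=False)` selects the standard model, while a high-risk step returns `"needs_approval"` until a reviewer sets `approved=True`.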

Security teams should also watch for credential abuse that turns cost into an attack surface. NHIMG’s LLMjacking and AI LLM hijack breach coverage show how compromised identities can be used to drive unauthorised AI consumption at speed. The practical lesson is simple: if spend is not tied to identity, policy, and revocation, the organisation is paying for activity it cannot trust.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | A06 | Agentic abuse includes runaway tool use and unauthorised consumption.
CSA MAESTRO | GOV-02 | Governance needs attribution and policy for autonomous agent spend.
NIST AI RMF | | AI RMF supports measurement and governance for cost-related AI risk.

Meter agent actions per request and cap or deny expensive tool chains at runtime.