TL;DR: Enterprise AI bills become volatile when prompts, responses, retrieval calls, and background agent loops are not tracked at the event level, according to Cranium, because a single unbounded loop can rapidly drive spend upward. That makes token visibility and policy enforcement an AI governance problem, not just a finance issue.
NHIMG editorial — based on content published by Cranium: Tokenomics or Token-Chaos? How to Tame Your AI Spend
Questions worth separating out
Q: How should organisations control runaway AI token spend?
A: Start by attributing every model call to a specific identity, session, or application, then enforce quotas and rate limits at the gateway.
Q: Why does AI usage create governance problems for IAM teams?
A: Because AI usage is now tied to identities, sessions, and workflows that can consume resources continuously and automatically.
Q: What do security teams get wrong about AI cost control?
A: They often treat cost as a finance-only issue and overlook the identity layer that drives usage.
Practitioner guidance
- Map token usage to identity and session context: Attribute every model call to a developer, application, business unit, or customer session so finance and security can see which identity path is driving consumption.
- Enforce request-level quotas at the gateway: Apply daily token quotas, rate limits, and environment-specific caps at the API gateway so dev and test workloads cannot create uncontrolled spend.
- Cap conversation state and retry behaviour: Limit historical context, stop infinite chat growth, and define retry thresholds for assistant flows that can repeatedly regenerate outputs or re-call tools.
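The three controls above can be sketched in a few lines of Python. This is a minimal illustration, not Cranium's implementation or any gateway product's API: the names `Caller`, `TokenGateway`, `cap_history`, and `call_with_retry_cap`, and the cap values in `DAILY_TOKEN_CAPS`, are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Caller:
    """Who is behind a model call (all fields are illustrative)."""
    identity: str      # developer, application, or service account
    session_id: str
    environment: str   # e.g. "dev", "test", "prod"


# Hypothetical daily token caps per environment; real values are a policy decision.
DAILY_TOKEN_CAPS = {"dev": 50_000, "test": 100_000, "prod": 2_000_000}


class QuotaExceeded(Exception):
    """Raised when a request would push a caller past its daily cap."""


@dataclass
class TokenGateway:
    # (identity, environment, day) -> tokens consumed so far
    usage: dict = field(default_factory=lambda: defaultdict(int))

    def authorize(self, caller: Caller, requested_tokens: int, today: date) -> int:
        """Charge the request to the caller's daily quota, or reject it.

        Returns the tokens remaining for that caller today.
        """
        key = (caller.identity, caller.environment, today)
        cap = DAILY_TOKEN_CAPS[caller.environment]
        if self.usage[key] + requested_tokens > cap:
            raise QuotaExceeded(
                f"{caller.identity} would exceed the {caller.environment} cap of {cap}"
            )
        self.usage[key] += requested_tokens
        return cap - self.usage[key]


def cap_history(messages: list, max_messages: int) -> list:
    """Keep only the most recent messages so context cannot grow without bound."""
    return messages[-max_messages:] if max_messages > 0 else []


def call_with_retry_cap(call, max_attempts: int = 3):
    """Invoke `call` at most `max_attempts` times, then give up instead of looping."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return call()
        except Exception as exc:  # a real flow would catch narrower errors
            last_error = exc
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

Because usage is keyed by identity, environment, and day, a rejected request already names the identity path responsible, which is the attribution the first bullet asks for.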
What's in the full article
Cranium's full blog post covers the operational detail this post intentionally leaves for the source:
- Exact token visibility dimensions, including input versus output split and context caching effects.
- Examples of model tiering choices for different task types and cost profiles.
- Policy ideas for hard caps, daily quotas, and context limits in development and test environments.
- How a central control plane can discover Shadow AI and enforce guardrails across multiple model providers.
👉 Read Cranium's analysis of AI token spend, visibility, and guardrails →
AI tokenomics and runaway agent loops: what IAM teams need
Explore further
Tokenomics is becoming an identity governance issue, not just a billing issue. Once model usage is metered per prompt, completion, retrieval, and background loop, the identity behind the request becomes part of the cost model. That changes governance because access, session behaviour, and tool invocation now influence financial exposure as directly as they influence data exposure. Practitioners need to treat AI spend as a policy surface, not just an invoice.
A few things that frame the scale:
- The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
- Only 44% of developers are reported to follow security best practices for secrets management, exposing a significant developer behaviour gap.
A question worth separating out:
Q: How do you know if AI token governance is actually working?
A: You should be able to trace each major cost spike to a specific workload, identity, or workflow within minutes, not days. If billing data is still pooled into a single bucket, or if runaway loops can continue without a policy trigger, governance is not functioning as designed.
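One way to check the test described above is to aggregate usage events by identity and workflow and measure how much spend has no owner. The sketch below assumes a hypothetical `UsageEvent` record with `identity`, `workflow`, and `tokens` fields; real gateway or billing exports will have their own schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    """One metered model call (illustrative schema, not a vendor format)."""
    identity: str   # empty string means the call was never attributed
    workflow: str
    tokens: int


def top_consumers(events, n=3):
    """Return the n (identity, workflow) pairs with the highest token totals."""
    totals = Counter()
    for event in events:
        totals[(event.identity, event.workflow)] += event.tokens
    return totals.most_common(n)


def unattributed_share(events):
    """Fraction of all tokens that cannot be tied to any identity."""
    total = sum(event.tokens for event in events)
    if total == 0:
        return 0.0
    pooled = sum(event.tokens for event in events if not event.identity)
    return pooled / total
```

If `top_consumers` cannot name the source of a spike, or `unattributed_share` stays high, billing data is still effectively pooled into a single bucket.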
👉 Read our full editorial: AI token spend needs governance before runaway agent loops do