TL;DR: Enterprises moving AI into production face a familiar but faster cost-control problem, with Mavvrik reporting that 84% of companies see more than a 6% gross-margin hit from AI costs and nearly one in four see 16% or more. The governance issue is not just pricing, but attribution: teams cannot control what they cannot measure.
At a glance
What this is: A Kong analysis of LLM showback and chargeback, arguing that token-level visibility and real-time cost attribution are becoming core AI governance capabilities.
Why it matters: The identity and governance patterns that support access accountability in IAM now need to extend to AI usage, cost ownership, and enforcement across teams and services.
By the numbers:
- 84% of companies report more than a 6% hit to gross margin from AI costs.
- Nearly one in four reports erosion of 16% or more.
Context
LLM cost governance is the practice of making AI consumption visible, attributable, and enforceable before the invoice becomes the first control signal. The identity governance connection is straightforward: when a team, application, or service can consume model capacity without clear ownership, accountability breaks in the same way unmanaged access breaks IAM programmes.
Kong’s article frames a real operational gap that many enterprises are already hitting. AI usage is fragmenting across tools, providers, and environments, which means token consumption is often known only after the fact. That leaves FinOps, platform, and security teams trying to govern spending without the telemetry needed to distinguish experimentation from uncontrolled scale.
Key questions
Q: How should security teams implement AI showback in production environments?
A: Start by attributing each model call to a specific team, service, or workflow, then capture token counts and pricing in real time. Showback should be used to create visibility and behaviour change before any billing is enforced. If the organisation cannot trust the numbers, it is too early for chargeback.
Q: When does chargeback become more useful than showback for AI governance?
A: Chargeback becomes useful when attribution is stable, pricing rules are understood, and leaders need teams to feel the budget impact directly. If usage is still fragmented across tools or teams cannot agree on allocation logic, chargeback will create disputes before it creates control.
Q: What breaks when AI consumption is not metered at the platform layer?
A: Without platform-layer metering, organisations lose the ability to link cost to ownership, workflow, or service in real time. That makes budget control reactive and turns finance into the first detector of overspend. It also weakens accountability because no one can confidently explain who consumed what.
Q: Who should be accountable for AI overspend when multiple teams share the same model?
A: Accountability should follow the consuming team, product, or service, not the model provider. Shared infrastructure still needs a named operational owner for allocation, exceptions, and policy enforcement. If no one owns consumption, costs will remain diffuse and governance will stay informal.
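One way to make "accountability follows the consumer" concrete is to split a shared model bill in proportion to each team's metered tokens. The sketch below is illustrative only: the team names, token counts, and the 1,000 USD bill are hypothetical, and proportional-by-token is just one possible allocation rule, not one the Kong article prescribes.

```python
from decimal import Decimal, ROUND_HALF_UP

def allocate(shared_cost_usd, tokens_by_team):
    """Split a shared cost across teams in proportion to their token usage."""
    total_tokens = sum(tokens_by_team.values())
    if total_tokens == 0:
        raise ValueError("no usage recorded, nothing to allocate")
    cent = Decimal("0.01")
    return {
        team: (shared_cost_usd * tokens / total_tokens).quantize(cent, rounding=ROUND_HALF_UP)
        for team, tokens in tokens_by_team.items()
    }

# Hypothetical monthly usage of one shared model.
usage = {"search": 600_000, "support-bot": 300_000, "analytics": 100_000}
shares = allocate(Decimal("1000"), usage)
for team, amount in shares.items():
    print(f"{team}: ${amount}")
```

Because each share is rounded to the cent, the shares may not sum exactly to the bill for less convenient numbers. Deciding who absorbs that remainder is one of the exceptions the named operational owner would have to settle.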
Technical breakdown
Why token metering changes AI cost governance
Token metering turns LLM consumption into a measurable unit rather than an opaque side effect of API traffic. Instead of treating AI requests like ordinary calls, the platform must attach model, token, and pricing data to each transaction so spend can be traced to a team, service, or workflow. That is what makes showback possible and chargeback defensible. Without metering, organisations can only estimate cost after invoices arrive, which is too late for operational control.
Practical implication: establish token-level telemetry before attempting any meaningful AI budget governance.
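As a rough illustration of what attaching model, token, and pricing data to a transaction could look like, here is a minimal Python sketch. The record fields, model names, and per-1,000-token prices are hypothetical placeholders, not Kong's schema or any provider's real rates.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical prices in USD per 1,000 tokens; real rates vary by provider and model.
PRICE_PER_1K = {
    "example-model-small": {"input": Decimal("0.50"), "output": Decimal("1.50")},
    "example-model-large": {"input": Decimal("5.00"), "output": Decimal("15.00")},
}

@dataclass(frozen=True)
class UsageRecord:
    """One metered model call, tagged with the consumer that made it."""
    team: str
    service: str
    model: str
    input_tokens: int
    output_tokens: int

    def cost_usd(self) -> Decimal:
        # Input and output tokens are priced separately, then summed.
        price = PRICE_PER_1K[self.model]
        return (
            Decimal(self.input_tokens) / 1000 * price["input"]
            + Decimal(self.output_tokens) / 1000 * price["output"]
        )

record = UsageRecord("search", "query-rewriter", "example-model-small", 2000, 500)
print(record.team, record.service, record.cost_usd())
```

`Decimal` is used instead of floats so that small per-call amounts do not pick up binary rounding error as they are summed over many calls.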
Showback and chargeback are different controls
Showback provides visibility without financial consequence. Chargeback converts that visibility into accountable cost allocation, which changes behaviour because teams feel the budget impact directly. In practice, many organisations need both in sequence: showback to build trust in the numbers, then chargeback once allocation rules are stable. The article’s four-level model reflects this maturity path, from basic request counting to full invoicing. The key point is that the governance mechanism changes from observation to enforcement.
Practical implication: decide whether your current priority is behaviour change through visibility or enforcement through cost allocation.
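The difference between the two controls can be sketched over the same metered data. In this hypothetical Python example, showback only produces a report, while chargeback debits each team's budget. The team names, costs, and budgets are invented for illustration.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical metered calls: (team, cost in USD).
calls = [
    ("search", Decimal("1.75")),
    ("support-bot", Decimal("4.20")),
    ("search", Decimal("0.25")),
]

def totals_by_team(records):
    """Sum metered cost per team."""
    totals = defaultdict(Decimal)
    for team, cost in records:
        totals[team] += cost
    return dict(totals)

def showback(records):
    """Visibility only: one report line per team, nothing is billed."""
    totals = totals_by_team(records)
    return [f"{team}: ${total:.2f}" for team, total in sorted(totals.items())]

def chargeback(records, budgets):
    """Allocation: debit each team's budget and return the remaining balance."""
    totals = totals_by_team(records)
    return {team: budget - totals.get(team, Decimal("0")) for team, budget in budgets.items()}

print(showback(calls))
print(chargeback(calls, {"search": Decimal("10"), "support-bot": Decimal("3")}))
```

Both functions read the same totals, which is why trust in the showback numbers has to come first: any attribution error in the report becomes a billing dispute once chargeback is switched on.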
Why entitlement enforcement belongs in the gateway
When AI consumption limits are enforced at the gateway, the control operates before overspend occurs rather than after finance reconciliation. That matters because AI workloads can scale faster than manual review cycles, and the point of control has to sit close to the request path. This is a familiar governance pattern in API security and PAM: the closer the enforcement is to execution, the more effective it is. For AI, this means metering and policy cannot be separate layers if real limits matter.
Practical implication: place AI consumption limits where requests are made, not in downstream reporting systems.
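A minimal sketch of request-path enforcement, assuming a hypothetical `BudgetGate` that a gateway would consult before forwarding a call. This is not Kong's plugin API. The class, the consumer name, and the limits are illustrative, and a real deployment would also reconcile estimated cost against actual cost after the response.

```python
from decimal import Decimal

class BudgetExceeded(Exception):
    """Raised when a request would push a consumer past its limit."""

class BudgetGate:
    """Hypothetical pre-request check: reserve estimated spend or reject the call."""

    def __init__(self, limits_usd):
        self.limits = dict(limits_usd)
        self.spent = {consumer: Decimal("0") for consumer in self.limits}

    def authorize(self, consumer, estimated_cost_usd):
        # Consumers without an assigned budget are rejected rather than allowed by default.
        if consumer not in self.limits:
            raise BudgetExceeded(f"{consumer} has no budget assigned")
        if self.spent[consumer] + estimated_cost_usd > self.limits[consumer]:
            raise BudgetExceeded(f"{consumer} would exceed its limit")
        self.spent[consumer] += estimated_cost_usd

gate = BudgetGate({"search": Decimal("5.00")})
gate.authorize("search", Decimal("3.00"))      # allowed, 3.00 of 5.00 used
try:
    gate.authorize("search", Decimal("2.50"))  # would reach 5.50, so it is blocked
except BudgetExceeded as exc:
    print(f"blocked: {exc}")
```

The rejected call never reaches the model, so no cost is incurred. That is the difference between a preventive control and a finance dashboard that reports the overrun afterwards.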
NHI Mgmt Group analysis
LLM cost governance is becoming an identity problem, not just a finance problem. The moment AI usage is attributed to a team, service, or workflow, the programme is doing a form of identity governance over machine-driven consumption. That is why showback and chargeback matter beyond FinOps. They define ownership for non-human usage in the same way IAM defines accountability for access. Practitioners should treat cost attribution as a governance primitive, not a reporting extra.
The hidden AI fragmentation tax is a control failure, not an efficiency trend. Disconnected tools and providers create multiple consumption surfaces with no shared cost model, which means enterprises lose the ability to compare or constrain AI workloads consistently. This is analogous to privilege sprawl in identity programmes: the problem is not that usage exists, but that the organisation cannot see or govern it as one system. The practical conclusion is that cost control must be designed into the AI platform layer.
Token-level visibility is the new prerequisite for accountable AI operations. If a platform cannot tie model usage to a specific owner and dollar value in real time, it cannot support meaningful chargeback or budget discipline. That makes metering the operational basis for policy, planning, and exception handling. In NHIMG terms, this is the point where governance moves from post-facto reporting to enforceable ownership.
AI unit economics will increasingly determine which workloads survive scale-up. The article shows that organisations can no longer assume AI experimentation is financially benign. Once usage becomes measurable, leaders can distinguish workflows that create value from those that only create spend. Practitioners should expect AI governance discussions to shift from model choice to operating model, especially where teams are scaling agents and token-heavy workflows.
Cost accountability and access accountability are converging. The same organisational instinct that demands a named owner for privileged access now extends to AI consumption. That convergence matters because autonomous and semi-autonomous systems can consume resources at machine speed. Teams that already manage identities, entitlements, and reviews are best placed to absorb AI cost governance without building a separate control culture from scratch.
From our research:
- 80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%), according to the AI Agents: The New Attack Surface report.
- Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
- For a broader governance lens, see OWASP Agentic AI Top 10 for the risk categories that emerge when agent behaviour outruns policy.
What this signals
Cost attribution will increasingly merge with access governance. As AI usage grows, organisations will need the same kind of named ownership they already expect for privileged access, but applied to token consumption and workflow spend. The practical challenge is not just metering; it is deciding which operational team owns the budget boundary and who can override it when usage changes.
Showback is becoming the transition control for AI programmes. Teams that are still debating chargeback can use visibility to build a credible baseline, especially where AI adoption is spreading faster than financial governance. With 80% of organisations reporting agent scope overrun, the broader lesson is that ungoverned usage tends to expand faster than review cycles.
AI cost management will push platform teams closer to IAM and FinOps operating models. Once real-time metering and allocation exist, leaders can compare model consumption, enforce thresholds, and build exception handling into the request path. That makes AI governance less about retrospective reports and more about controllable operating boundaries.
For practitioners
- Map AI consumption to a named owner: Assign every model, workflow, or agent to a business, product, or platform owner before usage scales. Ownership has to be stable enough to support budget review, exception handling, and escalation when spend drifts.
- Meter tokens at the request layer: Capture model, token, and pricing data at the point where AI traffic enters the platform so the organisation can attribute cost in real time. Downstream reporting alone is too slow for active governance.
- Separate showback from chargeback decisions: Use showback first to validate attribution and build trust with stakeholders, then move to chargeback once allocation rules and pricing overrides are stable. Do not force invoicing before the metering model is credible.
- Enforce usage thresholds in the gateway: Set limits where AI requests are made so overrun prevention happens before spend is committed. Controls that sit only in finance dashboards are advisory, not preventive.
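The first recommendation can be checked mechanically. The sketch below assumes a hypothetical ownership registry held as a simple mapping and flags any workflow that has no named owner. The workflow and team names are invented for illustration.

```python
# Hypothetical registry mapping each AI workflow to its accountable owner.
OWNERS = {
    "query-rewriter": "search-platform",
    "ticket-summarizer": "support-engineering",
    "nightly-agent": "",  # registered, but nobody has claimed it yet
}

def unowned(workflows, owners):
    """Return workflows that are missing from the registry or have a blank owner."""
    return sorted(w for w in workflows if not owners.get(w))

active = ["query-rewriter", "ticket-summarizer", "nightly-agent", "adhoc-notebook"]
print(unowned(active, OWNERS))
```

Running a check like this before usage scales turns "every workflow has an owner" from a policy statement into something that can fail a review.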
Key takeaways
- LLM showback and chargeback are governance controls for AI consumption, not accounting add-ons.
- The evidence points to a growing cost problem, but the deeper issue is that many organisations still lack real-time ownership and attribution.
- Enterprises should meter AI usage at the platform layer and reserve chargeback for the point when allocation rules are trustworthy.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| NIST CSF 2.0 | PR.AA-05 (formerly PR.AC-4 in CSF 1.1) | AI usage attribution depends on controlling who can consume services and under what conditions. |
| OWASP Non-Human Identity Top 10 | NHI-04 | Token and API cost attribution relies on managing non-human identity consumption paths. |
| NIST Zero Trust (SP 800-207) | Policy Enforcement Point (PEP); compare SC-7 in NIST SP 800-53 | Gateway-layer enforcement mirrors zero trust control placement at the request path. |
Treat AI workflows as governed non-human identities and track their consumption as part of NHI oversight.
Key terms
- Showback: Showback is the practice of attributing technology consumption to the team or service that used it without billing that team directly. In AI programmes, it turns token and model usage into visible operational data, which helps leaders compare workloads and change behaviour before financial enforcement is introduced.
- Chargeback: Chargeback is the allocation of technology costs back to the business unit, product, or service that incurred them. For AI workloads, it becomes a governance control when pricing and attribution are reliable enough that cost responsibility can influence design, usage, and prioritisation.
- Token Metering: Token metering is the process of measuring AI consumption at the level of tokens, requests, and model pricing so cost can be calculated accurately. It is the technical foundation for showback, chargeback, and real-time budget enforcement in enterprise AI platforms.
- AI Fragmentation Tax: AI fragmentation tax is the hidden cost created when model usage spreads across disconnected tools, providers, and environments without unified visibility. It is not a formal accounting line, but a useful term for the spend inefficiency and governance loss that accumulate when AI consumption cannot be managed as one system.
Deepen your knowledge
NHI governance, agentic AI identity, and machine identity lifecycle are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or NHI governance in your organisation, it is worth exploring.
This post draws on content published by Kong: LLM Cost Management: How to Implement AI Showback and Chargeback. Read the original.
Published by the NHIMG editorial team on 2026-04-06.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org