Start with the visible model charge, then add the cost of retrieval, routing, review, remediation, and any compliance work created by the query. A useful measure allocates those costs to the specific workflow, user, or agent that generated them. Without that attribution, finance sees volume but not the real unit economics.
Why This Matters for Security Teams
True per-query cost is a governance question, not just a finance exercise. If teams only track the model invoice, they miss the operational cost of retrieval, tool use, human review, exception handling, and the security work triggered by sensitive outputs. That gap becomes more serious when AI is wired into workflows that touch secrets, customer data, or regulated content, because each query can create downstream labor and risk.
For security leaders, the issue is similar to what non-human identity (NHI) programs already see in practice: the visible credential or service charge is rarely the full economic footprint. NHI-related incidents often drive remediation, rotation, and review costs that do not appear on the original bill. The same pattern shows up in AI operations, especially where prompts can expose secrets or generate risky code. NHIMG research on the State of Secrets in AppSec notes that organisations spend an average of 32.4% of security budgets on secrets management and code security, which helps explain why hidden workflow costs matter so much.
In practice, many security teams discover the real unit cost only after usage spikes have already created review backlog, remediation work, and unplanned compliance effort.
How It Works in Practice
The cleanest approach is to allocate costs to the workflow, user, or AI agent that generated the query. Start with the direct model charge, then add the surrounding costs that make the query safe and usable. That usually includes retrieval infrastructure, routing logic, guardrails, logging, redaction, human review, incident response, and any policy work required for regulated data. This is consistent with NIST Cybersecurity Framework 2.0, which treats governance and protection activities as part of the operational picture, not an afterthought.
A practical cost model often separates usage into four buckets:
- Inference: the model provider charge for prompts, completions, and tokens.
- Enablement: retrieval, vector search, orchestration, and tool execution.
- Assurance: moderation, review, audit logging, and policy enforcement.
- Recovery: remediation, ticket handling, rework, and compliance follow-up.
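As a minimal sketch of the four buckets above, the following Python assumes each query's costs have already been measured and tagged. The `QueryCost` class, its field names, and the dollar figures are hypothetical illustrations, not a standard schema or measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryCost:
    """Costs attributed to one query, in US dollars (hypothetical schema)."""
    inference: float   # model provider charge for prompts, completions, tokens
    enablement: float  # retrieval, vector search, orchestration, tool execution
    assurance: float   # moderation, review, audit logging, policy enforcement
    recovery: float    # remediation, ticket handling, rework, compliance follow-up

    def total(self) -> float:
        """Full per-query cost across all four buckets."""
        return self.inference + self.enablement + self.assurance + self.recovery

    def hidden_share(self) -> float:
        """Fraction of the total that never appears on the model invoice."""
        total = self.total()
        return 0.0 if total == 0 else (total - self.inference) / total


# Illustrative numbers only.
example = QueryCost(inference=0.02, enablement=0.01, assurance=0.05, recovery=0.12)
print(f"total: ${example.total():.2f}, hidden share: {example.hidden_share():.0%}")
```

Keeping inference separate from the other three buckets makes it easy to report how much of the unit cost sits outside the model invoice.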
For agentic systems, attribution should follow the agent or workflow identity rather than a shared application label. That matters because one autonomous agent can fan out into many tool calls, chain actions, and create review load far beyond the initial prompt. If the organisation uses secrets or credentials inside the workflow, track those control costs separately as well. NHIMG’s DeepSeek breach coverage is a reminder that AI-related exposure can create hidden cleanup work long after the query is complete.
A common recommendation is to calculate cost per query at the workflow level first, then roll up by department, customer, or product line. This attribution tends to break down in shared, multi-tenant agent platforms, where several services contribute to one outcome and it is ambiguous which workflow or owner should carry the cost.
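The workflow-first roll-up can be sketched as follows, assuming every cost event is tagged with the workflow, department, and agent identity that caused it. The event fields, names, and amounts are hypothetical. Events with no agent identity fall into an explicit `unattributed` bucket, which is one way to make the shared-platform ambiguity visible instead of hiding it in an average.

```python
from collections import defaultdict

# Each record is one cost event tagged with the identity that caused it.
# Field names and values are hypothetical.
events = [
    {"workflow": "support-triage", "department": "support", "agent": "agent-a", "cost": 0.04},
    {"workflow": "support-triage", "department": "support", "agent": "agent-a", "cost": 0.30},
    {"workflow": "code-review", "department": "engineering", "agent": "agent-b", "cost": 0.10},
    {"workflow": "code-review", "department": "engineering", "agent": None, "cost": 0.25},
]


def roll_up(events, key):
    """Sum cost by the given key; events missing the key go to 'unattributed'."""
    totals = defaultdict(float)
    for event in events:
        totals[event.get(key) or "unattributed"] += event["cost"]
    return dict(totals)


by_workflow = roll_up(events, "workflow")      # first level of attribution
by_department = roll_up(events, "department")  # roll-up for chargeback
by_agent = roll_up(events, "agent")            # exposes untagged spend

for name, cost in sorted(by_agent.items()):
    print(f"{name}: ${cost:.2f}")
```

A growing `unattributed` total is a signal that instrumentation, not spending, is the first thing to fix.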
Common Variations and Edge Cases
Tighter allocation often increases reporting overhead, so organisations have to balance cost precision against the time required to instrument every path. That tradeoff is especially real when teams mix chat interfaces, background agents, and human-in-the-loop review in the same platform.
One common edge case is a query that looks inexpensive but triggers expensive follow-up work. A short prompt can cause retrieval across multiple systems, expose sensitive material, or require manual approval before a response is released. Another is batch processing, where one user action creates many model calls and makes per-query averages misleading. In those cases, cost per workflow is often more useful than cost per individual prompt.
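A small worked example shows why the average misleads. The numbers are invented for illustration: one user action fans out into five model calls, four of them cheap and one that triggers manual approval.

```python
from statistics import mean

# Hypothetical batch: cost of each model call in US dollars.
# The last call includes the cost of a manual approval step.
call_costs = [0.01, 0.01, 0.01, 0.01, 2.50]

per_query_average = mean(call_costs)           # what a per-prompt report shows
workflow_cost = sum(call_costs)                # what the user action really cost
top_call_share = max(call_costs) / workflow_cost

print(f"average per call: ${per_query_average:.3f}")
print(f"cost per workflow: ${workflow_cost:.2f}")
print(f"share driven by the most expensive call: {top_call_share:.0%}")
```

Here the per-call average sits near fifty cents even though four of the five calls cost one cent each, so it describes none of them well. The workflow total, together with the share driven by the single expensive call, is the more useful figure for budgeting and review planning.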
There is no universal standard for this yet, but best practice is evolving toward chargeback models that include security and compliance overhead. NHIMG’s State of Secrets in AppSec research shows how remediation and budget fragmentation can distort operational decisions, which is exactly why AI teams should avoid treating model usage as the full cost. Where the environment includes autonomous agents, shared credentials, or regulated outputs, cost attribution becomes less reliable unless every step is tied to a durable workload identity and a clear owner.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | A03 | Agentic systems create hidden downstream costs from unsafe tool use and review. |
| CSA MAESTRO | GOV-02 | MAESTRO governance covers accountability and operating cost for agentic AI workloads. |
| NIST AI RMF | Not specified | AI RMF treats measurement and governance as part of managing AI operational impact. |
Measure full lifecycle AI costs, including governance, so business decisions reflect real risk and effort.
Reviewed and updated by the NHIMG editorial team on July 1, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org