What breaks when AI tool usage is measured only by uptime and latency?

What breaks is governance. Uptime and latency tell you whether the platform is available, but they do not show whether the right tools are being used, whether access is excessive, or whether adoption is drifting into shadow behaviour. Without usage telemetry, teams cannot make sound entitlement or audit decisions.

Why This Matters for Security Teams

Measuring AI tool usage only by uptime and latency answers an availability question, not a governance question. Security teams still need to know which models, tools, connectors, and secrets are being exercised, by whom or by what workflow, and whether that activity fits approved entitlements. Without that visibility, access reviews, audit evidence, and incident response all become guesswork.

This gap matters because AI systems can look healthy while quietly expanding their blast radius. A workflow may stay fast even as it starts calling new tools, retrieving data from new sources, or using credentials that were never meant for that purpose. That is why NHI Management Group treats telemetry as a control layer, not just an observability layer. The pattern is visible in the DeepSeek breach, where exposed data and credentials created far more risk than any performance dashboard would have revealed. Current guidance in the NIST Cybersecurity Framework 2.0 also points teams toward outcome-based control visibility rather than pure service health.

In practice, many security teams discover shadow tool use only after a review, a ticket, or an incident, rather than through intentional monitoring.

How It Works in Practice

Operationally, usage measurement should capture the action, the identity, the context, and the outcome. For AI tools, that means logging which agent, user, service, or workflow invoked which tool, against which data, with which credential or token, and under what policy decision. Uptime and latency can remain part of the SRE view, but governance needs separate telemetry that can support access certification, anomaly detection, and forensic reconstruction.
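As a minimal sketch of what such a record could look like, the example below defines one governance event carrying the action, identity, context, and outcome. The class name, field names, and sample values are hypothetical and chosen only for illustration; they are not a schema defined by NHI Management Group or by any standard.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json


@dataclass(frozen=True)
class ToolInvocationEvent:
    # Hypothetical field names; adapt them to your own logging schema.
    caller_id: str        # agent, user, service, or workflow that made the call
    caller_type: str      # e.g. "agent", "user", "service", "workflow"
    tool: str             # tool or connector that was invoked
    action: str           # the requested action, not just the request duration
    data_source: str      # data the call was made against
    credential_id: str    # identifier of the credential or token, never the secret itself
    policy_decision: str  # "allow" or "deny"
    policy_reason: str    # why access was allowed or denied at request time
    outcome: str          # e.g. "success" or "error"
    timestamp: str        # ISO 8601 timestamp in UTC


def to_log_line(event: ToolInvocationEvent) -> str:
    """Serialise one event as a single JSON line for a log pipeline."""
    return json.dumps(asdict(event), sort_keys=True)


# Illustrative values only.
event = ToolInvocationEvent(
    caller_id="agent-invoice-triage",
    caller_type="agent",
    tool="erp.read_invoice",
    action="read",
    data_source="erp/invoices",
    credential_id="token-7f3a",
    policy_decision="allow",
    policy_reason="tool is in the approved entitlement set",
    outcome="success",
    timestamp=datetime.now(timezone.utc).isoformat(),
)
print(to_log_line(event))
```

Note that the record deliberately carries no latency or uptime field: those stay in the SRE view, while this event exists to answer who did what, with which credential, and under which policy decision.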

A practical control set often includes:

Tool invocation logs that record the requested action, not just the request duration.
Workload identity tied to the calling agent or service, so activity can be attributed to a specific NHI.
Secrets and token usage tracking, including rotation age and whether a credential is being used outside its expected scope.
Policy evaluation records showing why access was allowed or denied at request time.
Aggregation by model, connector, environment, and tenant so drift can be compared against approved baselines.
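
The last item in the list above, comparing observed activity against approved baselines, can be sketched roughly as follows. The baseline mapping, identity names, and event shape are assumptions made for this example; a real deployment would read them from its own entitlement store and log pipeline.

```python
from collections import defaultdict

# Hypothetical approved baseline: the tools each non-human identity may call.
APPROVED_TOOLS = {
    "agent-invoice-triage": {"erp.read_invoice", "mail.send"},
    "agent-support-bot": {"kb.search"},
}


def find_drift(events, baseline):
    """Return {caller_id: tools used outside that caller's approved baseline}."""
    observed = defaultdict(set)
    for e in events:
        observed[e["caller_id"]].add(e["tool"])

    drift = {}
    for caller, tools in observed.items():
        # A caller missing from the baseline has no approved tools at all.
        unapproved = tools - baseline.get(caller, set())
        if unapproved:
            drift[caller] = unapproved
    return drift


# Illustrative events, reduced to the two fields this check needs.
events = [
    {"caller_id": "agent-invoice-triage", "tool": "erp.read_invoice"},
    {"caller_id": "agent-invoice-triage", "tool": "crm.export_contacts"},
    {"caller_id": "agent-support-bot", "tool": "kb.search"},
    {"caller_id": "agent-unknown", "tool": "kb.search"},
]

drift = find_drift(events, APPROVED_TOOLS)
for caller, tools in sorted(drift.items()):
    print(caller, sorted(tools))
```

In this sample, the invoice agent is flagged for a tool outside its baseline, and the unknown caller is flagged because it has no baseline entry at all. Neither condition would appear on an uptime or latency dashboard.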

This approach aligns with NIST Cybersecurity Framework 2.0 because it supports measurable governance outcomes, not just platform stability. It also reflects the issues highlighted in NHIMG’s DeepSeek breach research, where exposed material and sensitive access paths mattered more than service performance alone. For secrets exposure patterns, NHIMG’s The State of Secrets in AppSec research shows why fragmented secret control and slow remediation make telemetry essential.

These controls tend to break down when AI access is routed through shared gateways or proxy layers because attribution to the real caller becomes ambiguous.

Common Variations and Edge Cases

Tighter telemetry often increases logging, storage, and review overhead, requiring organisations to balance governance value against operational cost. That tradeoff becomes more acute when teams have many short-lived agent sessions or high-volume tool calls, because raw event volume can overwhelm legacy SIEM pipelines.

There is no universal standard for this yet, but current guidance suggests separating service health metrics from governance telemetry. Latency still matters for user experience, yet it should not be mistaken for a control signal. A fast agent can still be over-privileged, be using stale secrets, or be calling disallowed tools at machine speed. Conversely, a slow workflow may be perfectly compliant.
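
To illustrate the separation, the sketch below evaluates service health and governance findings independently for the same agent. The thresholds, field names, and finding labels are assumptions for this example, not values taken from any framework.

```python
from dataclasses import dataclass

# Assumed thresholds, for illustration only.
MAX_P95_LATENCY_MS = 500
MAX_SECRET_AGE_DAYS = 90


@dataclass
class AgentSnapshot:
    name: str
    p95_latency_ms: float
    secret_age_days: int
    granted_tools: set  # tools the agent is entitled to call
    used_tools: set     # tools the agent actually called


def health_ok(s: AgentSnapshot) -> bool:
    """Service health view: latency only."""
    return s.p95_latency_ms <= MAX_P95_LATENCY_MS


def governance_findings(s: AgentSnapshot) -> list:
    """Governance view: entitlement and secret hygiene, independent of speed."""
    findings = []
    if s.used_tools - s.granted_tools:
        findings.append("disallowed_tool_use")
    if s.granted_tools - s.used_tools:
        findings.append("unused_entitlements")
    if s.secret_age_days > MAX_SECRET_AGE_DAYS:
        findings.append("stale_secret")
    return findings


fast = AgentSnapshot("fast-agent", 120.0, 200, {"a", "b", "c"}, {"a", "d"})
slow = AgentSnapshot("slow-agent", 900.0, 10, {"a"}, {"a"})

for agent in (fast, slow):
    print(agent.name, health_ok(agent), governance_findings(agent))
```

Here the fast agent passes the health check while carrying three governance findings, and the slow agent fails the health check with none, which is the distinction the paragraph above draws.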

Edge cases also appear in shared infrastructure, where multiple agents, tenants, or applications reuse the same backend service account. In those environments, uptime may look excellent while the underlying identity model is too coarse to support meaningful accountability. Best practice is evolving toward per-workflow or per-agent attribution, backed by policy decision logs and secret lifecycle telemetry. When that is not possible, teams should treat the measurement as incomplete rather than assume compliance.
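
One way to make "incomplete" explicit is to measure how much activity can actually be attributed to a specific agent or workflow. The sketch below is a rough illustration under stated assumptions: the shared account name, the event shape, and the 95% threshold are hypothetical.

```python
# Hypothetical set of backend identities known to be shared by several callers.
SHARED_ACCOUNTS = {"svc-ai-gateway"}


def attribution_coverage(events, shared_accounts):
    """Fraction of events tied to a specific, non-shared caller; None if no events."""
    if not events:
        return None
    attributed = sum(
        1
        for e in events
        if e.get("caller_id") and e["caller_id"] not in shared_accounts
    )
    return attributed / len(events)


def measurement_status(events, shared_accounts, min_coverage=0.95):
    """Report 'incomplete' rather than assuming compliance when attribution is weak."""
    coverage = attribution_coverage(events, shared_accounts)
    if coverage is None or coverage < min_coverage:
        return "incomplete"
    return "reviewable"


# Illustrative events: half arrive under the shared gateway account.
events = [
    {"caller_id": "agent-invoice-triage", "tool": "erp.read_invoice"},
    {"caller_id": "svc-ai-gateway", "tool": "kb.search"},
    {"caller_id": "svc-ai-gateway", "tool": "crm.export_contacts"},
    {"caller_id": "agent-support-bot", "tool": "kb.search"},
]

print(attribution_coverage(events, SHARED_ACCOUNTS))
print(measurement_status(events, SHARED_ACCOUNTS))
```

The status says nothing about whether the activity was compliant. It only records whether the identity model is fine-grained enough for that question to be answered.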

When setting governance baselines and mapping controls, The State of Secrets in AppSec is a useful reminder that fragmented secret estates often hide risk long after availability metrics suggest everything is healthy.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Non-Human Identity Top 10 and CSA MAESTRO address the attack and risk surface, while NIST CSF 2.0 sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI-02	Usage telemetry is needed to spot over-privileged or drifting non-human identities.
NIST CSF 2.0	DE.CM-01	Continuous monitoring must cover activity, not only service health and uptime.
CSA MAESTRO	GOV-03	Agent governance requires evidence of tool use and decision context.

Log NHI actions by identity and scope so entitlement drift can be reviewed and corrected.
