
What do organisations get wrong about observability in microservices?

They often assume more data automatically means better insight. In practice, more tags, more logs, and more traces can increase cost and noise while hiding the identity relationships that matter. The better approach is to design the telemetry model around the questions investigators need to answer during an incident or access review.

Why This Matters for Security Teams

Microservices observability is often treated as a telemetry volume problem, but the real failure is identity blind spots. When service-to-service calls, API keys, and workload tokens are not captured as first-class signals, teams can see latency and errors without understanding which non-human identity acted, what it accessed, or whether the access was expected. That leaves incident response and access review incomplete.

This matters because NHIs are now a dominant control surface. NHI Mgmt Group notes in the Ultimate Guide to NHIs that NHIs outnumber human identities by 25x to 50x in modern enterprises, and only 5.7% of organisations have full visibility into their service accounts. More logs do not fix that gap if the telemetry cannot connect a request to a workload identity. The NIST Cybersecurity Framework 2.0 also reinforces that governance, asset visibility, and access control need to work together, not as separate exercises.

In practice, many security teams discover broken identity correlation only after a service account has been abused and the audit trail no longer answers who-what-when with confidence.

How It Works in Practice

Effective observability for microservices starts with identity, not dashboards. Each request should carry enough context to answer which workload initiated it, what credential or token was used, which policy allowed it, and which downstream services were touched. That usually means instrumenting service meshes, gateways, and application code to emit identity-linked telemetry alongside performance data.
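
As a minimal sketch of what identity-linked telemetry could look like, the record below carries the four answers described above next to a latency measurement. The class name and field names (`workload_id`, `credential_id`, `policy_id`, `downstream`) are illustrative assumptions, not a standard schema, and the example values are made up.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class IdentityEvent:
    """One request, described by who made it and under what authority.

    All field names are hypothetical; adapt them to your own schema.
    """
    workload_id: str      # which workload initiated the request
    credential_id: str    # a reference to the credential used, never the secret itself
    policy_id: str        # which policy produced the decision
    decision: str         # "allow" or "deny"
    downstream: list[str] = field(default_factory=list)  # services touched
    latency_ms: float = 0.0  # performance data travels with identity data

    def to_log_line(self) -> str:
        # Sorted keys keep the serialized form stable across runs.
        return json.dumps(asdict(self), sort_keys=True)


event = IdentityEvent(
    workload_id="spiffe://example.org/ns/payments/sa/checkout",
    credential_id="token-fingerprint-ab12",
    policy_id="payments-read-orders",
    decision="allow",
    downstream=["orders", "ledger"],
    latency_ms=41.7,
)
print(event.to_log_line())
```

Storing a credential reference rather than the credential keeps the log useful for correlation without turning it into a secret store.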

Practitioners increasingly align this with workload identity patterns such as SPIFFE and SPIRE, or with short-lived OIDC-based tokens, so the observability stack can correlate a transaction to the workload that originated it. This is different from simply logging a username or IP address. For NHI-heavy environments, the useful question is not just “what happened?” but “which non-human identity did it, under what authority, and was that authority appropriate at the time?” The Ultimate Guide to NHIs is useful here because it frames visibility, rotation, and offboarding as linked controls rather than separate hygiene tasks.

  • Use stable workload identifiers in traces, logs, and metrics so one service instance can be followed across hops.
  • Tag authorization decisions, not just requests, so investigators can see why access was granted or denied.
  • Prefer short-lived credentials and tokens so telemetry reflects current authority, not stale privilege.
  • Normalize identity fields across platforms before shipping data into SIEM or data lake tooling.
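
The normalization step in the last bullet can be sketched as follows. This is an illustration under stated assumptions: the input keys (`spiffe_id`, `k8s_namespace`, `k8s_service_account`, `cluster`) and the output shape are hypothetical, and a real pipeline would likely cover more identity sources.

```python
from urllib.parse import urlparse


def normalize_identity(raw: dict) -> dict:
    """Map platform-specific identity fields onto one common shape.

    Input and output key names are assumptions made for this sketch.
    """
    if "spiffe_id" in raw:
        # A SPIFFE ID has the form spiffe://<trust-domain>/<path>.
        parsed = urlparse(raw["spiffe_id"])
        return {
            "identity_type": "spiffe",
            "trust_domain": parsed.netloc,
            "workload": parsed.path.lstrip("/"),
        }
    if "k8s_namespace" in raw and "k8s_service_account" in raw:
        return {
            "identity_type": "k8s_service_account",
            "trust_domain": raw.get("cluster", "unknown"),
            "workload": f"{raw['k8s_namespace']}/{raw['k8s_service_account']}",
        }
    # Keep unrecognised records visible instead of silently dropping them.
    return {"identity_type": "unknown", "trust_domain": "unknown", "workload": "unknown"}


records = [
    {"spiffe_id": "spiffe://example.org/ns/payments/sa/checkout"},
    {"k8s_namespace": "billing", "k8s_service_account": "invoicer", "cluster": "prod-eu"},
    {"source_ip": "10.0.0.7"},
]
for record in records:
    print(normalize_identity(record))
```

Emitting an explicit "unknown" record, rather than skipping the event, makes gaps in identity coverage something a SIEM query can count.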

The goal is to reduce noise while preserving the identity context that supports incident reconstruction and access review. These controls tend to break down in containerised environments with high pod churn because ephemeral workloads rotate faster than the telemetry pipeline can reliably enrich and correlate them.

Common Variations and Edge Cases

Tighter identity correlation often increases instrumentation overhead, requiring organisations to balance richer forensic value against performance, storage, and operational complexity. That tradeoff is real, especially in very high-throughput systems where over-tagging can become its own source of noise.
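
One possible way to manage this tradeoff, offered here as an assumption rather than something the guidance prescribes, is to keep every denied decision and sample only the allowed ones. The function name, the `allow_sample_rate` parameter, and its default are hypothetical.

```python
import hashlib


def keep_event(trace_id: str, decision: str, allow_sample_rate: float = 0.1) -> bool:
    """Decide whether to ship an identity event to long-term storage.

    Anything other than an "allow" is always kept; allowed requests are
    sampled deterministically so every hop of one trace gets the same answer.
    """
    if decision != "allow":
        return True
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < allow_sample_rate


print(keep_event("trace-123", "deny"))
print(keep_event("trace-123", "allow") == keep_event("trace-123", "allow"))
```

Hashing the trace ID instead of drawing a random number means a sampled trace is kept or dropped as a whole, which preserves cross-hop correlation for the traces that survive.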

Best practice is evolving, and there is no universal standard for telemetry fields that every microservices stack must expose. Some environments can rely on service mesh telemetry, while others need application-level enrichment because proxies cannot see enough context. In regulated workloads, the bar is higher: logs may need to support both security investigation and retention requirements without exposing secrets.

Edge cases matter most when service identities are shared, when asynchronous jobs fan out across queues, or when third-party integrations inject requests through shared gateways. In those cases, observability fails if it treats all traffic as equivalent. A useful reference point is the broader governance gap described in NHI Mgmt Group research, where excessive privileges and weak offboarding are common failure modes, and where observability must support both detection and review rather than simple monitoring. Current guidance suggests that teams should log identity transitions and authorization context wherever workloads cross trust boundaries, even if full end-to-end correlation is not yet possible.
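
Logging identity transitions at trust boundaries can be sketched like this for a queued job that fans out to another worker and then a shared gateway. The `Envelope` type, the `cross_boundary` helper, and all identity and boundary names are hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Envelope:
    """A message wrapper that remembers who started the work (hypothetical)."""
    payload: dict
    origin_identity: str
    identity_chain: list[str] = field(default_factory=list)


def cross_boundary(envelope: Envelope, handler_identity: str, boundary: str) -> dict:
    """Record one identity transition and return the event to log."""
    previous = envelope.identity_chain[-1] if envelope.identity_chain else envelope.origin_identity
    envelope.identity_chain.append(handler_identity)
    return {
        "event": "identity_transition",
        "boundary": boundary,
        "origin_identity": envelope.origin_identity,  # preserved across every hop
        "from_identity": previous,
        "to_identity": handler_identity,
        "hop": len(envelope.identity_chain),
    }


message = Envelope(payload={"order": 1}, origin_identity="svc-checkout")
first = cross_boundary(message, "worker-invoice", "queue:invoices")
second = cross_boundary(message, "partner-gateway", "gateway:partners")
print(first)
print(second)
```

Even when full end-to-end correlation is not available, each transition event still names the originating identity, so a reviewer can reconstruct the chain from individual log lines.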

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST AI RMF set the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Non-Human Identity Top 10 | NHI-01 | Identity blind spots are a core non-human identity visibility issue.
NIST CSF 2.0 | DE.CM-7 | Continuous monitoring depends on meaningful telemetry, not raw volume.
NIST AI RMF | | Telemetry design should support governance, traceability, and accountability.

Tune observability to monitor identity-linked events that support detection and incident analysis.