Kubernetes health checks: where reliability and observability break down


(@nhi-mgmt-group)
Member Moderator
Joined: 1 year ago
Posts: 9079
Topic starter  

TL;DR: Kubernetes health checks use startup, readiness, and liveness probes to manage containers, but Pomerium argues that eventual consistency, stateful dependencies, and split-mode deployments make those signals harder to trust in practice. The governance lesson is that health checks are only as reliable as the assumptions behind them, especially when access, proxying, and observability overlap.

NHIMG editorial — based on content published by Pomerium: 7 Things to Know About Kubernetes Health Checks

Questions worth separating out

Q: How should teams design Kubernetes health checks for stateful services?

A: Teams should map probes to the service state they actually govern, not to a generic application heartbeat.

Q: Why do Kubernetes health checks fail in complex deployments?

A: They fail when the probe is too narrow for the real dependency chain.

Q: How can operators tell whether a health check is actually useful?

A: A useful health check predicts operator action.

Practitioner guidance

  • Define probe scope by subsystem: Map startup, readiness, and liveness checks to the exact subsystem they represent, such as authentication, authorization, or proxying, so each probe answers one operational question only.
  • Instrument split-mode dependencies: Add explicit checks for cache invalidation, cross-component synchronisation, and any readiness gate that can block traffic until the service is genuinely usable.
  • Pair probes with traces and logs: Use metrics, logs, and traces to separate a local service fault from an upstream dependency issue before you automate restarts or traffic removal.
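
As a rough illustration of the first bullet, the sketch below keeps a separate list of subsystem checks for each probe kind, so a readiness failure never leaks into the liveness answer. Everything here is hypothetical and not taken from Pomerium's post: the `ProbeRegistry` class, the subsystem names, and the lambda checks are placeholders. The one Kubernetes fact it relies on is that an HTTP probe counts any status from 200 to 399 as success.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Check = Callable[[], bool]


@dataclass
class ProbeRegistry:
    """Hypothetical helper: maps each probe kind to the subsystem checks it may consult."""

    checks: Dict[str, List[Tuple[str, Check]]] = field(
        default_factory=lambda: {"startup": [], "readiness": [], "liveness": []}
    )

    def register(self, kind: str, subsystem: str, check: Check) -> None:
        if kind not in self.checks:
            raise ValueError(f"unknown probe kind: {kind}")
        self.checks[kind].append((subsystem, check))

    def evaluate(self, kind: str) -> Tuple[int, Dict[str, bool]]:
        """Return an HTTP-style status code plus a per-subsystem report."""
        report: Dict[str, bool] = {}
        for subsystem, check in self.checks[kind]:
            try:
                report[subsystem] = bool(check())
            except Exception:
                # A check that raises is reported as failing rather than crashing the probe.
                report[subsystem] = False
        status = 200 if all(report.values()) else 503
        return status, report


# Placeholder checks; real ones would inspect the actual subsystem state.
registry = ProbeRegistry()
registry.register("startup", "config_loaded", lambda: True)
registry.register("readiness", "authorize_cache_synced", lambda: False)
registry.register("liveness", "proxy_event_loop", lambda: True)

print(registry.evaluate("readiness"))  # (503, {'authorize_cache_synced': False})
print(registry.evaluate("liveness"))   # (200, {'proxy_event_loop': True})
```

Returning the per-subsystem report alongside the status code is a design choice, not a requirement: it gives logs and traces something specific to attach to when a probe turns red.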

What's in the full article

Pomerium's full blog post covers the operational detail this post intentionally leaves for the source:

  • The team’s exact reasoning for using startup, readiness, and liveness probes in different service states.
  • How Pomerium applies health checks across authentication, authorization, and Envoy proxying components.
  • The split-mode and cache invalidation scenarios that make readiness harder to model in practice.
  • The observability guidance behind metrics, logs, traces, and OpenTelemetry for diagnosing unhealthy states.

👉 Read Pomerium's analysis of Kubernetes health checks and service readiness →




   
(@mr-nhi)
Member Moderator
Joined: 2 months ago
Posts: 8508
 

Health checks are a control boundary, not just an uptime feature. Kubernetes probes decide whether a workload should accept traffic, restart, or stay in service, so they function as an access and resilience control as much as an operational one. When the probe model is too shallow for the system state, governance decisions are made on partial information. Practitioners should treat probe design as part of service assurance, not an implementation detail.

A few things that frame the scale:

  • 91.6% of secrets remain valid five days after the targeted organisation is notified, showing a critical gap in remediation procedures, according to the Ultimate Guide to NHIs.
  • Only 5.7% of organisations have full visibility into their service accounts, which is why probe reliability and dependency mapping cannot be treated as separate operational concerns.

A question worth separating out:

Q: What should security teams do when readiness signals are unreliable?

A: They should treat readiness as an access decision and verify whether the signal is trustworthy enough to gate traffic. When readiness is unreliable, the safer approach is to add richer diagnostic signals and reduce the assumption that a single green check means the workload is fit to receive requests.
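
One way to reduce reliance on a single green check is to require several consecutive results before changing state. The sketch below is an application-side illustration only: `ReadinessGate` and its default thresholds are hypothetical, not from the source. Kubernetes probes expose similar `successThreshold` and `failureThreshold` fields, and this is not a reimplementation of how the kubelet applies them.

```python
from dataclasses import dataclass


@dataclass
class ReadinessGate:
    """Hypothetical gate: flips state only after a run of consistent observations."""

    success_threshold: int = 3  # consecutive passes needed to become ready
    failure_threshold: int = 2  # consecutive failures needed to become unready
    ready: bool = False
    _successes: int = 0
    _failures: int = 0

    def observe(self, ok: bool) -> bool:
        """Record one raw check result and return the gated readiness state."""
        if ok:
            self._successes += 1
            self._failures = 0
            if not self.ready and self._successes >= self.success_threshold:
                self.ready = True
        else:
            self._failures += 1
            self._successes = 0
            if self.ready and self._failures >= self.failure_threshold:
                self.ready = False
        return self.ready


gate = ReadinessGate()
raw = [True, True, False, True, True, True, False, True, False, False]
gated = [gate.observe(result) for result in raw]
print(gated)
# [False, False, False, False, False, True, True, True, True, False]
```

The gate smooths over a single flapping result in either direction, at the cost of reacting more slowly to a genuine outage, so the thresholds are a trade-off rather than a fix.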

👉 Read our full editorial: Kubernetes health checks expose the limits of eventual consistency



   