
Liveness probe

By NHI Mgmt Group | Updated June 11, 2026 | Domain: Architecture & Implementation Patterns

A liveness probe checks whether a process is still functioning and should be restarted if it is not. It is useful for crash detection, but it does not prove the service is ready to serve requests or that its internal state is consistent.

Expanded Definition

A liveness probe is an operational check that determines whether a workload or agent is still running and should be restarted if it has become unresponsive. In cloud-native and agentic environments, it is a runtime health signal, not an identity control, and it should be distinguished from readiness checks, authorization checks, or state validation. Definitions vary across vendors and orchestration platforms, but the core purpose remains crash detection and recovery orchestration.
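To make the liveness/readiness distinction concrete, the following is a minimal sketch assuming a hypothetical `AgentStatus` snapshot with three illustrative fields. It is not any orchestrator's API. A liveness failure means "restart me", a readiness failure means "stop routing work to me", and neither function makes an authorization decision.

```python
from dataclasses import dataclass


@dataclass
class AgentStatus:
    """Hypothetical snapshot of a workload's internal state (illustrative names)."""
    event_loop_responsive: bool  # can the process still answer a probe at all?
    credentials_loaded: bool     # has it fetched the secrets it needs?
    upstream_reachable: bool     # can it reach the APIs it depends on?


def is_live(s: AgentStatus) -> bool:
    # Liveness only asks whether the process is stuck; failure means "restart".
    return s.event_loop_responsive


def is_ready(s: AgentStatus) -> bool:
    # Readiness asks whether the process can do useful work right now;
    # failure means "stop sending traffic", not "restart".
    return s.event_loop_responsive and s.credentials_loaded and s.upstream_reachable


# A worker that is alive but has not loaded credentials is live, not ready.
status = AgentStatus(event_loop_responsive=True, credentials_loaded=False, upstream_reachable=True)
print(is_live(status), is_ready(status))
```

Keeping the two checks as separate functions mirrors how orchestrators expose them as separate probes. Folding credential checks into liveness would cause restarts for problems that a restart cannot fix.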

In NHI and AI agent deployments, a liveness probe is useful when an automated process can hang, deadlock, or enter a failed loop while still appearing “up.” It helps orchestration systems recover a stuck service, yet it does not prove that credentials are valid, that secrets are protected, or that the process can safely continue acting on behalf of an identity. The NIST Cybersecurity Framework (CSF) 2.0 frames this kind of resilience as part of operational recovery, but liveness alone is too narrow a signal to represent trust.
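A process can look "up" because its probe handler still answers even though its main work loop has hung. One common way to catch this is a heartbeat: the work loop records progress, and the liveness check fails when that record goes stale. The sketch below assumes a hypothetical `Heartbeat` class and a 30-second timeout chosen only for illustration. The clock is injectable so the behaviour can be checked without waiting.

```python
import threading
import time
from typing import Callable


class Heartbeat:
    """Tracks when a worker loop last made progress (a sketch, not a library API)."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._lock = threading.Lock()
        self._last_beat = clock()

    def beat(self) -> None:
        # Called by the worker loop after each unit of real work completes.
        with self._lock:
            self._last_beat = self._clock()

    def is_live(self) -> bool:
        # Called from the liveness endpoint. It fails when the loop has stopped
        # progressing, even though the process itself is still running.
        with self._lock:
            return self._clock() - self._last_beat <= self._timeout_s
```

An agent stuck in an infinite loop never reaches `beat()`, so `is_live()` eventually returns `False` and the orchestrator can restart it. Using a monotonic clock keeps wall-clock adjustments from triggering false restarts.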

The most common misapplication is treating a passing liveness probe as evidence that an NHI-backed service is safe to authorize. This happens when teams confuse process uptime with identity integrity or data-plane readiness.

Examples and Use Cases

Tuning liveness probes tightly introduces restart sensitivity: organisations must weigh faster failure recovery against the risk of restarting a process that is slow but still healthy.
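The trade-off can be seen by simulating the consecutive-failure rule that orchestrators such as Kubernetes apply: a restart fires only after `failureThreshold` failed probes in a row, and any success resets the count. The sketch below is a simplified simulation, not the kubelet's implementation. `failure_threshold` and `period_s` are stand-ins for `failureThreshold` and `periodSeconds`, and probe timeouts are ignored.

```python
from typing import Iterable, Optional


def restart_index(results: Iterable[bool], failure_threshold: int) -> Optional[int]:
    """Return the index of the probe that would trigger a restart, or None.

    A restart happens only after `failure_threshold` consecutive failures;
    any successful probe resets the counter.
    """
    if failure_threshold < 1:
        raise ValueError("failure_threshold must be >= 1")
    consecutive = 0
    for i, ok in enumerate(results):
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= failure_threshold:
            return i
    return None


def worst_case_detection_s(period_s: float, failure_threshold: int) -> float:
    # Rough upper bound on how long a fully hung process runs before restart.
    return period_s * failure_threshold


# A slow-but-healthy process that occasionally misses a probe.
slow = [True, False, True, True, False, False, True]
print(restart_index(slow, failure_threshold=1))  # restarted on the first miss
print(restart_index(slow, failure_threshold=3))  # never restarted
```

A threshold of 1 restarts the slow process on its first missed probe. A threshold of 3 tolerates it, but a genuinely hung process then survives up to roughly `period_s * failure_threshold` seconds before it is restarted.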

  • A containerized token-rotation worker is restarted if it stops responding, even though a separate readiness check is still needed before it resumes issuing or renewing credentials.
  • An AI agent running scheduled tool calls fails a probe after entering an infinite loop, so the orchestrator restarts it instead of allowing a stuck execution path to persist.
  • A service-account-driven API gateway uses liveness checks to detect deadlocked processes, while secrets validation and access control are handled elsewhere in the stack.
  • An internal controller managing NHI lifecycle workflows is probed for heartbeat continuity, but policy enforcement and secret rotation remain independent controls.
  • The operational distinction matters when applying guidance from the Ultimate Guide to NHIs, especially where automated identities are expected to recover cleanly without exposing credentials or accumulating privilege drift.

In practice, liveness probes are often paired with external health indicators and workload-specific checks, because a process can be alive while still unable to serve traffic or safely execute privileged actions.
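One way to keep these signals separate is to map each to its own response rather than collapsing them into a single "healthy" flag. The sketch below assumes three boolean inputs. `identity_ok` is a hypothetical stand-in for whatever independent check confirms that credentials are valid and unrevoked, which a liveness probe cannot supply. The action names are illustrative.

```python
from enum import Enum


class Action(Enum):
    RESTART = "restart"                    # liveness failed: the process is stuck
    REMOVE_FROM_ROTATION = "drain"         # alive, but cannot serve traffic
    BLOCK_PRIVILEGED = "block_privileged"  # alive and ready, but identity checks failed
    NONE = "none"


def decide(live: bool, ready: bool, identity_ok: bool) -> Action:
    """Map independent health and trust signals to distinct responses."""
    if not live:
        return Action.RESTART
    if not ready:
        return Action.REMOVE_FROM_ROTATION
    if not identity_ok:
        return Action.BLOCK_PRIVILEGED
    return Action.NONE


# A process that is alive and serving traffic but holds a revoked credential
# should be blocked from privileged actions, not restarted.
print(decide(live=True, ready=True, identity_ok=False))
```

The ordering reflects the idea that a restart is only the right response to a stuck process. The other failures call for draining traffic or withholding privilege while the process keeps running.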

Why It Matters in NHI Security

Liveness probes matter because many NHI failures are operational failures first and security incidents second. A stuck controller, deadlocked agent, or hung secret-handling process can block rotation, delay revocation, or silently stall access decisions. That becomes especially important in environments where NHIs outnumber human identities by 25x to 50x and operational drift spreads quickly across services. NHI Mgmt Group’s Ultimate Guide to NHIs notes that 80% of identity breaches involved compromised non-human identities such as service accounts and API keys, which is why resilience signals must be paired with identity governance.

A liveness probe should be understood as a narrow availability check inside a broader control set that includes secret hygiene, privilege boundaries, and trust validation. It does not confirm whether a restarted process should retain prior tokens, whether its cache is stale, or whether its workload has entered an unsafe state after partial failure. Organisations typically discover the need to tune liveness probes only after a hung agent, stalled rotation job, or frozen automation path prevents recovery, at which point the problem can no longer be ignored.
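Because the probe says nothing about cached credentials, a restarted process has to decide for itself whether to reuse them. The sketch below shows one possible policy, assuming a hypothetical `CachedToken` record, a known time of the last credential rotation, and an illustrative five-minute minimum lifetime. A token is discarded if it predates the last rotation or is about to expire.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedToken:
    """Hypothetical cached credential; field names are illustrative."""
    issued_at: float   # seconds since epoch
    expires_at: float  # seconds since epoch


def reusable_after_restart(tok: CachedToken, now: float,
                           last_rotation_at: float,
                           min_remaining_s: float = 300.0) -> bool:
    # A passing liveness probe does not answer this question; the restarted
    # process must check its cached credentials before reusing them.
    if tok.issued_at < last_rotation_at:
        return False  # issued before the most recent rotation: treat as superseded
    # Avoid reusing a token that will expire shortly after restart.
    return tok.expires_at - now >= min_remaining_s
```

When this returns `False`, the process would fetch fresh credentials before resuming work. Where to source `last_rotation_at` (a secrets manager, a control plane, a config map) is deployment-specific and outside this sketch.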

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust Architecture (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
NIST CSF 2.0 | RC.RP | Liveness probes support timely recovery from failed workloads and stuck automation.
OWASP Non-Human Identity Top 10 | (general) | Health checks are part of securing automated identities and their operating environment.
NIST Zero Trust (SP 800-207) | Section 2.1 (Tenets of Zero Trust) | Zero Trust requires continuous validation beyond a process simply being alive.

Use liveness checks to restart failed NHI services quickly while preserving separate readiness and authorization controls.

NHIMG Editorial Note
Reviewed and updated by the NHIMG editorial team on June 11, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org