Fail-closed runtime policy

Fail-closed runtime policy means an agent cannot continue acting when the policy decision service is unavailable or unreachable. The design is deliberate: if authorisation cannot be checked in real time, execution stops rather than defaulting to permit, which is the safer model for high-risk agent tool use.

Expanded Definition

Fail-closed runtime policy is the operational choice to stop an agent when its policy decision service cannot be reached, rather than letting the agent continue with stale or absent authorisation. In NHI and agentic AI systems, that distinction matters because tool use, secrets access, and downstream actions can create real-world impact within seconds. The policy gate is often implemented alongside NIST Cybersecurity Framework 2.0 concepts for protecting service access and availability, but no single standard yet fully defines fail-closed runtime behaviour for autonomous agents.

In practice, the term covers runtime authorisation checks, dependency health, and the decision to block execution when policy telemetry is missing. It is different from static least privilege because the policy must be evaluated continuously, not just assigned at provisioning time. NHI Management Group treats this as a runtime safety control, not merely an IAM preference, because a live agent may already hold valid credentials while still needing fresh decisioning before every sensitive action. This is especially relevant when an agent can reach privileged APIs, secrets managers, or deployment tooling, as highlighted in Top 10 NHI Issues. The most common misapplication is treating an unavailable policy engine as a harmless outage, which occurs when teams configure permissive fallback to preserve uptime.
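The runtime check described above can be sketched as a small gate around each sensitive action. This is a minimal illustration, not a reference implementation: `run_guarded`, `PolicyUnavailable`, `ActionBlocked`, and the `check_policy` callable are hypothetical names standing in for whatever policy client an organisation actually uses.

```python
from typing import Callable


class PolicyUnavailable(Exception):
    """Hypothetical error a policy client raises when no decision can be obtained."""


class ActionBlocked(Exception):
    """Raised when a sensitive action is not allowed to proceed."""


def run_guarded(check_policy: Callable[[str, str], bool],
                agent_id: str,
                action: str,
                execute: Callable[[], str]) -> str:
    """Run `execute` only if a fresh policy decision explicitly permits it."""
    try:
        allowed = check_policy(agent_id, action)
    except (PolicyUnavailable, TimeoutError, ConnectionError) as exc:
        # Fail closed: no decision means no execution, never a default permit.
        raise ActionBlocked(f"policy unavailable for {action!r}") from exc
    if allowed is not True:
        # Anything other than an explicit permit is treated as a deny.
        raise ActionBlocked(f"policy denied {action!r}")
    return execute()
```

The key design choice is that the outage branch and the deny branch end the same way: the action does not run. A fail-open variant would return `execute()` from the `except` branch, which is the permissive fallback this entry warns against.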

Examples and Use Cases

Implementing fail-closed runtime policy rigorously makes automation dependent on the availability of the policy service, so organisations must weigh stronger control against the risk of halted automation.

  • An AI agent that can open pull requests is paused when its policy service times out, preventing an unreviewed code change.
  • A secrets rotation bot is blocked from retrieving production API keys until policy can confirm the request is still within approved scope.
  • A customer-support agent with tool access stops before exporting account data if the authorisation backend becomes unreachable.
  • A workflow running under a service account is suspended during a policy outage instead of defaulting to permit, reducing blast radius.
  • A delegated build agent resumes only after policy health returns, aligning runtime controls with the lifecycle discipline described in Ultimate Guide to NHIs — Lifecycle Processes for Managing NHIs.

These patterns are easiest to validate in environments that already instrument policy checks as a first-class control and compare outcomes against runtime identity telemetry. For architecture teams, a useful external reference point is the NIST Cybersecurity Framework 2.0, even though it does not prescribe this exact agent behaviour. The industry is still evolving here, and vendors differ on whether acting on cached decisions counts as fail-closed or merely as degraded-mode access.
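The cached-decision question can be made concrete with a short sketch. It assumes one possible stance, namely that a cached permit may be honoured only within a strict maximum age and that anything stale or missing blocks the action. `DecisionCache`, `CachedDecision`, and `max_age_seconds` are hypothetical names, and whether this behaviour qualifies as fail-closed is exactly the point on which definitions differ.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class CachedDecision:
    allowed: bool
    fetched_at: float  # monotonic seconds when the policy service answered


class DecisionCache:
    """Holds recent decisions; anything stale or missing counts as 'no permit'."""

    def __init__(self, max_age_seconds: float) -> None:
        self.max_age_seconds = max_age_seconds
        self._entries: Dict[Tuple[str, str], CachedDecision] = {}

    def store(self, agent_id: str, action: str, allowed: bool,
              now: Optional[float] = None) -> None:
        fetched_at = time.monotonic() if now is None else now
        self._entries[(agent_id, action)] = CachedDecision(allowed, fetched_at)

    def permits(self, agent_id: str, action: str,
                now: Optional[float] = None) -> bool:
        current = time.monotonic() if now is None else now
        entry = self._entries.get((agent_id, action))
        if entry is None:
            return False  # never decided: block
        if current - entry.fetched_at > self.max_age_seconds:
            return False  # decision too old to trust: block
        return entry.allowed
```

A shorter `max_age_seconds` moves this closer to strict fail-closed behaviour; a long or unlimited one turns the cache into the stale authorisation the definition above rules out.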

Why It Matters in NHI Security

Fail-open behaviour turns a temporary control outage into an authorisation bypass. In NHI environments, that can expose secrets, create unsanctioned tool execution, or allow an agent to continue acting after its trust context has expired. This is why the concept sits close to the core NHI concerns captured in Ultimate Guide to NHIs — Regulatory and Audit Perspectives: auditors want evidence that privileged automation fails safe, not forward.
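One way to produce the evidence auditors look for is to emit a structured record each time an action is blocked. The sketch below is illustrative only: the function name `blocked_action_record` and the field names are assumptions, not a schema defined by any standard or by the guides cited here.

```python
import json
from datetime import datetime, timezone


def blocked_action_record(agent_id: str, action: str, reason: str) -> str:
    """Build a JSON audit line for an action that was stopped at the policy gate."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "action": action,
        "outcome": "blocked",  # the gate failed closed
        "reason": reason,      # e.g. "policy_unavailable" or "policy_denied"
    }
    return json.dumps(record, sort_keys=True)
```

Recording the reason separately lets a reviewer distinguish blocks caused by policy outages from ordinary denials.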

The risk is not theoretical. NHIMG research on the LLMjacking attack pattern shows attackers attempt access to exposed AWS credentials within an average of 17 minutes, which means any permissive fallback during policy failure can be exploited quickly. The same exposure logic appears in the DeepSeek breach, where secret and data exposure amplified operational risk. Organisations typically discover the consequence only after a policy outage coincides with an active compromise, by which point adopting fail-closed runtime policy is no longer optional.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST Zero Trust (SP 800-207) sets the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | - | Agent runtime safety guidance covers blocked execution when trust cannot be re-evaluated.
OWASP Non-Human Identity Top 10 | NHI-04 | Runtime authorisation failure handling is central to preventing unsafe NHI actions.
NIST Zero Trust (SP 800-207) | SC-7 | Zero Trust demands continuous verification before allowing access decisions.

Stop agent actions when policy checks fail and avoid permissive fallback on control-plane outages.