Rethinking workload identity policy evaluation from kernel to user space

By NHI Mgmt Group Editorial Team | Published 2025-10-06 | Domain: Workload Identity | Source: Riptides

TL;DR: The latency gains of running OPA policy evaluation as kernel-space WASM, rather than in a user-space agent, can be outweighed by memory fragility, debugging opacity, and maintenance drift, according to Riptides’ October 2025 engineering write-up. The deeper lesson is that workload identity enforcement works best when policy logic stays simpler than the kernel it protects.

At a glance

What this is: A field report on why kernel-space OPA policy evaluation for workload identity gave way to a user-space architecture. The key finding is that stability and operability beat theoretical latency gains.

Why it matters: Workload identity programmes live or die on policy reliability, cache behaviour, and auditability, not just socket-level speed, and the same design trade-offs show up in broader NHI and IAM governance.

By the numbers:

Only 5.7% of organisations have full visibility into their service accounts.
97% of NHIs carry excessive privileges, increasing unauthorised access and broadening the attack surface.
73% of vaults are misconfigured, leading to unauthorised access and exposure of sensitive data.

Context

Workload identity policy evaluation is the control layer that decides whether a socket connection, service-to-service request, or process context should be allowed. In this article, the core question is where policy logic belongs when the enforcement point is close to the kernel but the policy engine is too complex to keep there.

The article’s real subject is not WASM or protobuf as technologies. It is the governance boundary between fast enforcement and maintainable identity policy, which is a familiar problem in NHI programmes that rely on SPIFFE, service accounts, and policy-driven access decisions.

That tension is typical of modern workload identity programmes: the closer policy gets to the kernel, the more expensive mistakes become. NHIMG’s view is that operational resilience is part of identity control, not an afterthought.

Key questions

Q: How should teams decide whether policy evaluation belongs in kernel space or user space?

A: Teams should place policy evaluation where failure is easiest to contain and operate. Kernel space only makes sense when the policy logic is simple, deterministic, and safe under severe runtime constraints. If the control needs rich debugging, memory flexibility, or frequent change, user space is the safer governance choice.

Q: When does kernel-level workload identity enforcement become too risky?

A: It becomes too risky when a policy bug, runtime defect, or memory leak can destabilise the host rather than just deny a request. If the enforcement path increases blast radius more than it reduces latency, the architecture is failing the identity programme rather than helping it.

Q: What do security teams get wrong about low-latency identity controls?

A: They often treat lower latency as proof of better control, even when the architecture adds operational fragility. In identity systems, observability, rollback, and safe failure matter as much as response time. A fast control that cannot be debugged or recovered cleanly is usually the wrong control.

Q: What is the difference between kernel caching and full policy execution in user space?

A: Kernel caching stores prior decisions close to the enforcement point, while full policy execution evaluates the request with the richer logic and tooling available in user space. The split lets teams keep the fast path in the kernel without forcing the policy engine itself into a brittle runtime.

Technical breakdown

Kernel-space policy evaluation and why it becomes fragile

Kernel-space policy evaluation tries to eliminate context switches by running authorization logic where packets and sockets are already visible. In practice, that shifts policy code into a constrained execution environment with tight memory rules, limited debugging, and a much higher blast radius when something fails. The article shows that WASM’s sandbox model does not remove kernel risk when the runtime, memory allocator, and policy modules all become part of the trusted computing base. A policy engine that is safe in user space can become unstable in kernel space because failure modes are no longer isolated to one process.

Practical implication: keep complex policy logic out of kernel space unless you can tolerate kernel-level failure impact.

OPA, WASM, and the limits of portability

OPA can compile Rego into WASM, but portability is not the same as executability in every runtime. The article highlights a key incompatibility: floating-point operations, JSON parsing behaviour, and memory assumptions that are acceptable in general-purpose environments can fail in a kernel context. That makes the policy artifact only partially portable, because the runtime environment constrains what the compiled policy can safely do. For workload identity teams, this is a reminder that policy compilation targets need to be validated against the actual execution boundary, not just against the language toolchain.

Practical implication: validate policy logic against the real runtime boundary before making architecture decisions.
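
One way to act on this, sketched under assumptions: before shipping a policy's data or input document to a constrained runtime, scan it for values that runtime may not handle. The sketch below walks a parsed JSON document and reports the location of every floating-point value. The function name `find_floats` and the sample document are hypothetical, and the article does not describe the validation tooling Riptides actually used.

```python
import json


def find_floats(value, path="$"):
    """Return JSON-path-like strings for every float in a parsed JSON value."""
    found = []
    if isinstance(value, float):
        found.append(path)
    elif isinstance(value, dict):
        for key, child in value.items():
            found.extend(find_floats(child, f"{path}.{key}"))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found.extend(find_floats(child, f"{path}[{index}]"))
    return found


# Hypothetical policy data document, for illustration only.
sample = json.loads('{"limits": {"rate": 0.5, "burst": 10}, "weights": [1, 2.5]}')
problems = find_floats(sample)
```

A check like this only covers data. Numeric operations inside the compiled policy itself would still need to be exercised in the target runtime.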

User-space enforcement with kernel caching

The pivot to user space moved decision logic into a Go agent while the kernel kept a cache and a deny-by-default fallback. This is a classic split between enforcement and evaluation: the kernel remains the fast path for repeated decisions, while the agent handles richer policy reasoning, error handling, and observability. The result is lower complexity at the enforcement edge and better operational control over policy updates. For identity teams, the important lesson is that cache design can preserve most of the performance benefit without forcing the full policy engine into the kernel.

Practical implication: use kernel-side caching and user-space evaluation when you need speed without absorbing kernel complexity.
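
The split between a cached fast path and a full evaluation slow path can be illustrated with a minimal sketch. It is written in Python for readability, not in the Go and kernel code the article describes, and everything in it is an assumption: the `DecisionCache` class, the `(source, destination)` key shape, the TTL-based expiry, and the `evaluate` callback standing in for the user-space policy engine.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Assumed, simplified cache key: (source workload, destination workload).
Key = Tuple[str, str]


class DecisionCache:
    """Stands in for the kernel-side cache: recent allow/deny decisions with a TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Key, Tuple[bool, float]] = {}

    def get(self, key: Key) -> Optional[bool]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        allowed, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: force a fresh evaluation
            return None
        return allowed

    def put(self, key: Key, allowed: bool) -> None:
        self._entries[key] = (allowed, self._clock() + self._ttl)

    def clear(self) -> None:
        """Drop every entry, for example after a policy update."""
        self._entries.clear()


def decide(key: Key, cache: DecisionCache, evaluate: Callable[[Key], bool]) -> bool:
    """Fast path: cached decision. Slow path: full evaluation, then cache the result."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    allowed = evaluate(key)
    cache.put(key, allowed)
    return allowed
```

The point of the sketch is that the cache holds only decisions, never policy logic, so the evaluator can change or be redeployed without touching the enforcement path beyond a call to `clear()`.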

NHI Mgmt Group analysis

Kernel-level identity policy is a governance boundary, not just an implementation choice. The article shows that moving OPA into the kernel changed the failure domain, the maintenance model, and the auditability of workload identity decisions. That means policy placement is itself an identity governance decision, because the control no longer fails like a normal application service. Practitioners should treat execution location as part of the control design, not a deployment detail.

Complexity debt accumulates faster when enforcement and interpretation share the same failure domain. The kernel WASM experiment bundled runtime, memory management, and policy semantics into one fragile stack. That created a situation where policy bugs, runtime bugs, and integration bugs were hard to separate, which is exactly the kind of ambiguity identity programmes try to avoid. The implication is that workload identity controls need a cleaner separation between policy evaluation and enforcement mechanics.

Identity blast radius is the right name for what this architecture shift changes. The article demonstrates that a low-latency design can still be the wrong security design if its failure blast radius includes the kernel. Once policy execution can destabilise the core system, the control is no longer just authorising access; it is shaping system survivability. Practitioners should measure how much operational and security blast radius their identity enforcement path creates.

Workload identity maturity now depends on operability as much as on policy precision. The user-space model improved debugging, logging, and rollback behaviour, which are not cosmetic benefits in identity governance. They are the mechanisms that make policy review, incident triage, and change control possible at scale. The practical conclusion is that identity platforms should be judged on whether they make policy failures observable and containable, not only on whether they are fast.
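
As a rough illustration of what an observable policy decision could look like, the sketch below emits one structured log line per decision and tags it with the policy version that produced it, so a burst of denials can be traced to a specific policy change. The `DecisionRecord` fields and the `to_log_line` helper are hypothetical; the article does not specify a log format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionRecord:
    """One policy decision, with enough context for triage and audit."""

    policy_version: str
    source: str
    destination: str
    allowed: bool
    reason: str


def to_log_line(record: DecisionRecord) -> str:
    """Serialise a decision as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

Recording the policy version alongside each decision is one simple way to connect incident triage to change control, because a rollback target can be read straight from the log.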

SPIFFE-aligned workload identity still needs a sane enforcement architecture. The article sits inside the broader move toward structured workload identity, but it makes clear that standards do not rescue a brittle control plane. A strong identity primitive can still be undermined by an unstable runtime or a poor policy placement decision. The practitioner takeaway is to align workload identity standards with an architecture that keeps failure modes contained.

From our research:
Only 5.7% of organisations have full visibility into their service accounts, according to Ultimate Guide to NHIs.
97% of NHIs carry excessive privileges, increasing unauthorised access and broadening the attack surface.
See also the Ultimate Guide to NHIs and Lifecycle Processes for Managing NHIs for how rotation and offboarding reduce operational drift.

What this signals

Identity blast radius: this article is a reminder that enforcement placement is part of governance, not just architecture. When policy evaluation moves closer to the kernel, failure containment becomes as important as decision correctness, and that same logic applies to service accounts, workload identity, and agentic systems that sit on critical paths.

The operational signal for practitioners is clear: policy systems that cannot be traced, rolled back, or safely failed are already too expensive to trust in production. With only 5.7% of organisations having full visibility into their service accounts, identity teams should assume their weakest controls are the least observable ones.

Teams that are standardising on SPIFFE-style workload identity should pair that work with simpler enforcement architecture and stronger change control. The SPIFFE workload identity specification is useful, but the real programme question is whether policy decisions remain auditable when the runtime changes.

For practitioners

Map policy failure domains before choosing enforcement location. Document which parts of the identity path can fail without taking down the kernel, the agent, or the policy service. If a policy bug can panic the host, the control is too close to the wrong trust boundary.
Separate policy evaluation from socket enforcement. Keep the kernel responsible for fast-path enforcement and caching, while moving OPA evaluation, logging, and error handling into user space. That preserves most of the performance value without binding policy logic to kernel stability.
Test policies against the real runtime boundary. Validate Rego output, JSON handling, and numeric operations in the environment where the policy will actually execute. A policy that compiles cleanly may still fail when runtime constraints remove features such as floating-point support.
Design deny-by-default fallback paths for control-plane failure. If the agent cannot answer, the kernel should fall back to cached decisions or a conservative deny path rather than continuing with undefined behaviour. That keeps identity enforcement deterministic during outages.
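
The last recommendation can be sketched in a few lines. This is an illustrative Python model, not the kernel implementation from the article: the `AgentUnavailable` exception, the `enforce` function, and the plain dictionary of last known decisions are all assumptions made for the example.

```python
from typing import Callable, Dict


class AgentUnavailable(Exception):
    """Raised by the (hypothetical) agent client when no answer arrives in time."""


def enforce(key: str, last_known: Dict[str, bool], ask_agent: Callable[[str], bool]) -> bool:
    """Ask the agent for a decision; on failure use the last known one, else deny."""
    try:
        allowed = ask_agent(key)
    except AgentUnavailable:
        # Control-plane failure: reuse a cached decision if one exists,
        # otherwise take the conservative deny path.
        return last_known.get(key, False)
    last_known[key] = allowed
    return allowed
```

With this shape, an outage never produces an undefined result: a connection seen before keeps its last decision, and anything new is denied until the agent recovers.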

Key takeaways

The main lesson is that policy placement changes the security properties of workload identity enforcement, not just its performance profile.
The production evidence in the article shows that kernel-level policy logic can buy microsecond latency at the cost of debugging pain, memory risk, and larger failure domains.
Practitioners should prefer architectures that preserve enforcement speed through caching while keeping complex policy logic in user space.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST Zero Trust (SP 800-207) and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI-03	Policy execution location affects how NHI decisions are enforced and recovered.
NIST Zero Trust (SP 800-207)	PR.AC-4	Workload identity decisions map to least-privilege access enforcement.
NIST CSF 2.0	PR.PT-3	Protective technology must remain resilient and recoverable under failure.

Keep complex identity policy out of fragile runtimes and validate execution boundaries before production.

Key terms

Workload Identity: A workload identity is the machine or service identity used by software components to authenticate and authorise themselves. It typically includes certificates, tokens, or attested identities that let services prove who they are without relying on human credentials.
Policy Enforcement Point: A policy enforcement point is the component that applies access decisions at the moment a request is made. In workload identity systems, it can sit in the kernel, sidecar, proxy, or agent, and its placement determines latency, failure impact, and operational complexity.
Policy Evaluation: Policy evaluation is the process of interpreting rules against request context to produce an allow or deny decision. For workload identity, the critical question is not only whether the decision is correct, but whether the evaluation engine is safe, observable, and maintainable in the chosen runtime.

This post draws on content published by Riptides: From Kernel WASM to User-Space Policy Evaluation.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-10-06.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org