Why do identity maturity benchmarks often miss real risk?

They often measure whether a programme exists, not whether it is enforced across the identities that matter most. If the model omits service accounts, secrets, and workload credentials, it underestimates exposure and overstates governance confidence. That makes the benchmark useful for direction, but weak as an assurance measure.

Why This Matters for Security Teams

Identity maturity benchmarks are useful for showing whether core capabilities exist, but they often miss whether those controls cover the identities that actually drive exposure. That is a problem because non-human identities, secrets, service accounts, and workload credentials usually outnumber human accounts and fail in different ways. NIST’s Cybersecurity Framework 2.0 is built around outcomes, yet many scorecards still reward policy presence over operational enforcement.

That gap is visible in current NHI research. NHIMG’s Ultimate Guide to NHIs — Key Research and Survey Results notes that only 5.7% of organisations have full visibility into their service accounts, while 96% store secrets outside secrets managers in vulnerable locations. A benchmark that does not include those assets can create a false sense of maturity, even when the most exploitable credentials are unmanaged.

In practice, many security teams encounter real NHI exposure only after a leak, lateral movement event, or incident review, rather than through intentional benchmark-driven validation.

How It Works in Practice

Most maturity models ask whether an organisation has policies, inventory processes, rotation standards, or access reviews. Those questions matter, but they do not prove that governance extends to ephemeral credentials, pipeline tokens, API keys, or machine-to-machine trust. A stronger assessment measures whether identity controls are enforced across the full credential lifecycle: issuance, use, rotation, revocation, and monitoring.

The operational test is simple. If a workload identity can access a production API, should that access be tied to a human-style role review, or to a runtime policy decision that reflects context, task, and risk? For many environments, static RBAC is too blunt. Current guidance suggests combining workload identity with just-in-time credentialing, short-TTL secrets, and policy evaluation at request time. Standards such as SPIFFE support stronger workload identity primitives, while the CISA Zero Trust Maturity Model reinforces the need to verify continuously rather than assume trust after initial authentication.
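As a rough illustration of policy evaluation at request time, the sketch below decides each request from the workload's identity, environment, task, and credential age. Everything in it is an assumption made for the example: the `WorkloadRequest` type, the `POLICY` table, the 15-minute maximum age, and the SPIFFE-style ID string are hypothetical and do not come from any particular product or standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumption: 15 minutes is an illustrative TTL, not a recommended value.
MAX_CREDENTIAL_AGE = timedelta(minutes=15)

# Hypothetical policy table: (workload ID, environment) -> tasks it may perform.
POLICY = {
    ("spiffe://example.org/billing-worker", "prod"): {"read-invoices"},
}


@dataclass(frozen=True)
class WorkloadRequest:
    workload_id: str               # SPIFFE-style ID used purely as a label here
    environment: str
    task: str
    credential_issued_at: datetime


def evaluate(request: WorkloadRequest, now: datetime) -> tuple:
    """Return (allowed, reason) for one request, decided at request time."""
    # Short-lived credentials: reject anything older than the maximum age.
    if now - request.credential_issued_at > MAX_CREDENTIAL_AGE:
        return False, "credential exceeds maximum age"
    # Task-based access: the workload must be permitted this task in this environment.
    allowed_tasks = POLICY.get((request.workload_id, request.environment), set())
    if request.task not in allowed_tasks:
        return False, "task not permitted for this workload and environment"
    return True, "allowed"


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
worker = "spiffe://example.org/billing-worker"
fresh = WorkloadRequest(worker, "prod", "read-invoices", now - timedelta(minutes=5))
stale = WorkloadRequest(worker, "prod", "read-invoices", now - timedelta(hours=2))
off_task = WorkloadRequest(worker, "prod", "delete-invoices", now - timedelta(minutes=5))

for req in (fresh, stale, off_task):
    print(req.task, evaluate(req, now))
```

Unlike a periodic role review, the decision is recomputed on every call, so an expired credential or an out-of-scope task is refused even if the workload was approved earlier.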

NHIMG’s Top 10 NHI Issues and Ultimate Guide to NHIs both show why this matters: unmanaged secrets and poor rotation are not edge cases; they are common control failures. A practical maturity model should therefore check for:

  • coverage of service accounts, bots, CI/CD tokens, and API keys, not just users
  • evidence of secrets discovery, classification, and rotation enforcement
  • task-based access review for non-human identities, not one-time approval
  • revocation speed after compromise, decommissioning, or pipeline teardown
  • telemetry that ties identity use to workload, environment, and business context
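One way to turn the checklist above into a measurement is to score each control by the share of non-human identities it actually covers, rather than by whether the control exists. The sketch below is a minimal illustration under stated assumptions: the `Identity` record, the control names in `CONTROLS`, and the sample estate are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical control names, loosely mirroring the checklist above.
CONTROLS = (
    "inventoried",
    "rotation_enforced",
    "access_reviewed",
    "revocation_tested",
    "telemetry_linked",
)


@dataclass
class Identity:
    name: str
    kind: str                       # e.g. "human", "service_account", "ci_token", "api_key"
    controls: set = field(default_factory=set)


def nhi_coverage(identities):
    """Return, per control, the fraction of non-human identities it covers."""
    nhis = [i for i in identities if i.kind != "human"]
    if not nhis:
        return {}
    return {c: sum(c in i.controls for i in nhis) / len(nhis) for c in CONTROLS}


# Illustrative estate: the human account is fully governed, the NHIs are not.
estate = [
    Identity("alice", "human", set(CONTROLS)),
    Identity("deploy-bot", "ci_token", {"inventoried"}),
    Identity("billing-svc", "service_account", {"inventoried", "rotation_enforced"}),
    Identity("partner-key", "api_key", set()),
    Identity("etl-runner", "service_account",
             {"inventoried", "rotation_enforced", "telemetry_linked"}),
]

report = nhi_coverage(estate)
for control, share in report.items():
    print(f"{control}: {share:.0%}")
```

In this made-up estate, a presence-based benchmark would record that every control exists, while the coverage view shows that access review and revocation testing reach none of the non-human identities.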

These controls tend to break down in highly dynamic hybrid and multi-cloud environments because identity inventory and enforcement drift faster than the benchmark can be updated.

Common Variations and Edge Cases

Tighter measurement often increases operational overhead, requiring organisations to balance richer assurance against the cost of continuous discovery and policy maintenance. That tradeoff is especially visible when identity systems span cloud services, legacy applications, and third-party automation. There is no universal standard for this yet, so benchmark design remains partly a governance choice rather than a settled technical rule.

One common edge case is that a team may have strong PAM and SSO maturity while leaving non-human identities outside the control boundary. Another is that secrets may be technically vaulted but still over-permissioned, long-lived, or embedded in automation that is rarely reviewed. In both cases, the benchmark can rate the programme highly while exposure remains high. The Ultimate Guide to NHIs — Why NHI Security Matters Now and NIST’s framework both point toward outcome-based assurance, but, as noted above, no universal maturity standard yet exists for non-human identity enforcement.
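The second edge case can be made concrete with a small check that looks past the "vaulted" flag. This is only a sketch: the `SecretRecord` fields, the 90-day and 180-day thresholds, and the wildcard-scope heuristic are assumptions chosen for illustration, not values taken from any standard or tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class SecretRecord:
    name: str
    vaulted: bool
    created: date
    last_reviewed: Optional[date]
    scopes: Tuple[str, ...]


def residual_findings(secret: SecretRecord, today: date,
                      max_age_days: int = 90,
                      max_review_gap_days: int = 180) -> List[str]:
    """List the risks that remain even when a secret is stored in a vault."""
    findings = []
    if not secret.vaulted:
        findings.append("not vaulted")
    if (today - secret.created).days > max_age_days:
        findings.append("long-lived")
    # Assumed heuristic: a wildcard scope is treated as over-permissioned.
    if any(scope == "*" or scope.endswith(":*") for scope in secret.scopes):
        findings.append("over-permissioned")
    if (secret.last_reviewed is None
            or (today - secret.last_reviewed).days > max_review_gap_days):
        findings.append("review overdue")
    return findings


today = date(2024, 6, 1)
legacy = SecretRecord("legacy-db-password", True, date(2022, 1, 10), None, ("db:*",))
scoped = SecretRecord("invoice-reader", True, date(2024, 5, 20),
                      date(2024, 5, 21), ("invoices:read",))

print(legacy.name, residual_findings(legacy, today))
print(scoped.name, residual_findings(scoped, today))
```

Both sample secrets would pass an "is it vaulted?" question, but only the second comes back with no findings.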

That is why the best benchmark question is not “Do controls exist?” but “Are the controls applied to the identities that can actually move data, call tools, and create blast radius?” Benchmarks that ignore that distinction measure programme intent more than real risk.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST AI RMF set the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Non-Human Identity Top 10 | NHI-01 | Benchmark gaps often come from missing NHI inventory and enforcement.
NIST CSF 2.0 | GV.OC | Maturity benchmarks should reflect actual organisational context and risk exposure.
NIST AI RMF | GOVERN | Risk benchmarks fail when governance does not cover how automated systems are actually used.

Inventory every non-human identity and verify controls cover secrets, service accounts, and workload credentials.