What Is High-cardinality Data? Definition & Examples

Expanded Definition

High-cardinality data is data with a very large number of distinct values in a field, tag, or label. In observability, that usually means metrics, logs, or traces segmented by identifiers such as pod IDs, request IDs, tenant IDs, service accounts, or API keys. The benefit is more precise filtering. The costs are higher storage spend, slower queries, and less reusable signal, because each distinct value appears in only a few events.
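Cardinality is simply the count of distinct values per field. The sketch below measures it over a sample of events. It is a minimal illustration: the field names and event shape are hypothetical, and real platforms estimate this over far larger volumes, often with approximate counters.

```python
from collections import defaultdict

def field_cardinality(events):
    """Return the number of distinct values seen for each field across events."""
    seen = defaultdict(set)
    for event in events:
        for field, value in event.items():
            seen[field].add(value)
    return {field: len(values) for field, values in seen.items()}

# Hypothetical events for illustration only.
events = [
    {"service": "billing", "pod_id": "billing-7f9c-abcde", "request_id": "r-001"},
    {"service": "billing", "pod_id": "billing-7f9c-fghij", "request_id": "r-002"},
    {"service": "search", "pod_id": "search-5d2a-klmno", "request_id": "r-003"},
]
print(field_cardinality(events))
# {'service': 2, 'pod_id': 3, 'request_id': 3}
```

`service` stays small as traffic grows. `pod_id` and `request_id` grow with every new pod or request, which makes them the high-cardinality fields.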

In NHI security, the term matters because access evidence is often tied to many unique entities and short-lived credentials. That creates a search problem: teams may know an API key was used, but still struggle to correlate which workload, rotation event, or privilege grant created the exposure. This is why cardinality should be treated as a governance issue, not just a telemetry tuning issue. Industry usage is still evolving, but the operational concern is consistent with NIST Cybersecurity Framework 2.0 guidance on visibility and risk management.

The most common misapplication is treating every unique NHI attribute as a useful dimension. This typically happens when teams add high-entropy labels, such as token hashes or request IDs, to dashboards without a retention or search strategy.
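A simple guard against that misapplication is to flag fields whose distinct-value count approaches the number of events before they are promoted to dashboard dimensions. The sketch below uses a ratio threshold of 0.5. That threshold and the field names are hypothetical, and on small samples the ratio is only a rough heuristic.

```python
def high_entropy_labels(events, max_ratio=0.5):
    """Flag fields whose distinct-value count exceeds max_ratio of the event count."""
    if not events:
        return []
    distinct = {}
    for event in events:
        for field, value in event.items():
            distinct.setdefault(field, set()).add(value)
    total = len(events)
    # A ratio near 1.0 means almost every event carries a unique value for the field.
    return sorted(f for f, vals in distinct.items() if len(vals) / total > max_ratio)

# Hypothetical authentication events.
sample = [
    {"environment": "prod", "service_account": "ci-deployer", "token_hash": "a1f3"},
    {"environment": "prod", "service_account": "ci-deployer", "token_hash": "b7c9"},
    {"environment": "staging", "service_account": "backup-job", "token_hash": "c2d8"},
    {"environment": "prod", "service_account": "ci-deployer", "token_hash": "d4e6"},
]
print(high_entropy_labels(sample))  # ['token_hash']
```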

Examples and Use Cases

Capturing high-cardinality telemetry in full adds indexing and retention costs, so organisations must weigh investigative precision against query performance and platform spend.

An observability platform tags authentication events by service account, environment, cluster, and token hash, making incident search precise but expensive at scale.
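The expense in this example comes from multiplication. If each combination of tag values becomes its own time series, the worst-case series count is the product of every tag's distinct-value count. The counts below are hypothetical and chosen only to show the effect.

```python
from math import prod

def worst_case_series(tag_cardinalities):
    """Upper bound on distinct time series: the product of each tag's distinct-value count."""
    return prod(tag_cardinalities.values())

# Hypothetical counts for illustration only.
tags = {"service_account": 2_000, "environment": 3, "cluster": 10}
print(worst_case_series(tags))                           # 60000
print(worst_case_series({**tags, "token_hash": 5_000}))  # 300000000
```

Real series counts are usually lower, because tags correlate (a token normally belongs to one service account). The bound still shows how adding a single high-entropy tag can change the order of magnitude.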

A security team correlates API key usage with deployment version and workload identity to find which release introduced an unexpected secret exposure.
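One way to approach that correlation is to join key-usage events with deployment metadata, then find the earliest release in which the exposed key appears. The sketch below assumes that join has already produced records with hypothetical `key_id`, `workload`, `version`, and `ts` fields. It also makes a simplifying assumption: the first observed use marks the release that introduced the exposure.

```python
from datetime import datetime

def first_release_using_key(usage_events, key_id):
    """Return (version, workload, ts) for the earliest event that used key_id, or None."""
    matches = [e for e in usage_events if e["key_id"] == key_id]
    if not matches:
        return None
    first = min(matches, key=lambda e: e["ts"])
    return first["version"], first["workload"], first["ts"]

# Hypothetical usage events joined from auth logs and deployment metadata.
usage = [
    {"key_id": "ak-123", "workload": "payments-api", "version": "v1.4.0", "ts": datetime(2024, 5, 1, 10, 0)},
    {"key_id": "ak-999", "workload": "payments-api", "version": "v1.5.1", "ts": datetime(2024, 5, 3, 9, 0)},
    {"key_id": "ak-999", "workload": "payments-api", "version": "v1.5.0", "ts": datetime(2024, 5, 2, 8, 30)},
]
print(first_release_using_key(usage, "ak-999"))
# ('v1.5.0', 'payments-api', datetime.datetime(2024, 5, 2, 8, 30))
```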

IAM logs include tenant ID and request ID for every request, allowing faster root-cause analysis but creating too many unique values for standard aggregation.
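The aggregation problem can be shown directly. Grouping by tenant produces counts that summarise behaviour. Grouping by request ID produces one group per event, so the aggregate carries no more information than the raw log. The records below are hypothetical.

```python
from collections import Counter

# Hypothetical IAM log records.
requests = [
    {"tenant_id": "t-acme", "request_id": "req-01", "status": 403},
    {"tenant_id": "t-acme", "request_id": "req-02", "status": 403},
    {"tenant_id": "t-globex", "request_id": "req-03", "status": 200},
    {"tenant_id": "t-acme", "request_id": "req-04", "status": 200},
]

def count_by(events, field):
    """Count events per distinct value of field."""
    return Counter(e[field] for e in events)

by_tenant = count_by(requests, "tenant_id")    # t-acme: 3, t-globex: 1
by_request = count_by(requests, "request_id")  # four groups, each with a count of 1
```

Request ID remains valuable for tracing a single call during root-cause analysis, but it is a poor grouping key.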

In an NHI review, analysts use cardinality-aware filters to identify which identities are active across multiple systems and which are effectively orphaned. NHI Mgmt Group notes that only 5.7% of organisations have full visibility into their service accounts in its Ultimate Guide to NHIs — Key Research and Survey Results, making search design especially important.
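A review of that kind can be sketched as two set computations over access events: identities seen in more than one system, and identities that are "effectively orphaned." The orphan rule used here is a hypothetical one: either no recorded owner, or no activity within 90 days. Real reviews would apply their own ownership and staleness criteria.

```python
from datetime import datetime, timedelta

def review_identities(events, owners, now, stale_after=timedelta(days=90)):
    """Split identities into multi-system and effectively orphaned sets.

    events: iterable of (identity, system, timestamp) tuples.
    owners: dict mapping identity -> owner, or None when no owner is recorded.
    """
    systems, last_seen = {}, {}
    for identity, system, ts in events:
        systems.setdefault(identity, set()).add(system)
        last_seen[identity] = max(ts, last_seen.get(identity, ts))
    multi_system = {i for i, s in systems.items() if len(s) > 1}
    # Orphaned: no owner, or never seen / not seen within the staleness window.
    orphaned = {
        i for i in set(owners) | set(last_seen)
        if owners.get(i) is None or now - last_seen.get(i, datetime.min) > stale_after
    }
    return multi_system, orphaned

# Hypothetical inventory and access events.
now = datetime(2024, 6, 1)
access = [
    ("svc-ci", "github", datetime(2024, 5, 30)),
    ("svc-ci", "aws", datetime(2024, 5, 31)),
    ("svc-backup", "aws", datetime(2024, 1, 10)),
    ("svc-etl", "snowflake", datetime(2024, 5, 20)),
]
owners = {"svc-ci": "platform-team", "svc-backup": "infra-team", "svc-etl": None, "svc-legacy": "data-team"}
multi, orphaned = review_identities(access, owners, now)
# multi == {'svc-ci'}; orphaned == {'svc-backup', 'svc-etl', 'svc-legacy'}
```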

Identity engineers limit metric labels to a small set of stable tags and keep high-entropy values in raw logs for targeted investigation. This approach aligns with NIST Cybersecurity Framework 2.0 principles for risk-based visibility.
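That split can be expressed as a small routing step. Allowlisted, low-cardinality keys become metric labels, and the full event goes to the raw log. The allowlist and field names below are hypothetical. Deciding which keys are "stable" is the actual design work.

```python
# Hypothetical allowlist: stable, low-cardinality keys that may become metric labels.
METRIC_LABELS = frozenset({"service", "environment", "cluster", "outcome"})

def split_event(event, allowed=METRIC_LABELS):
    """Return (metric_labels, raw_record) for one authentication event."""
    labels = {k: v for k, v in event.items() if k in allowed}
    raw_record = dict(event)  # the raw log keeps every field, including high-entropy ones
    return labels, raw_record

event = {
    "service": "billing", "environment": "prod", "cluster": "eu-1", "outcome": "denied",
    "service_account": "sa-billing-writer", "token_hash": "9f2c41ab", "request_id": "r-8812",
}
labels, raw = split_event(event)
# labels holds only the four allowlisted keys; raw still contains token_hash and request_id.
```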

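In practice, teams often lower cardinality by normalising labels, bucketing or truncating identifiers, or separating long-term analytics from incident-response logs. Hashing sensitive identifiers protects them, but a plain hash still yields one distinct value per identifier, so it reduces cardinality only when combined with bucketing. These measures keep dashboards usable while preserving forensic depth when a key or workload must be traced.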
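Two of these techniques are sketched below. The first normalises pod names by stripping a Deployment-style `<name>-<template-hash>-<5-char suffix>` ending. That naming pattern is an assumption about the cluster, and the regex is illustrative rather than exhaustive. The second maps an identifier into a fixed number of hash buckets. A plain SHA-256 digest would still produce one value per identifier, so the modulo step is what bounds cardinality, at the cost of collisions. Where identifiers are guessable, a keyed hash (for example, Python's `hmac` module) is the safer choice.

```python
import hashlib
import re

# Assumed Deployment pod naming: <name>-<8-10 char template hash>-<5 char suffix>.
_POD_SUFFIX = re.compile(r"-[a-z0-9]{8,10}-[a-z0-9]{5}$")

def normalise_pod(pod_name):
    """Collapse per-pod names to their workload name, e.g. 'billing-7f9c8d6b5-abcde' -> 'billing'."""
    return _POD_SUFFIX.sub("", pod_name)

def hash_bucket(identifier, buckets=64):
    """Map an identifier to one of `buckets` values so the raw ID never becomes a label."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets

print(normalise_pod("billing-api-7f9c8d6b5-x2k4p"))  # billing-api
```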

Why It Matters in NHI Security

High-cardinality data can hide the exact evidence needed to detect compromised service accounts, overused tokens, or mis-scoped access. It becomes especially important when NHI inventories are large, because NHIs outnumber human identities by 25x to 50x in modern enterprises, as reported by NHI Mgmt Group in the Ultimate Guide to NHIs — Key Research and Survey Results. When every workload, key, and secret generates a distinct trail, defenders need telemetry models that support correlation rather than only collection.

Mismanaged cardinality also weakens governance. Teams may believe they have visibility because logs exist, but in reality the data is too fragmented to answer incident-scoped questions quickly. That delay matters when correlating access with secret rotation, offboarding, or privilege reduction. High-cardinality fields should therefore be designed around investigative use cases, not convenience for instrumentation alone.
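One incident-scoped question of this kind is "was a credential used after it was rotated out?" The sketch below answers it by comparing each use against a retirement timestamp. The record shapes and the `credential_id` / `retired_at` names are hypothetical. The point is that the question can be answered only if usage logs and rotation records share a joinable identifier.

```python
from datetime import datetime

def uses_after_retirement(uses, retired_at):
    """Return uses of a credential that occur after it was rotated out or revoked.

    uses: list of dicts with 'credential_id', 'workload', and 'ts'.
    retired_at: dict mapping credential_id -> datetime it was retired (absent = still active).
    """
    return [
        u for u in uses
        if u["credential_id"] in retired_at and u["ts"] > retired_at[u["credential_id"]]
    ]

# Hypothetical rotation record and usage events.
retired_at = {"key-v1": datetime(2024, 5, 1)}
uses = [
    {"credential_id": "key-v1", "workload": "etl-job", "ts": datetime(2024, 4, 30)},
    {"credential_id": "key-v1", "workload": "etl-job", "ts": datetime(2024, 5, 3)},
    {"credential_id": "key-v2", "workload": "etl-job", "ts": datetime(2024, 5, 3)},
]
flagged = uses_after_retirement(uses, retired_at)
# flagged contains only the 2024-05-03 use of key-v1
```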

Organisations typically discover the operational cost of high-cardinality data only after an incident search times out or fails to correlate a compromised NHI. By then, redesigning the telemetry model can no longer be deferred.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust Architecture (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
NIST CSF 2.0	DE.AE-1	High-cardinality telemetry affects anomaly detection and event correlation across diverse identity signals.
OWASP Non-Human Identity Top 10	NHI-01	Visibility gaps in NHI inventories are worsened when telemetry has too many distinct values to query effectively.
NIST Zero Trust (SP 800-207)	PA/continuous verification	Zero Trust depends on usable telemetry to continuously verify identities and access behavior.

Design telemetry so analysts can still detect and correlate suspicious NHI activity without overwhelming storage.
