Kernel telemetry and eBPF tracing still expose workload identity gaps

By NHI Mgmt Group Editorial Team | Published 2025-07-21 | Domain: Workload Identity | Source: Riptides

TL;DR: According to Riptides, kernel events captured with eBPF can be carried through OpenTelemetry into Prometheus, turning raw file-creation events into labeled metrics. The deeper lesson is that kernel-level telemetry only helps if identity and workload posture are already governed. The real gap is not visibility but whether workloads have disciplined, reviewable identity boundaries.

At a glance

What this is: A kernel telemetry walkthrough that shows how kernel events can be turned into Prometheus metrics through eBPF and OpenTelemetry.

Why it matters: Observability pipelines increasingly sit beside workload identity, and IAM teams need to understand where telemetry stops and governance begins.

👉 Read Riptides' walkthrough of kernel tracepoints, eBPF, OTEL, and Prometheus

Context

Kernel telemetry is the collection of low-level system events before they are abstracted into application logs or metrics. In identity terms, that matters because workload behaviour, service posture, and inter-service activity can be observed at the point where machine identities actually act, not just where they are declared in a CMDB or cloud console.

The governance gap is that visibility does not equal control. A telemetry pipeline can show file creation, tracepoint activity, and metric export with precision, but it does not by itself define who or what should be able to create, read, or transform those events. For teams managing workload identity, the question is how observability data feeds enforcement and review.

Key questions

Q: How should security teams use kernel telemetry in workload identity programmes?

A: Use kernel telemetry as evidence, not as a control. It can show what a workload did at runtime, but it does not decide whether that behaviour was authorised, reviewed, or appropriately scoped. The strongest use case is linking telemetry to named workload identities, ownership records, and lifecycle processes so detection and governance stay connected.

Q: Why does observability not replace workload access governance?

A: Observability tells you that an event happened. It does not tell you whether the actor had the right privilege, whether the access should exist, or whether the workload is still in scope. Governance still has to define identity, entitlement, and offboarding decisions. Telemetry strengthens review, but it does not substitute for it.

Q: How do organisations know if telemetry is actually improving identity control?

A: Look for evidence that telemetry is driving decisions, not just dashboards. Good signals include faster investigation of workload behaviour, tighter scoping of service accounts, and removal of orphaned identities when workloads are retired. If monitoring is growing but identity reviews are unchanged, telemetry is not improving control.

Q: What is the difference between workload monitoring and workload identity governance?

A: Workload monitoring measures activity. Workload identity governance defines who or what may act, for how long, and under what lifecycle conditions. A team can have excellent kernel telemetry and still lack control over service account sprawl, privilege scope, or offboarding. Governance is the decision layer; monitoring is the evidence layer.

Technical breakdown

How kernel tracepoints become observable identity signals

A tracepoint is a static kernel hook that emits an event when a specific system action occurs. In this pattern, the blog attaches kprobes and kretprobes, which are dynamic probes on kernel functions rather than static tracepoints, to do_filp_open, then decides in the return handler whether a file creation occurred. That gives the pipeline a low-level signal tied to workload behaviour rather than to application intent. The key technical constraint is portability: do_filp_open is an internal kernel function rather than a stable interface, and kernel internals and calling conventions vary by architecture and version.

Practical implication: record which kernel versions and architectures you depend on, and verify that the probe fires consistently on each of them before using its output for governance decisions.
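One way to make that verification concrete is a pre-flight check that the probed function exists on the running kernel. The sketch below is a minimal illustration, not the Riptides implementation: it scans /proc/kallsyms-style text for a code symbol named do_filp_open. The helper names hasKernelSymbol and symbolInText are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// hasKernelSymbol scans kallsyms-formatted text ("address type name [module]")
// and reports whether a code symbol with exactly this name is present.
// Hypothetical helper for illustration only.
func hasKernelSymbol(r io.Reader, name string) (bool, error) {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 3 {
			continue // skip lines that do not look like kallsyms entries
		}
		// Types "T" and "t" mark global and local symbols in the text (code) section.
		if fields[2] == name && (fields[1] == "T" || fields[1] == "t") {
			return true, nil
		}
	}
	return false, sc.Err()
}

// symbolInText is a convenience wrapper for in-memory input.
func symbolInText(text, name string) bool {
	ok, err := hasKernelSymbol(strings.NewReader(text), name)
	return err == nil && ok
}

func main() {
	f, err := os.Open("/proc/kallsyms")
	if err != nil {
		fmt.Println("cannot read /proc/kallsyms:", err)
		return
	}
	defer f.Close()

	ok, err := hasKernelSymbol(f, "do_filp_open")
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	fmt.Println("do_filp_open present:", ok)
}
```

A symbol being present does not prove the probe behaves the same way across versions, since argument layout and return semantics can still change. Compilers can also emit suffixed clones of a function name, which an exact match like this one would miss. Treat the check as a first gate before functional tests, not as a replacement for them.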

eBPF ring buffers and user-space handoff

eBPF is the execution layer that moves kernel-collected events into user space without a heavyweight kernel module. Here, the ring buffer is the transport mechanism, carrying structured event data from the kernel probe into a Go process that reads and decodes each sample. That matters because the pipeline does more than monitor: it transforms kernel events into a usable identity and operations signal. The architecture depends on buffer sizing, event structure, and reader reliability, since dropped or malformed samples weaken the trustworthiness of downstream metrics.

Practical implication: confirm ring-buffer sizing, decode logic, and reader stability so your metrics do not hide lost or malformed events.
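The decode step can be sketched as follows. This is an illustration under stated assumptions, not the layout used in the Riptides post: the event is assumed to be a fixed-size record holding a little-endian 32-bit PID, a 16-byte process name, and a 256-byte NUL-terminated path. The names fileEvent, decodeSample, and decodeAll are hypothetical. Malformed samples are counted rather than silently skipped, so the loss is visible.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	commLen    = 16  // matches the kernel's TASK_COMM_LEN
	pathLen    = 256 // assumed cap on the captured path
	sampleSize = 4 + commLen + pathLen
)

// fileEvent is a hypothetical decoded form of one ring-buffer sample.
type fileEvent struct {
	PID  uint32
	Comm string
	Path string
}

var errMalformed = errors.New("malformed sample")

// cString returns the bytes up to the first NUL as a string.
func cString(b []byte) string {
	if i := bytes.IndexByte(b, 0); i >= 0 {
		b = b[:i]
	}
	return string(b)
}

// decodeSample validates the length before touching any field, then decodes
// the assumed layout: u32 pid, char comm[16], char path[256].
func decodeSample(raw []byte) (fileEvent, error) {
	if len(raw) < sampleSize {
		return fileEvent{}, fmt.Errorf("%w: got %d bytes, want %d", errMalformed, len(raw), sampleSize)
	}
	ev := fileEvent{
		PID:  binary.LittleEndian.Uint32(raw[0:4]), // assumes a little-endian host
		Comm: cString(raw[4 : 4+commLen]),
		Path: cString(raw[4+commLen : sampleSize]),
	}
	if ev.Path == "" {
		return fileEvent{}, fmt.Errorf("%w: empty path", errMalformed)
	}
	return ev, nil
}

// decodeStats makes decode failures observable instead of hidden.
type decodeStats struct{ decoded, malformed int }

func decodeAll(samples [][]byte) ([]fileEvent, decodeStats) {
	var events []fileEvent
	var stats decodeStats
	for _, raw := range samples {
		ev, err := decodeSample(raw)
		if err != nil {
			stats.malformed++
			continue
		}
		stats.decoded++
		events = append(events, ev)
	}
	return events, stats
}

func main() {
	raw := make([]byte, sampleSize)
	binary.LittleEndian.PutUint32(raw[0:4], 4242)
	copy(raw[4:4+commLen], "nginx")
	copy(raw[4+commLen:], "/var/cache/nginx/tmp.0001")

	events, stats := decodeAll([][]byte{raw, raw[:10]})
	fmt.Printf("%+v\n", events)
	fmt.Printf("decoded=%d malformed=%d\n", stats.decoded, stats.malformed)
}
```

This only covers samples that reach user space. Events dropped because the ring buffer was full are lost on the kernel side and would need their own counter there, which is why buffer sizing belongs in the same review as decode logic.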

OpenTelemetry and Prometheus as enforcement-adjacent telemetry

OpenTelemetry standardises the export path, while Prometheus turns the event stream into counters and labels that operators can query. In this design, raw kernel activity becomes a named metric such as file_created_total with path and name labels, which is useful for visibility but dangerous if teams confuse measurement with policy. Metrics can help identify patterns, but they do not establish workload authorization, provenance, or lifecycle ownership. For identity teams, this is where observability should inform posture management rather than replace it.

Practical implication: tie telemetry outputs back to workload identity records so monitoring supports governance instead of sitting apart from it.
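To show what the labeled counter looks like once it reaches Prometheus, here is a minimal standard-library sketch that aggregates file-creation events and renders them in the Prometheus text exposition format. It deliberately avoids the OpenTelemetry SDK so that it stays self-contained, so it is a stand-in for the export path rather than a copy of it. The split of a full path into a directory (path label) and file name (name label) is an assumption, and fileCounter, observe, and render are hypothetical names.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// labelKey is one label combination of the counter.
type labelKey struct{ dir, name string }

// fileCounter accumulates file-creation counts per label combination.
type fileCounter struct{ counts map[labelKey]uint64 }

func newFileCounter() *fileCounter {
	return &fileCounter{counts: make(map[labelKey]uint64)}
}

// observe records one file-creation event for the given full path.
// Assumption: "path" carries the directory and "name" the file name.
func (c *fileCounter) observe(fullPath string) {
	key := labelKey{dir: filepath.Dir(fullPath), name: filepath.Base(fullPath)}
	c.counts[key]++
}

// labelEscaper applies the escaping the text format requires in label values.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// render emits the counter in Prometheus text exposition format,
// sorted so that the output is deterministic.
func (c *fileCounter) render() string {
	keys := make([]labelKey, 0, len(c.counts))
	for k := range c.counts {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].dir != keys[j].dir {
			return keys[i].dir < keys[j].dir
		}
		return keys[i].name < keys[j].name
	})

	var b strings.Builder
	b.WriteString("# HELP file_created_total Number of file creations observed.\n")
	b.WriteString("# TYPE file_created_total counter\n")
	for _, k := range keys {
		fmt.Fprintf(&b, "file_created_total{path=\"%s\",name=\"%s\"} %d\n",
			labelEscaper.Replace(k.dir), labelEscaper.Replace(k.name), c.counts[k])
	}
	return b.String()
}

func main() {
	c := newFileCounter()
	for _, p := range []string{"/tmp/a.log", "/tmp/a.log", "/var/lib/app/state.db"} {
		c.observe(p)
	}
	fmt.Print(c.render())
}
```

Nothing in these series says which workload identity created the file or whether it was entitled to. Labeling by file name also means every distinct file adds a new series, so label cardinality is worth reviewing before a counter like this is relied on in production.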

NHI Mgmt Group analysis

Kernel telemetry is becoming part of workload identity governance, not a separate discipline. The moment file creation, service posture, and inter-service communication are captured at kernel level, observability starts to overlap with machine identity control. That overlap matters because the same workload that emits telemetry is also the subject of access decisions, privilege scope, and lifecycle review. The implication is that identity and observability teams can no longer treat telemetry as downstream plumbing.

Visibility without identity context creates a false sense of control. A Prometheus counter can show that an event occurred, but it cannot explain whether the actor had appropriate standing access, whether the workload was approved, or whether the signal maps to a managed identity. This is where many programmes overestimate maturity: they can observe behaviour, yet still lack governance over the identity behind it. The practitioner conclusion is that observability must be joined to inventory and entitlement controls.

Workload identity becomes more governable when telemetry is tied to named execution paths. The article’s kernel-level approach demonstrates a useful pattern for surfacing specific activity, but the field-level lesson is broader. Identity teams need evidence of what workloads actually do at runtime, not only what access they were given at provisioning time. The practitioner conclusion is that runtime evidence should inform scoping, review, and offboarding decisions for non-human identities.

Zero Trust for workloads fails if telemetry is mistaken for policy enforcement. Zero Trust assumes continuous verification and explicit control boundaries, while telemetry only reports what happened after the fact. That distinction is central for NHI and workload identity programmes because detailed monitoring cannot compensate for overbroad credentials, unmanaged service accounts, or unreviewed access paths. The practitioner conclusion is to treat kernel telemetry as a control input, not as a substitute for control design.

From our research:
80% of identity breaches involved compromised non-human identities such as service accounts and API keys, according to the Ultimate Guide to NHIs.
Only 5.7% of organisations have full visibility into their service accounts, which explains why runtime telemetry and identity inventory often diverge in practice.
For a broader control lens, see NIST Cybersecurity Framework 2.0 and map telemetry outputs to detect and respond functions.

What this signals

Telemetry will matter most where identity inventories are weakest. With only 5.7% of organisations reporting full visibility into their service accounts, according to the Ultimate Guide to NHIs, runtime evidence can help, but it cannot close ownership gaps on its own.

The practical shift is toward evidence-driven NHI governance, where file events, service posture, and workload activity are tied back to named identities and lifecycle records. That makes observability part of access review and offboarding, not a parallel monitoring stack.

Runtime evidence is becoming a control input for Zero Trust in machine environments. If telemetry cannot be reconciled with entitlement scope and workload ownership, it remains operational data rather than governance evidence. Teams should watch for that gap as they mature workload identity programmes.
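As a rough illustration of reconciling telemetry with entitlement scope, the sketch below compares file paths observed at runtime against the path prefixes a workload identity is declared to use. The scope model (a list of allowed directory prefixes) and the names withinScope and outOfScope are assumptions made for this example, not part of the Riptides pipeline or any specific policy engine.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// withinScope reports whether p falls under one of the allowed directory
// prefixes. Paths are cleaned first so "../" segments cannot escape a prefix,
// and the match is on whole path components, not raw string prefixes.
func withinScope(p string, allowed []string) bool {
	p = path.Clean(p)
	for _, prefix := range allowed {
		prefix = path.Clean(prefix)
		if prefix == "/" || p == prefix || strings.HasPrefix(p, prefix+"/") {
			return true
		}
	}
	return false
}

// outOfScope returns the observed paths that no declared prefix covers.
// These are candidates for review, not proof of a violation.
func outOfScope(observed, allowed []string) []string {
	var out []string
	for _, p := range observed {
		if !withinScope(p, allowed) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	allowed := []string{"/var/lib/app", "/tmp"} // hypothetical declared scope
	observed := []string{
		"/var/lib/app/state.db",
		"/var/lib/application/other.db",
		"/tmp/../etc/cron.d/job",
	}
	for _, p := range outOfScope(observed, allowed) {
		fmt.Println("review:", p)
	}
}
```

The output of a check like this is evidence for an entitlement review. Deciding whether the scope or the workload should change remains a governance decision.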

For practitioners

Map kernel telemetry to workload identities: Bind tracepoint output to a current inventory of service accounts, workloads, and owning teams so each metric can be traced back to an accountable identity.
Validate the event path before operational use: Test architecture-specific handling, kernel-version assumptions, and ring-buffer behaviour so the telemetry you depend on is stable across the environments you run.
Separate observation from enforcement: Use kernel metrics as evidence for review and detection, but keep authorization, lifecycle changes, and access decisions in governed identity controls such as policy and approvals.
Tie telemetry to offboarding and review workflows: When a workload or service account is retired, ensure the telemetry source, exported metrics, and associated identity records are removed together to avoid orphaned monitoring paths.
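The mapping and offboarding steps above can be sketched as a small reconciliation pass. This is a simplified illustration: it assumes each telemetry series can already be attributed to a workload name, and that an inventory keyed by that name exists. In practice attribution might come from cgroup, container, or service account metadata. All type and function names here are hypothetical.

```go
package main

import "fmt"

// workloadIdentity is a hypothetical inventory record for one workload.
type workloadIdentity struct {
	ServiceAccount string
	Owner          string
	Retired        bool
}

// observation is telemetry already attributed to a workload name.
type observation struct {
	Workload string
	Events   uint64
}

// finding flags telemetry that cannot be tied to a governed identity.
type finding struct {
	Workload string
	Reason   string
}

// reconcile checks every observed workload against the inventory and
// returns findings in the order the observations were given.
func reconcile(observed []observation, inventory map[string]workloadIdentity) []finding {
	var findings []finding
	for _, o := range observed {
		id, known := inventory[o.Workload]
		switch {
		case !known:
			findings = append(findings, finding{o.Workload, "no inventory record"})
		case id.Retired:
			findings = append(findings, finding{o.Workload, "identity retired but still emitting events"})
		case id.Owner == "":
			findings = append(findings, finding{o.Workload, "no accountable owner"})
		}
	}
	return findings
}

func main() {
	inventory := map[string]workloadIdentity{
		"billing-api": {ServiceAccount: "sa-billing", Owner: "payments-team"},
		"batch-old":   {ServiceAccount: "sa-batch", Owner: "data-team", Retired: true},
		"report-gen":  {ServiceAccount: "sa-report"},
	}
	observed := []observation{
		{"billing-api", 120}, {"batch-old", 3}, {"report-gen", 17}, {"unknown-job", 5},
	}
	for _, f := range reconcile(observed, inventory) {
		fmt.Printf("%s: %s\n", f.Workload, f.Reason)
	}
}
```

Each finding points at a governance action (register, offboard, or assign an owner) rather than at a monitoring change, which is the separation between evidence and decision that the list describes.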

Key takeaways

Kernel-level telemetry can expose workload behaviour with precision, but precision is not the same as governance.
The biggest identity risk is treating observability as proof of control when service accounts, workloads, and lifecycle ownership remain unresolved.
Practitioners should connect telemetry outputs to identity inventories, entitlement reviews, and offboarding workflows before using them for security decisions.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST Zero Trust (SP 800-207) and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI-01	Kernel telemetry supports visibility into workload identities and runtime behaviour.
NIST Zero Trust (SP 800-207)	PR.AC-4	Telemetry informs continuous verification, but does not itself enforce access boundaries.
NIST CSF 2.0	DE.CM-01	Kernel event monitoring aligns with continuous security monitoring and detection.

Use runtime telemetry to support NHI inventory, then validate ownership and scope before trusting the data.

Key terms

Kernel Tracepoint: A kernel tracepoint is a predefined hook that emits an event when a specific operating system action occurs. It provides structured, low-level visibility into runtime behaviour without changing the kernel code path, which makes it useful for observability and detection engineering.
eBPF Ring Buffer: An eBPF ring buffer is a shared transport mechanism used to pass event data from the kernel to user space efficiently. It is central to high-throughput telemetry pipelines, but its value depends on correct sizing, stable readers, and reliable decoding of the samples it carries.
Workload Identity Governance: Workload identity governance is the discipline of deciding what machine identities may do, for how long, and under what ownership and lifecycle controls. It combines entitlement scoping, review, offboarding, and evidence from runtime behaviour so identity decisions reflect actual use, not just provisioning records.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or NHI governance in your organisation, it is worth exploring.

This post draws on content published by Riptides: From Tracepoints to Prometheus, the journey of a kernel event to observability. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-07-21.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org