
When does centralised governance fail in an open lakehouse?

It fails when policy is centralised but enforcement is not. In that case, the organisation can document controls in one system while access, definitions and transformations drift across clouds and tools. The warning sign is inconsistency between what the catalog says and what users actually see in production.

Why Centralised Governance Fails in an Open Lakehouse

Centralised governance breaks when it assumes the catalog is the control plane. In an open lakehouse, data can be queried, copied, transformed, and exposed across engines, pipelines, and clouds that do not all honour the same policy layer. That creates a gap between intended control and actual enforcement. This is the kind of drift highlighted in Top 10 NHI Issues, and it runs against the governance expectations in the NIST Cybersecurity Framework 2.0.

The failure is not that central policy is useless. The failure is that a single policy source cannot compensate for inconsistent enforcement across Spark jobs, object stores, BI tools, notebooks, and external sharing paths. Once access decisions and data transformations are distributed, teams often discover that the documented model and the operational model are no longer the same. In practice, many security teams encounter this only after a production query exposes data that the catalog still marks as restricted.

How Governance Drift Happens Across Tools and Clouds

Open lakehouse architectures are designed for portability, which is useful for analytics but difficult for governance. The same dataset may be read by one engine, transformed by another, and published through a third system that applies its own permissions, masking rules, or lineage metadata. If policy is written centrally but enforcement remains local, the control plane becomes informational rather than authoritative.

Operationally, effective governance depends on synchronising three layers:

  • Identity and access, so that users and services are authorised consistently across query engines and pipelines.
  • Metadata and classification, so that the catalog reflects current business context and sensitivity labels.
  • Runtime enforcement, so that row filters, masking, and write controls are applied where the data is actually accessed.
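
The three layers above can be sketched as a per-engine comparison against one central policy. This is a minimal illustration, not a reference implementation: the `CentralPolicy` and `EngineState` types, the engine names, and the sample principals are all hypothetical, and a real deployment would populate them from its own catalog and engine APIs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CentralPolicy:
    """Intent as recorded centrally (hypothetical shape)."""
    dataset: str
    sensitivity: str
    readers: frozenset[str]
    masked_columns: frozenset[str]


@dataclass(frozen=True)
class EngineState:
    """What one engine actually has configured (hypothetical shape)."""
    engine: str
    sensitivity: Optional[str]
    readers: frozenset[str]
    masked_columns: frozenset[str]


def layer_gaps(policy: CentralPolicy, state: EngineState) -> list[str]:
    """Return the layers where the engine has drifted from central intent."""
    gaps = []
    if state.readers != policy.readers:
        gaps.append("identity_and_access")
    if state.sensitivity != policy.sensitivity:
        gaps.append("metadata_and_classification")
    # Every column the policy masks must also be masked at runtime.
    if not policy.masked_columns <= state.masked_columns:
        gaps.append("runtime_enforcement")
    return gaps


policy = CentralPolicy(
    dataset="sales.orders",
    sensitivity="restricted",
    readers=frozenset({"analyst_group", "svc_reporting"}),
    masked_columns=frozenset({"customer_email"}),
)
spark = EngineState("spark", "restricted", policy.readers, frozenset({"customer_email"}))
bi_tool = EngineState(
    "bi_tool", None, policy.readers | {"contractors"}, frozenset()
)

for state in (spark, bi_tool):
    print(state.engine, layer_gaps(policy, state))
```

In this sample data the `spark` engine matches the policy on all three layers, while the `bi_tool` engine has drifted on each of them, which is the situation where the catalog stays correct on paper and wrong in production.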

This is why the Ultimate Guide to NHIs — Lifecycle Processes for Managing NHIs is relevant even in a data governance discussion: the same control problem appears whenever non-human identities, service principals, or automation accounts can move data faster than review cycles can track it. Current guidance suggests using Ultimate Guide to NHIs — Regulatory and Audit Perspectives to align policy ownership with audit evidence, but the enforcement layer still has to live inside the systems that touch the data.

Common implementation patterns include policy-as-code, federated identity, consistent tagging, and automated checks for permission drift between the catalog and the storage or compute layer. These controls tend to break down when multiple clouds use different permission semantics and one engine can bypass the catalog entirely through direct object access.
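
An automated drift check of the kind mentioned above can be as simple as a set difference between the grants the catalog records and the grants that exist on the storage layer. The sketch below assumes both sides have already been exported into `(principal, resource, action)` tuples; the export step, the tuple shape, and the sample identities are assumptions for illustration.

```python
from __future__ import annotations

Grant = tuple[str, str, str]  # (principal, resource, action), hypothetical shape


def permission_drift(
    catalog_grants: set[Grant], storage_grants: set[Grant]
) -> dict[str, list[Grant]]:
    """Compare catalog intent with what the storage layer actually allows."""
    return {
        # Access that exists on storage but that the catalog never recorded:
        # a likely direct-object-access path around central policy.
        "storage_only": sorted(storage_grants - catalog_grants),
        # Access the catalog documents but storage does not provide:
        # a sign the catalog is stale or propagation failed.
        "catalog_only": sorted(catalog_grants - storage_grants),
    }


catalog = {
    ("analyst_group", "sales.orders", "read"),
    ("svc_etl", "sales.orders", "write"),
}
storage = {
    ("svc_etl", "sales.orders", "write"),
    ("svc_export", "sales.orders", "read"),
}

drift = permission_drift(catalog, storage)
print(drift)
```

The `storage_only` entries are usually the more urgent finding, since they describe access that works in production without appearing in the catalog at all.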

Where the Model Breaks and What to Watch For

Tighter central governance often increases administrative overhead, requiring organisations to balance consistency against platform flexibility. That tradeoff is real, especially in environments where analytics teams move quickly and platform teams cannot manually approve every change.

There is no universal standard for this yet, but best practice is evolving toward governance that is centrally defined and locally enforced. That means treating the catalog as a source of intent, not proof of control. The warning signs are usually operational: orphaned permissions, conflicting table definitions, unmanaged copies in downstream workspaces, and access decisions that differ by engine.
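
One of these warning signs, access decisions that differ by engine, lends itself to a simple automated probe. The sketch below assumes a team can collect, for each engine, whether a given principal was allowed to read a given dataset; how those decisions are gathered is left open, and the engine and principal names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable

Decision = tuple[str, str, str, bool]  # (engine, principal, dataset, allowed)


def inconsistent_decisions(
    decisions: Iterable[Decision],
) -> dict[tuple[str, str], dict[str, bool]]:
    """Return principal/dataset pairs that engines do not agree on."""
    by_pair: dict[tuple[str, str], dict[str, bool]] = defaultdict(dict)
    for engine, principal, dataset, allowed in decisions:
        by_pair[(principal, dataset)][engine] = allowed
    # Keep only pairs where at least one engine allows and another denies.
    return {
        pair: per_engine
        for pair, per_engine in by_pair.items()
        if len(set(per_engine.values())) > 1
    }


observed = [
    ("spark", "svc_etl", "sales.orders", True),
    ("bi_tool", "svc_etl", "sales.orders", True),
    ("spark", "contractor", "sales.orders", False),
    ("notebook", "contractor", "sales.orders", True),
]

conflicts = inconsistent_decisions(observed)
print(conflicts)
```

Any pair this returns means at least one engine is not honouring the same policy as the others, regardless of what the catalog states.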

This is also where manual exception handling becomes dangerous. A temporary bypass for a data scientist, a one-off replication to another cloud, or a transformation job running under a broad service account can quietly create a new access path that central policy never sees. Security teams should look for evidence that policy changes propagate automatically, that non-human identities are scoped to specific pipelines, and that audit trails tie every read and write back to an enforceable control point. The practical lesson is simple: central governance fails fastest when the organisation trusts the catalog more than the execution path.
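
Two of the checks described above can be sketched against exported data: finding non-human identities that are not scoped to a single pipeline, and finding audit events that cannot be tied to a control point. The record shapes, the `"*"` wildcard convention, and the identity names below are assumptions made for the example, not the format of any particular platform.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    """One read or write as it appears in an audit export (hypothetical shape)."""
    identity: str
    action: str
    dataset: str
    control_point: Optional[str]  # policy or enforcement point that authorised it


def untraceable_events(events: list[AuditEvent]) -> list[AuditEvent]:
    """Events with no recorded control point cannot be tied back to policy."""
    return [event for event in events if not event.control_point]


def broadly_scoped(identity_pipelines: dict[str, set[str]]) -> list[str]:
    """Identities bound to a wildcard or to more than one pipeline."""
    return sorted(
        name
        for name, pipelines in identity_pipelines.items()
        if "*" in pipelines or len(pipelines) > 1
    )


events = [
    AuditEvent("svc_etl", "write", "sales.orders", "catalog_policy_17"),
    AuditEvent("svc_replicator", "read", "sales.orders", None),
]
scopes = {
    "svc_etl": {"orders_ingest"},
    "svc_replicator": {"*"},
    "svc_reporting": {"daily_report", "adhoc_export"},
}

print([event.identity for event in untraceable_events(events)])
print(broadly_scoped(scopes))
```

Running checks like these on a schedule gives evidence of whether temporary bypasses and broad service accounts are accumulating outside central policy.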

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and the NIST AI RMF set the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
NIST CSF 2.0 | PR.AA-05 (PR.AC-4 in CSF 1.1) | Access permissions must stay consistent across lakehouse engines and clouds.
OWASP Non-Human Identity Top 10 | NHI-01 | Service accounts and automation identities often drive lakehouse access drift.
NIST AI RMF | - | Governance relies on clear accountability for automated data actions and decisions.

Inventory non-human identities and bind each to least-privilege data paths with reviewable ownership.