TL;DR: Business context, semantics and governance are being brought directly into the lakehouse so users can trust data definitions, ownership and policy in place, while also improving semantic consistency for AI and analytics, according to Collibra. The real shift is that governed meaning becomes part of the operating model, not a separate review layer.
At a glance
What this is: Collibra’s expanded partnership with Google Cloud pushes business context and semantics into the Dataplex Universal Catalog to reduce fragmentation and improve trust.
Why it matters: For IAM, governance and security teams, the lesson is that trust breaks when context is detached from the system where data and AI decisions are made.
Context
Data governance fails when technical metadata, business meaning and policy live in separate systems. In practice, that fragmentation forces analysts, data scientists and platform teams to make decisions without a stable answer to basic questions such as ownership, field meaning and trust status. For identity and governance programmes, the pattern is familiar: access is easier to grant than context is to preserve.
This article is about collapsing that gap inside the lakehouse by making governance metadata and semantic meaning available where the data is used. That matters to IAM and security leaders because the same control problem shows up across non-human identities (NHI), autonomous systems and human users: if the authoritative record and the operational environment drift apart, policy becomes harder to enforce and easier to misinterpret.
Key questions
Q: How should teams govern data context in a hybrid lakehouse environment?
A: Teams should treat business context as an enforceable control, not a reference layer. The practical test is whether ownership, meaning and policy follow the data into the operational catalog, the analytics layer and AI workflows without manual reconstruction. If they do not, users will build local definitions and governance will fragment.
Q: What is the difference between a data glossary and a semantic layer?
A: A data glossary defines terms in human language, while a semantic layer defines how those terms behave in queries, calculations and downstream analytics. The semantic layer is operational because it shapes what systems return, not just what people read. Organisations need both, but only the semantic layer affects runtime consistency.
Q: When does centralised governance fail in an open lakehouse?
A: It fails when policy is centralised but enforcement is not. In that case, the organisation can document controls in one system while access, definitions and transformations drift across clouds and tools. The warning sign is inconsistency between what the catalog says and what users actually see in production.
Q: How do security teams know whether governed semantics are actually working?
A: Look for consistent answers across reporting, analytics and AI outputs when the same business term is used. If finance, data science and platform teams resolve the same metric differently, the semantic layer is not operating as a control. The signal of success is not more documentation, but fewer interpretation disputes.
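The check described in the last answer can be sketched in a few lines. This is a minimal illustration, not a product feature: the function name `find_disputes`, the consumer labels and the sample figures are all hypothetical, and it assumes each team can export the value it resolves for a given business term.

```python
from collections import defaultdict


def find_disputes(observations, tolerance=0.0):
    """Group resolved values by business term and flag terms whose
    consumers disagree by more than `tolerance`.

    `observations` is an iterable of (term, consumer, value) tuples.
    """
    by_term = defaultdict(dict)
    for term, consumer, value in observations:
        by_term[term][consumer] = value

    disputes = {}
    for term, values in by_term.items():
        spread = max(values.values()) - min(values.values())
        if spread > tolerance:
            # Consumers resolved the same term to different numbers.
            disputes[term] = dict(values)
    return disputes


# Hypothetical sample: three teams resolve two governed terms.
observations = [
    ("net_profit", "finance", 280.0),
    ("net_profit", "data_science", 600.0),
    ("net_profit", "platform", 280.0),
    ("active_customers", "finance", 1200),
    ("active_customers", "data_science", 1200),
]
disputes = find_disputes(observations)
```

An empty result is the signal of success described above: the same term resolves to the same answer wherever it is used.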
Technical breakdown
Bi-directional metadata flow in a governed lakehouse
A bi-directional integration means governance data moves in both directions between a system of record and the operational catalog. In this case, technical metadata discovered in Dataplex feeds back into Collibra, while business context and policy flow outward into the data fabric. That matters because catalog accuracy depends on continuous reconciliation between what is physically deployed and what the governance layer thinks exists. Without that loop, stewardship becomes stale and users drift back to local copies, spreadsheets or shadow data definitions.
Practical implication: treat metadata synchronisation as a control surface, not a convenience feature, and verify that inbound discovery and outbound policy flows both succeed.
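One way to verify both flows is a reconciliation check between the two systems. The sketch below is an assumption-laden illustration: it uses in-memory dictionaries in place of any real catalog API, and `AssetRecord`, `check_sync` and the sample asset names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AssetRecord:
    """One asset as seen by a single system (all fields are hypothetical)."""
    name: str
    schema_version: int            # technical metadata discovered in the lakehouse
    policy: Optional[str] = None   # governance metadata published by the system of record


def check_sync(discovered, governed):
    """Compare the operational catalog with the system of record in both directions."""
    report = {"missing_inbound": [], "stale_inbound": [],
              "missing_outbound": [], "orphaned_records": []}
    for name in sorted(discovered):
        asset, record = discovered[name], governed.get(name)
        if record is None:
            # Discovered in the lakehouse but never reached the system of record.
            report["missing_inbound"].append(name)
            continue
        if record.schema_version != asset.schema_version:
            # The governance layer holds an outdated view of the physical asset.
            report["stale_inbound"].append(name)
        if record.policy is not None and asset.policy != record.policy:
            # A policy exists upstream but is absent or different at the point of use.
            report["missing_outbound"].append(name)
    # Records the governance layer believes exist but that are not deployed.
    report["orphaned_records"] = sorted(set(governed) - set(discovered))
    return report


discovered = {
    "sales.orders": AssetRecord("sales.orders", 3),
    "sales.refunds": AssetRecord("sales.refunds", 1),
}
governed = {
    "sales.orders": AssetRecord("sales.orders", 2, policy="pii-restricted"),
    "hr.legacy": AssetRecord("hr.legacy", 1, policy="internal"),
}
report = check_sync(discovered, governed)
```

Any non-empty list in the report marks a place where the loop between discovery and governance has broken.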
Semantic layer vs business glossary: why the distinction matters
A business glossary names terms. A semantic layer defines how those terms behave in queries, calculations and downstream AI models. If finance defines net profit one way and the AI training pipeline resolves it another way, the model may still run but the output will not be operationally trustworthy. The technical risk is not just bad documentation. It is inconsistent logic across analytics, machine learning and reporting systems, which creates false confidence in AI-assisted decisions.
Practical implication: align glossary governance with semantic enforcement so the same definition is used in analytics, transformation and model inputs.
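The distinction can be made concrete with a small sketch. Everything here is illustrative: the formula for net profit, the field names and the two consumer functions are assumptions chosen for the example, not definitions taken from Collibra or Dataplex.

```python
# Glossary: human-readable text only. Nothing here changes what a query returns.
GLOSSARY = {
    "net_profit": "Revenue minus cost of goods sold, operating expenses and tax.",
}


def net_profit(row):
    """Governed, executable definition of the term (hypothetical formula)."""
    return row["revenue"] - row["cogs"] - row["opex"] - row["tax"]


# Semantic layer: the same term bound to logic that every consumer calls.
SEMANTIC_LAYER = {"net_profit": net_profit}


def resolve(term, row):
    """Resolve a business term through the shared semantic layer."""
    if term not in GLOSSARY:
        raise KeyError(f"{term!r} has no governed glossary entry")
    return SEMANTIC_LAYER[term](row)


def finance_report(row):
    # Reporting path: uses the governed definition.
    return resolve("net_profit", row)


def training_feature(row):
    # Model input path: uses the same governed definition.
    return resolve("net_profit", row)


def local_shortcut(row):
    # A locally invented definition that ignores opex and tax.
    return row["revenue"] - row["cogs"]


row = {"revenue": 1000.0, "cogs": 400.0, "opex": 250.0, "tax": 70.0}
governed_value = finance_report(row)
feature_value = training_feature(row)
shortcut_value = local_shortcut(row)
```

Both governed paths return the same figure because they share one definition, while the local shortcut runs without error and silently returns a different number.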
Governance in the open lakehouse depends on consistent policy inheritance
Open lakehouse architectures spread data across hybrid and multi-cloud environments, often with open table formats and shared catalog services. The technical challenge is policy inheritance across multiple physical locations, not merely cataloguing the assets. If governance is centralised but enforcement is fragmented, the organisation can describe data consistently while still failing to control it consistently. That is why the control problem in open architectures is less about visibility alone and more about maintaining authoritative semantics and policy at the point of use.
Practical implication: validate that policy definitions follow the data across environments rather than assuming catalog visibility equals effective governance.
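A validation of this kind could look like the following sketch. It assumes the effective policy in each environment can be exported as a simple dictionary; the policy keys, environment names and `policy_drift` function are hypothetical.

```python
# Policy as defined in the system of record (hypothetical keys and values).
AUTHORITATIVE_POLICY = {
    "classification": "confidential",
    "masking": "email",
    "retention_days": 365,
}


def policy_drift(authoritative, effective_by_env):
    """Return, per environment, every policy attribute that differs from
    the authoritative definition as (expected, actual) pairs."""
    drift = {}
    for env, effective in effective_by_env.items():
        diffs = {
            key: (expected, effective.get(key))
            for key, expected in authoritative.items()
            if effective.get(key) != expected
        }
        if diffs:
            drift[env] = diffs
    return drift


# The same dataset as it is actually enforced in three locations.
effective_by_env = {
    "warehouse_cloud_a": dict(AUTHORITATIVE_POLICY),
    "object_store_cloud_b": {"classification": "confidential",
                             "masking": None, "retention_days": 365},
    "notebook_export": {"classification": "confidential"},
}
drift = policy_drift(AUTHORITATIVE_POLICY, effective_by_env)
```

In this sample all three locations carry the right classification, yet two of them have lost masking or retention rules, which is exactly the gap between catalog visibility and effective governance.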
NHI Mgmt Group analysis
Context is becoming the control plane for trust. When business meaning, ownership and policy sit outside the operational data environment, governance becomes advisory instead of enforceable. Collibra’s partnership update reflects a broader shift: organisations are trying to make meaning portable, not just metadata visible. For practitioners, that means the control question is no longer whether data is catalogued, but whether the same context survives where analytics and AI actually consume it.
Semantic inconsistency is an identity problem in disguise. Every governed data object has an effective identity made up of ownership, classification, policy and meaning. If any one of those attributes is detached from the runtime environment, users fall back to local assumptions and the enterprise loses a single source of truth. That is the same failure mode IAM teams see when entitlement records drift from actual access. Practitioners should treat semantics as governed identity for data assets.
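As a rough illustration of that idea, the four attributes can be modelled as a single record and checked for completeness wherever the asset is consumed. The class name, field values and helper below are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class DataAssetIdentity:
    """The effective identity of a governed data asset."""
    owner: Optional[str]
    classification: Optional[str]
    policy: Optional[str]
    meaning: Optional[str]


def detached_attributes(identity):
    """List the identity attributes that are missing in a given environment."""
    return [f.name for f in fields(identity) if not getattr(identity, f.name)]


# What the system of record holds for the asset.
catalog_view = DataAssetIdentity(
    owner="finance-data-team",
    classification="confidential",
    policy="pii-restricted",
    meaning="Net profit per fiscal quarter",
)
# What is actually visible where the data is queried.
runtime_view = DataAssetIdentity(
    owner="finance-data-team",
    classification="confidential",
    policy="pii-restricted",
    meaning=None,
)
missing_at_runtime = detached_attributes(runtime_view)
```

A non-empty result at runtime is the data equivalent of an entitlement record that no longer matches actual access.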
Open lakehouse governance only works when the control plane and the execution plane stay aligned. Hybrid architectures reward central policy design but punish loose enforcement. The more the estate spans formats, clouds and analytical tools, the more dangerous it becomes to assume that a catalog entry is equivalent to operational control. The implication for security and governance teams is to re-evaluate where enforcement actually happens, not just where it is documented.
Unified context will increasingly shape AI reliability expectations. As AI moves from experimental use to operational decision support, the quality of the underlying semantic layer becomes a governance dependency rather than a data-management preference. Organisations that cannot explain field meaning, lineage and policy in the same place will struggle to defend model outputs to audit, risk and business stakeholders. Practitioners should treat semantic governance as part of AI trust architecture, not a post hoc documentation layer.
From our research:
- 70% of organisations grant AI systems more access than they would give a human employee performing the exact same job, according to the 2026 Infrastructure Identity Survey.
- Another finding from the same survey shows that only 44% of organisations have implemented any policies to manage their AI agents, despite 92% agreeing that governing AI agents is critical to enterprise security.
- For a broader governance lens, see Top 10 NHI Issues, which frames the identity controls that become harder to enforce when context and runtime access drift apart.
What this signals
Business context is moving from documentation to enforcement. For governance teams, that means the next maturity step is not richer cataloguing alone. It is proving that the same ownership, policy and meaning survive inside the tools where analysts, engineers and AI actually work.
Semantic drift will become a control issue, not just a data quality issue. When the same field means one thing in the catalog and another in the model pipeline, the organisation loses auditability and decision confidence. Teams should expect more pressure to show that governed semantics are enforced in workflow, not only explained in metadata.
With 70% of organisations already granting AI systems more access than human employees, per the 2026 Infrastructure Identity Survey, context and access are converging into the same governance problem. That shift means data governance and identity governance can no longer be treated as separate programme tracks when AI participates in operational decision-making.
For practitioners
- Map authoritative data context to runtime locations: Identify where business glossary, ownership and policy metadata are consumed in practice, then confirm those records are available inside the same environment where analysts and AI workloads operate.
- Test metadata sync in both directions: Validate that discovery updates from the lakehouse reach the system of record and that governance policies published in the catalog appear correctly in the operational environment.
- Separate glossary management from semantic enforcement: Review whether definitions are merely documented or actually used in transformations, feature engineering and reporting logic, then close gaps where the two diverge.
- Check policy inheritance across hybrid data estates: Trace one governed dataset across clouds, formats and analytics tools to confirm that the same access and handling rules survive each hop without manual rework.
Key takeaways
- Fragmented data context weakens trust because users cannot reliably connect ownership, meaning and policy to the data they are using.
- Bi-directional integration matters because governance only works when technical discovery and business semantics stay aligned across the runtime environment.
- Practitioners should test whether semantic governance is enforced in analytics and AI workflows, not just defined in a catalogue.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
NIST CSF 2.0, NIST Zero Trust (SP 800-207) and NIST AI RMF provide the governance and control reference points most relevant to this guidance.
| Framework | Control / Reference | Relevance |
|---|---|---|
| NIST CSF 2.0 | GV.RM-01 | Context governance affects how risk decisions are made across data and AI systems. |
| NIST Zero Trust (SP 800-207) | Section 2.1, Tenets of Zero Trust | Policy inheritance across environments mirrors zero trust control consistency. |
| NIST AI RMF | GOVERN 1 | Semantic reliability underpins trustworthy AI outputs. |
Define ownership and policy accountability for governed data assets before exposing them to analytics and AI.
Key terms
- Semantic layer: A semantic layer is the governed logic that gives data its operational meaning across queries, dashboards and AI systems. It defines how business terms map to calculations and relationships so the same concept is interpreted consistently in different tools and workflows.
- Business context: Business context is the ownership, definition, trust status and policy metadata attached to a data asset. It helps users understand what the data means, who is responsible for it and how it may be used, which reduces interpretation errors and governance drift.
- Open lakehouse: An open lakehouse is a data architecture that combines lake flexibility with warehouse-style management, often across hybrid and multi-cloud environments. Governance becomes harder in this model because the same data may be queried, transformed and governed in multiple locations.
- Metadata synchronisation: Metadata synchronisation is the process of keeping technical discovery data and governance records aligned as systems change. When it works, the catalogue reflects the runtime estate and policy remains current. When it fails, stewardship, access and lineage all become less reliable.
Deepen your knowledge
NHI Foundation Level course, the industry's only accredited NHI security programme, covers NHI governance, agentic AI identity, machine identity security, IAM, human identity, identity lifecycle, secrets management and workload identity. If you are responsible for identity security strategy or governance in your organisation, it is worth exploring.
This post draws on content published by Collibra: Google Cloud and Collibra deepen their partnership to bring business context and semantics directly to the Dataplex Universal Catalog. Read the original.
Published by the NHIMG editorial team on 2026-04-22.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org