Collibra’s governance thesis: AI agents are driving consolidation

By NHI Mgmt Group Editorial Team
Published: 2025-06-13
Domain: Governance & Risk
Source: Collibra

TL;DR: According to Collibra, AI-driven demand is collapsing the old modern data stack, and as storage and compute decouple across platforms, governance, semantics, and open standards are moving to the centre. The governance lesson is broader than data tooling: as AI agents consume more context, fragmented control planes become operationally fragile.

At a glance

What this is: Collibra frames AI agents, data-platform consolidation, and governance as a single market shift toward unified control layers.

Why it matters: Identity, access, and governance teams will be asked to manage more autonomous workloads across fewer platforms, with stronger expectations for unified policy and auditability.


Context

The core problem is governance fragmentation. As data platforms decouple storage from compute and AI agents consume structured and unstructured data together, the old pattern of separate catalogs, lineage tools, access controls, and observability layers becomes harder to operate consistently.

For IAM, NHI, and broader identity programmes, the implication is straightforward: governance has to follow the workload, the context, and the access path across platforms. When the control plane is fragmented, entitlement review, policy enforcement, and audit evidence become fragmented too.

Key questions

Q: How should teams govern AI agents that consume both structured and unstructured data?

A: Teams should govern the identities that consume the data, not just the repositories that hold it. That means binding access policy, lineage, monitoring, and audit evidence to each workload or service account, including AI agents that can combine multiple sources. The practical test is whether you can trace who or what used the data, under which privileges, and with what downstream effect.

Q: Why does governance fragmentation become a security problem in AI data platforms?

A: Fragmentation creates multiple versions of the truth for access, lineage, and policy enforcement, which weakens auditability and increases the chance of inconsistent decisions. In AI environments, that is more than an inefficiency: autonomous or semi-autonomous consumers can exploit those gaps faster than humans can reconcile them. The risk is that operational inconsistency becomes a control failure.
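
One way to make "multiple versions of the truth" concrete is to export grants from each control plane and look for identities that receive different answers for the same dataset. The snippet below is a minimal sketch under that assumption; the plane names, identities, datasets, and the row shape are all hypothetical and are not taken from Collibra's article.

```python
from collections import defaultdict

# Hypothetical export: one row per (control plane, identity, dataset) decision.
# Plane names, identities, and datasets are illustrative only.
grants = [
    {"plane": "warehouse", "identity": "svc-reporting", "dataset": "sales.orders", "access": "read"},
    {"plane": "lakehouse", "identity": "svc-reporting", "dataset": "sales.orders", "access": "read"},
    {"plane": "warehouse", "identity": "agent-summariser", "dataset": "hr.reviews", "access": "deny"},
    {"plane": "lakehouse", "identity": "agent-summariser", "dataset": "hr.reviews", "access": "read"},
]


def find_conflicts(rows):
    """Return {(identity, dataset): {plane: access}} where planes disagree."""
    decisions = defaultdict(dict)
    for row in rows:
        decisions[(row["identity"], row["dataset"])][row["plane"]] = row["access"]
    return {
        key: by_plane
        for key, by_plane in decisions.items()
        if len(set(by_plane.values())) > 1
    }


conflicts = find_conflicts(grants)
for (identity, dataset), by_plane in conflicts.items():
    print(f"{identity} on {dataset}: {by_plane}")
```

In this sample data only the AI agent receives conflicting answers, which is the kind of gap a human reviewer might not notice until an audit asks for evidence.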

Q: What do identity teams get wrong about data governance in AI platforms?

A: They often treat governance as a data-management layer rather than an identity and access layer. Once AI systems and workloads start consuming data across platforms, the important question becomes which identities can act on which context, and whether those decisions remain consistent. If that is unclear, governance is only documentation, not control.

Q: Which frameworks are most relevant when governance spans AI workloads and data platforms?

A: NIST Cybersecurity Framework 2.0 is useful for structuring governance, protection, detection, and response across the estate. For organisations using workload identity patterns, access traceability and policy enforcement should also map to zero trust principles so that identity, not location, becomes the control anchor.

Technical breakdown

Why decoupled storage and compute change governance scope

Open table formats and zero-copy data movement reduce the need to shuttle data between platforms, which shifts the operational burden from transport to policy enforcement. In that model, governance cannot depend on where data is stored, because the same dataset may be queried, transformed, and consumed across multiple engines. The result is a broader, more distributed trust boundary. For identity teams, the challenge is no longer just who can reach a system, but which identities, workloads, and agents can act on governed data across environments without losing traceability.

Practical implication: map policy enforcement to the data access path, not to a single platform boundary.
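
As an illustration of enforcing on the access path rather than the platform boundary, the sketch below keys the decision on identity and logical dataset only, and records the engine purely as audit evidence. It assumes a single central policy table; the `POLICY` contents, the `AccessRequest` fields, and the engine labels are hypothetical and stand in for whatever policy store and query engines a programme actually runs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical central policy: keyed by identity and dataset, never by engine.
POLICY = {
    ("svc-etl", "sales.orders"): {"read", "write"},
    ("agent-forecast", "sales.orders"): {"read"},
}


@dataclass(frozen=True)
class AccessRequest:
    identity: str  # workload, service account, or AI agent
    dataset: str   # logical dataset name, independent of storage location
    action: str    # "read" or "write"
    engine: str    # engine on the access path; recorded as evidence only


audit_log = []


def authorize(request: AccessRequest) -> bool:
    """Decide from identity and dataset only, then record the full access path."""
    allowed = request.action in POLICY.get((request.identity, request.dataset), set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "identity": request.identity,
        "dataset": request.dataset,
        "action": request.action,
        "engine": request.engine,
        "allowed": allowed,
    })
    return allowed


# The same request gets the same answer whichever engine carries it.
via_spark = authorize(AccessRequest("agent-forecast", "sales.orders", "write", "spark"))
via_trino = authorize(AccessRequest("agent-forecast", "sales.orders", "write", "trino"))
```

Because the engine never enters the decision, adding or retiring a platform does not change who may do what; it only adds a new value to the audit trail.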

Why unstructured data governance now becomes an identity problem

Unstructured data carries business meaning, sensitive content, and context that AI systems can ingest at scale, but it is far harder to classify and control than traditional structured records. That turns discovery, curation, monitoring, and access governance into an identity problem as much as a data problem, because every consumer of that content needs a reliable trust posture. If the consumer is an AI agent, the governance requirement expands further: access must be controllable, attributable, and reviewable across dynamic usage patterns rather than static user roles.

Practical implication: extend governance controls to the identities that consume unstructured data, including service accounts and AI workloads.
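
A simple starting point is to review the non-human consumers of unstructured content the same way user access is reviewed. The following sketch assumes a hypothetical inventory with an owner and a last-review date per identity; the field names, the sample identities, and the 180-day threshold are illustrative choices, not a recommendation from the source.

```python
from datetime import date

# Hypothetical inventory of identities that can read unstructured content stores.
consumers = [
    {"identity": "alice", "kind": "human", "owner": "alice", "last_review": date(2025, 5, 1)},
    {"identity": "svc-indexer", "kind": "service_account", "owner": "search-team", "last_review": date(2024, 1, 15)},
    {"identity": "agent-support", "kind": "ai_agent", "owner": None, "last_review": None},
]


def review_findings(rows, today, max_age_days=180):
    """Flag non-human consumers with no owner or no recent access review."""
    findings = []
    for row in rows:
        if row["kind"] == "human":
            continue  # assumed to be covered by existing user access reviews
        if row["owner"] is None:
            findings.append((row["identity"], "no accountable owner"))
        if row["last_review"] is None:
            findings.append((row["identity"], "never reviewed"))
        elif (today - row["last_review"]).days > max_age_days:
            findings.append((row["identity"], "review overdue"))
    return findings


findings = review_findings(consumers, today=date(2025, 6, 13))
for identity, issue in findings:
    print(f"{identity}: {issue}")
```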

What consolidation means for the modern data stack

The article describes a market moving away from many small point products toward fewer broader platforms. That usually happens when overlapping features fail to create enough operational value to justify the management overhead. Governance is the category most likely to absorb that change, because it is where visibility, access, lineage, quality, and trust have to meet. As AI usage grows, the winning model is less about owning every point capability and more about creating a unified governance layer that can operate across diverse systems and identities.

Practical implication: re-evaluate whether your current governance stack can actually produce a single control narrative across platforms.

NHI Mgmt Group analysis

Governance fragmentation, not just data fragmentation, is the real scaling failure. The article is right to treat consolidation as a governance story, because the operational pain now shifts from moving data to governing every place it is consumed. When AI agents and multi-structured data span multiple platforms, separate control planes create inconsistent access, weak evidence chains, and duplicated policy logic. The practitioner conclusion is that governance architecture is becoming a core part of identity architecture.

Unstructured data has become an identity-governed asset. The article correctly identifies unstructured content as the next unsolved governance surface, and that matters because the consumer of that content may be a workload, a service account, or an AI agent rather than a human user. Discovery, classification, and monitoring all depend on knowing which identity is acting on the content and with what privilege. The practitioner conclusion is that access governance now has to extend to non-traditional consumers, not just people.

Unified governance is becoming a market requirement, not a luxury feature. The vendor's description of a fragmented, expensive stack reflects what many programmes already see: isolated tools do not create a coherent control model. Open standards such as OpenLineage and Open Data Contracts matter because they help preserve portability and evidence across systems. The practitioner conclusion is to prioritise architectures that make governance durable across platforms, not only within them.

AI increases the value of the semantic layer because context is now executable. The article's point about semantic governance is important: AI systems do not just need data, they need trustworthy context to act on that data safely. That changes governance from passive documentation into an executable decision layer that influences downstream use. The practitioner conclusion is to treat semantics, lineage, and policy as operational controls, not metadata decoration.

Control sprawl will follow platform sprawl unless governance is decoupled from compute. The article's strongest structural insight is that data and AI workloads will keep moving, while governance needs to remain stable. That is the same problem identity teams face when entitlements, workload identities, and access evidence get trapped inside individual tools. The practitioner conclusion is to build governance once and reuse it across platforms instead of reimplementing it in each stack.

From our research:
72% of organisations have experienced or suspect they have experienced a breach of non-human identities, according to The 2024 ESG Report: Managing Non-Human Identities.
Our research also found that enterprises that have experienced a compromised NHI averaged 2.7 separate incidents in the past 12 months, which shows how quickly exposure compounds once machine access is in play.
That is why readers should also review the Ultimate Guide to NHIs and Lifecycle Processes for Managing NHIs, which cover the lifecycle controls that need to stay consistent across data and AI programmes.

What this signals

Governance programmes will be judged on whether they can preserve a single control story across data platforms, AI systems, and workload identities. If access policy, lineage, and audit evidence are split across tools, the programme will look simpler than it really is and fail when asked to prove accountability. The practical shift is toward unified governance that can follow the identity, not the storage layer.

With 72% of organisations already experiencing or suspecting an NHI breach, the assumption that machine access is a secondary concern no longer holds. That figure from The 2024 ESG Report: Managing Non-Human Identities is a warning that workload identity governance must be treated as an active control domain, not a back-office hygiene task. Teams that wait for a clean platform rationalisation before fixing identity sprawl will be late.

Unified governance will increasingly become the differentiator for AI-ready data estates. The programmes that can tie semantics, access, and lineage to real identities will scale more cleanly than those relying on tool-by-tool policy overlays. For practitioners, that means architecture decisions now have long-tail identity consequences.

For practitioners

Inventory governance control planes across platforms: Document where access policy, lineage, catalog, quality, and monitoring are enforced today, then identify where the same decision is being duplicated in multiple tools. The goal is to find fragmented control logic before AI workloads make it harder to reconcile.
Treat AI workload access as a first-class identity problem: Review which non-human identities, service accounts, and automated consumers can reach structured and unstructured data, then require traceability for each consumption path. This is especially important where AI agents combine data across systems.
Prioritise open standards in governance design: Favour architecture patterns that preserve portability of lineage and policy across systems, including standards such as OpenLineage and Open Data Contracts. That reduces lock-in and makes audit evidence easier to retain as platforms consolidate.
Reassess the modern data stack for control duplication: Look for overlapping tools that produce partial governance outcomes but no single authoritative view of access, trust, or accountability. Consolidation is easier to manage when the programme is already simplifying control boundaries.
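
The inventory step can begin as a very small exercise. The sketch below assumes a hand-built list of (tool, control) pairs and reports every control that is enforced in more than one place; the tool names and control labels are hypothetical placeholders for a real estate.

```python
from collections import defaultdict

# Hypothetical inventory: which tool enforces which governance control today.
inventory = [
    ("catalog-tool", "catalog"),
    ("catalog-tool", "lineage"),
    ("warehouse", "access policy"),
    ("lakehouse", "access policy"),
    ("bi-tool", "access policy"),
    ("observability-tool", "quality"),
    ("pipeline-tool", "lineage"),
]


def duplicated_controls(rows):
    """Return {control: sorted tools} for controls enforced in more than one tool."""
    tools_by_control = defaultdict(set)
    for tool, control in rows:
        tools_by_control[control].add(tool)
    return {
        control: sorted(tools)
        for control, tools in tools_by_control.items()
        if len(tools) > 1
    }


duplicates = duplicated_controls(inventory)
for control, tools in duplicates.items():
    print(f"{control}: enforced in {', '.join(tools)}")
```

Each duplicated control is a candidate for consolidation, or at minimum a place where the separate copies of the decision need to be checked for agreement.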

Key takeaways

Collibra's argument is fundamentally about governance architecture, not just data-platform consolidation.
As AI agents consume more structured and unstructured data, fragmented control planes become harder to defend and audit.
Practitioners should simplify governance layers, strengthen identity traceability, and prefer portable standards over tool-specific policy islands.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
NIST CSF 2.0	GV.OV-01	Governance oversight is central to fragmented data and AI control planes.
NIST Zero Trust (SP 800-207)	PR.AC-4 (a NIST CSF 1.1 subcategory)	Identity-based access control is needed when workloads move across platforms.
OWASP Non-Human Identity Top 10	NHI-03	Workload and service identities are part of the governance surface here.

Establish one governance model for access, lineage, and accountability across all data platforms.

Key terms

Unified Governance: A single governance model that keeps policy, lineage, access, and accountability consistent across multiple systems. In practice, it prevents each platform from making its own version of the same control decision, which is essential when data, workloads, and AI systems operate across boundaries.
Semantic Layer: A business-context layer that turns raw data into terms, relationships, and rules that applications can use reliably. It matters because AI systems need more than storage and compute. They need consistent meaning, trustworthy context, and enforceable governance if they are going to act on the data.
Zero-Copy Data Access: An architecture pattern that allows data to be queried or consumed without moving it between systems. It reduces duplication and operational overhead, but it also means governance must travel with the data and the identities that access it, because the storage boundary is no longer the control boundary.
Non-Human Identity: A machine or workload identity used by software rather than a person, including service accounts, tokens, API keys, certificates, bots, and AI agents. These identities can consume, transform, or expose data at scale, so their permissions, lifecycle, and traceability must be governed explicitly.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity lifecycle are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or NHI governance in your organisation, it is worth exploring.

This post draws on content published by Collibra: AI agents and governance are reshaping the data stack. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-06-13.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org