AI agent data governance exposes gaps in enterprise access control

By NHI Mgmt Group Editorial Team | Published 2025-11-10 | Domain: Agentic AI & NHIs | Source: WorkOS

TL;DR: AI agents can now reach sensitive enterprise data through governed data platforms, but policy-based controls alone do not solve agent identity, authorization, or audit gaps, according to WorkOS. The real issue is that access governance built for data platforms does not fully cover autonomous or semi-autonomous access paths.

At a glance

What this is: This analysis examines how data governance platforms are being positioned for AI agent security and finds that policy enforcement at the data layer does not replace agent identity and authorization controls.

Why it matters: IAM, NHI, and autonomous-system teams need to decide where data governance ends and identity governance begins when AI agents touch regulated information.

By the numbers:

The company has raised over $127 million in funding and serves major enterprise customers including JB Hunt, Swedbank, Thomson Reuters, Booking.com, GM, and Roche.

Context

AI agent data governance is the problem of controlling which information an agent can reach, retrieve, and surface without assuming that platform-level policy alone is enough. In this article, the key gap is not whether sensitive data can be classified, but whether the identity, authorization, and audit model around the agent is strong enough to govern access end to end.

That distinction matters for IAM programmes because data controls and identity controls solve different parts of the same problem. A policy engine can constrain queries inside a warehouse, but it does not by itself establish agent identity, lifecycle governance, or delegated authority across the wider application stack.

Key questions

Q: How should security teams govern AI agents that access sensitive data?

A: Security teams should split the problem into three layers: agent identity, authorization, and data access. Data governance can restrict what content is returned, but it does not prove who the agent is or what else it can do. The safest model ties authenticated agent identity to explicit entitlements, then records every retrieval and downstream action.

Q: What breaks when data governance is used as a substitute for AI agent identity controls?

A: What breaks is accountability. A data policy may block some sensitive records, but it cannot on its own answer who initiated the request, whether the agent was properly authenticated, or whether the same identity can act elsewhere in the stack. That leaves security teams with partial evidence and incomplete control.

Q: How do security teams know if AI agent access controls are actually working?

A: Look for evidence across three signals: denied retrievals for restricted fragments, complete logs that link agent identity to each request, and no unexplained access outside intended data domains. If the audit trail stops at the warehouse, the control is only partially working.
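
The three signals above can be checked mechanically once retrieval events land in one log. The following is a minimal sketch, assuming a hypothetical event record (`agent_id`, `domain`, `restricted_for_agent`, `decision`) and a hypothetical map of intended data domains per agent. None of these names come from the WorkOS article or from any specific product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetrievalEvent:
    """Hypothetical audit record; all field names are illustrative assumptions."""
    agent_id: Optional[str]       # None means the request could not be attributed
    domain: str                   # data domain the chunk belongs to, e.g. "finance"
    restricted_for_agent: bool    # chunk is classified as off-limits for this agent
    decision: str                 # "allow" or "deny"


def check_signals(events, intended_domains):
    """Return (event index, finding) pairs for the three signals in the text."""
    findings = []
    for i, e in enumerate(events):
        # Signal 2: every request must link back to an agent identity.
        if e.agent_id is None:
            findings.append((i, "unattributed request"))
            continue
        # Signal 1: restricted fragments should show up as denials.
        if e.restricted_for_agent and e.decision == "allow":
            findings.append((i, "restricted chunk returned"))
        # Signal 3: no allowed access outside the agent's intended domains.
        allowed = intended_domains.get(e.agent_id, set())
        if e.decision == "allow" and e.domain not in allowed:
            findings.append((i, "access outside intended domain"))
    return findings
```

An empty result does not prove the controls work. It only shows that the log contains no evidence against them, which is why log completeness is itself one of the signals.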

Q: Who is accountable when an AI agent surfaces restricted data?

A: Accountability should sit with the team that owns the agent identity and its delegated permissions, not only with the data platform team. If the agent can retrieve regulated content through a valid identity path, the failure spans IAM, data governance, and application ownership together.

Technical breakdown

Policy-based data access control for AI agent retrieval

Policy-based access control centralizes rules so that one policy definition can be enforced across multiple data platforms. In RAG workflows, that usually means the control acts at the warehouse, vector store, or chunk-retrieval layer, deciding what content can be returned to a requestor. The security value is real, but it is still a data-layer control. It does not authenticate the agent, define its runtime authority, or manage how the agent behaves once it has a valid request path.

Practical implication: treat data policies as one enforcement layer and verify that agent identity and authorization are covered separately.
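
One way to picture the layering is a request that must pass identity, authorization, and data policy as three separate checks. This is a minimal sketch under stated assumptions: the in-memory dictionaries stand in for an identity provider, an entitlement service, and a data-platform policy engine, and every name in it is hypothetical rather than taken from any real platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    token: str     # credential presented by the agent
    action: str    # e.g. "retrieve"
    dataset: str   # target dataset


# Hypothetical stand-ins for three separate systems (assumptions for illustration).
KNOWN_TOKENS = {"tok-abc": "agent-support-bot"}            # identity provider
ENTITLEMENTS = {"agent-support-bot": {("retrieve", "tickets"),
                                      ("retrieve", "payroll")}}  # authorization
DATA_POLICY = {"tickets": "allow", "payroll": "deny"}      # data-layer policy


def evaluate(req):
    """Run the three layers in order and report which layer decided."""
    agent_id = KNOWN_TOKENS.get(req.token)
    if agent_id is None:
        return ("deny", "identity")         # nobody knows who is asking
    if (req.action, req.dataset) not in ENTITLEMENTS.get(agent_id, set()):
        return ("deny", "authorization")    # known agent, but not entitled
    if DATA_POLICY.get(req.dataset, "deny") != "allow":
        return ("deny", "data_policy")      # data layer blocks the content
    return ("allow", agent_id)
```

Only the last check corresponds to the data-layer control discussed here. If the first two are missing, the data policy still evaluates, but against a requestor nobody has verified.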

RAG chunk-level classification and retrieval boundaries

Chunk-level classification breaks source content into smaller governed units so that a retrieval system can deny access to sensitive fragments even when the parent dataset is broadly accessible. This matters because large language models do not need full-table access to leak data; a single exposed chunk can be enough to surface regulated content. The architectural challenge is consistency. Classification quality, entitlement mapping, and retrieval enforcement all have to line up or the model will inherit the weakest boundary.

Practical implication: test retrieval boundaries against sensitive fragments, not just whole datasets.
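
A fragment-level boundary test can be sketched as follows. The `Chunk` shape, the label values, and the function names are assumptions made for illustration; a real deployment would take classifications from its own governance tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    parent_doc: str
    label: str   # hypothetical classification label, e.g. "public" or "pii"
    text: str


def retrieve(chunks, allowed_labels):
    """Return only chunks whose label the requester is entitled to see.

    Labels outside allowed_labels are dropped, so a missing or unrecognised
    classification is denied rather than returned by default.
    """
    return [c for c in chunks if c.label in allowed_labels]


def leaked_chunks(chunks, allowed_labels, sensitive_labels):
    """Boundary test: IDs of sensitive chunks that retrieval would still return."""
    returned = retrieve(chunks, allowed_labels)
    return [c.chunk_id for c in returned if c.label in sensitive_labels]
```

Running `leaked_chunks` against a corpus where one parent document mixes public and sensitive fragments exercises the case described above: the dataset is broadly accessible, but individual chunks must still be withheld.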

Why agent identity and authorization remain separate controls

Agent identity answers who or what is making the request. Authorization answers what that identity may do once it is recognized. In AI agent systems, those are not the same as warehouse permissions, and they are not solved by data governance alone. If the agent can initiate requests, chain tools, or act across systems, the access model needs to cover the full execution path, not only the data source that happens to satisfy the request.

Practical implication: map agent authentication, authorization, and audit trails across the full application path, not only inside the data platform.
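
A simple way to test whether an audit trail covers the full path is to tag each event with the layer that produced it and look for layers that never appear. This is a minimal sketch; the layer names, the `AuditTrail` class, and the use of one run ID for correlation are illustrative assumptions, not a described product feature.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical set of layers an agent run is expected to touch.
REQUIRED_LAYERS = {"authentication", "authorization", "data_platform", "tool"}


@dataclass
class AuditTrail:
    """Collects events for one agent run under a single correlation ID."""
    agent_id: str
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    events: list = field(default_factory=list)

    def record(self, layer, action, outcome):
        # Every event carries the same run_id and agent_id so that logs
        # from different systems can be joined afterwards.
        self.events.append({
            "run_id": self.run_id,
            "agent_id": self.agent_id,
            "layer": layer,
            "action": action,
            "outcome": outcome,
        })

    def layers_covered(self):
        return {e["layer"] for e in self.events}


def missing_layers(trail):
    """Layers that produced no audit events for this run."""
    return REQUIRED_LAYERS - trail.layers_covered()
```

A trail that contains only `data_platform` events reports three missing layers, which is the situation the text describes as an audit trail that stops at the warehouse.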

NHI Mgmt Group analysis

AI agent data governance is not the same thing as agent identity governance. The article shows a common market mistake: assuming that policy enforcement inside a data platform can stand in for the full access model around an AI agent. That works for query boundary control, but not for proving who the agent is, what it can initiate, or how its privileges change over time. Practitioners should treat the data layer as necessary but incomplete.

Runtime authorization for AI agents is a separate governance plane. The platform described here can constrain retrieval, but it does not provide the agent authentication and delegated authorization machinery that determines whether an agent should be trusted in the first place. That is the gap many programmes miss when they collapse all security controls into the storage layer. Practitioners should separate data access policy from identity authority.

Chunk-level governance creates a useful control boundary, but only for content exposure. Fine-grained retrieval controls reduce the chance that an agent surfaces restricted information, yet they do not solve lifecycle offboarding, credential ownership, or application-wide auditability. That means the real programme question is where data governance stops and identity governance takes over. Practitioners should design that boundary explicitly.

NHI governance for AI applications now spans data, identity, and lifecycle in one chain. The article reinforces that AI agent security cannot be solved with warehouse policy alone, because enterprises need SSO, provisioning, and access review across the systems that let the agent act. That makes the control problem cross-domain rather than tool-specific. Practitioners should govern the whole delegation path, not only the repository of data.

Policy engines will not resolve the structural mismatch between static entitlements and dynamic agent behaviour. Data policy is usually evaluated at request time, but agent behaviour unfolds across many requests, tools, and contexts. That creates a governance mismatch for IAM teams that still think in single-system permissions. Practitioners should expect more demand for continuous identity governance around AI workloads.
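
The mismatch can be illustrated by comparing a per-request check with one that also tracks what a session has already touched. This is a minimal sketch: the `SessionGuard` class and its `max_domains` budget are hypothetical constructs chosen for illustration, not a control the source prescribes.

```python
def per_request_ok(domain, allowed_domains):
    """Stateless check: each request is judged in isolation."""
    return domain in allowed_domains


class SessionGuard:
    """Stateful check: also bounds how many distinct domains one session spans.

    max_domains is a hypothetical budget used only to show the idea.
    """

    def __init__(self, allowed_domains, max_domains):
        self.allowed = set(allowed_domains)
        self.max_domains = max_domains
        self.seen = set()

    def check(self, domain):
        if domain not in self.allowed:
            return False
        # A new domain is refused once the session has used up its budget,
        # even though the same request would pass the stateless check.
        if domain not in self.seen and len(self.seen) >= self.max_domains:
            return False
        self.seen.add(domain)
        return True
```

With three allowed domains and a budget of two, every request passes `per_request_ok`, but the guard refuses the third distinct domain. That difference is the gap between request-time policy and behaviour that unfolds across many requests.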

From our research:
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation, according to AI Agents: The New Attack Surface report.
80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems, according to the same AI Agents: The New Attack Surface report.
For a broader control model, see OWASP Agentic AI Top 10 for the identity and privilege abuse patterns that retrieval controls alone do not cover.

What this signals

AI agent governance is moving from a data-only concern to an identity operating-model issue. Teams that rely on warehouse policy will quickly find that retrieval control does not answer lifecycle, delegation, or accountability questions. As agent use expands, the programme boundary needs to include identity proofing, entitlement design, and audit correlation across systems.

Retrieval boundaries will matter more as regulated data moves into agent workflows. The practical risk is not just disclosure, but governance drift when different teams own the warehouse, the application, and the agent itself. This is where the identity blast radius becomes a useful concept: once one agent identity is over-permissioned, the exposure can move across multiple data domains before anyone notices.
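
The blast-radius idea can be made concrete by computing, for each agent identity, the set of data domains its grants reach and comparing that with what was intended. This is a minimal sketch, assuming grants are available as hypothetical `(agent_id, domain)` pairs exported from whatever systems hold them.

```python
def blast_radius(grants):
    """Map each agent identity to the set of data domains it can reach."""
    reach = {}
    for agent_id, domain in grants:
        reach.setdefault(agent_id, set()).add(domain)
    return reach


def over_permissioned(grants, intended):
    """Return, per agent, the domains it can reach beyond its intended scope."""
    excess = {}
    for agent_id, domains in blast_radius(grants).items():
        extra = domains - intended.get(agent_id, set())
        if extra:
            excess[agent_id] = sorted(extra)
    return excess
```

Because the grants may be owned by different teams, the useful step is the aggregation itself: no single team's view shows the combined reach of one identity.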

With only 52% of companies able to track and audit what their AI agents access, the remaining 48% is too large a gap to treat as an edge case. The signal for practitioners is clear: if you cannot correlate agent identity, retrieval events, and downstream use, you do not yet have governed AI access.

For practitioners

Separate agent identity from data access policy: Define which controls prove the agent is authenticated, which controls constrain retrieval, and which controls audit behaviour across the application stack. Do not let warehouse policy become the default answer for agent security.
Map retrieval boundaries at chunk level: Test whether sensitive fragments remain protected when they are retrieved through RAG workflows, not just when full datasets are queried. Validate chunk-level enforcement against the specific data classes your agents can reach.
Extend audit coverage beyond the data platform: Capture agent requests, upstream identity, downstream tool use, and final data exposure in one audit trail. A data-platform log alone will not explain how the agent reached the content or what it did next.
Review lifecycle ownership for agent credentials: Assign clear ownership for provisioning, rotation, and offboarding of any account or token an AI agent uses. If the agent depends on service credentials, lifecycle gaps become identity gaps.
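
The lifecycle review in the last item can be sketched as a periodic check over an inventory of agent credentials. The record shape, the finding strings, and the 90-day rotation threshold are all assumptions made for illustration; the source does not specify a rotation interval.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class AgentCredential:
    """Hypothetical inventory record for a token or account an agent uses."""
    credential_id: str
    owner: Optional[str]     # accountable team, or None if nobody owns it
    last_rotated: date
    agent_active: bool       # False once the agent has been retired


def lifecycle_gaps(creds, today, max_age_days=90):
    """Return (credential_id, finding) pairs; 90 days is an assumed threshold."""
    gaps = []
    for c in creds:
        if c.owner is None:
            gaps.append((c.credential_id, "no owner"))
        if today - c.last_rotated > timedelta(days=max_age_days):
            gaps.append((c.credential_id, "rotation overdue"))
        if not c.agent_active:
            gaps.append((c.credential_id, "credential outlives agent"))
    return gaps
```

Passing `today` in explicitly keeps the check deterministic, which makes it easier to run the same review against historical inventory snapshots.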

Key takeaways

AI agent data governance helps at the retrieval layer, but it does not replace identity governance for the agent itself.
Fine-grained chunk controls reduce exposure, yet they still leave lifecycle, authentication, and audit ownership unresolved.
Practitioners should govern the full delegation path, because access policy alone cannot explain or contain agent behaviour.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST Zero Trust (SP 800-207) sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A2	Agent retrieval and tool-use boundaries map to agent identity and privilege abuse.
OWASP Non-Human Identity Top 10	NHI-03	Credential lifecycle and access scope matter when agents rely on service identities.
NIST Zero Trust (SP 800-207)	PR.AC-4 (subcategory ID from NIST Cybersecurity Framework v1.1)	Continuous authorization is needed when access spans data, identity, and application layers.

Map AI agent retrieval paths to agentic controls and verify the agent cannot exceed intended scope.

Key terms

RAG Security: RAG security is the practice of controlling what retrieval-augmented generation systems can see, fetch, and surface. It combines content classification, entitlement checks, and retrieval enforcement so that the model only answers from data the requesting identity is allowed to access.
Chunk-level Classification: Chunk-level classification breaks documents or datasets into smaller governed units for access control. In AI workflows, this lets teams restrict sensitive fragments even when the broader source is partially shared, which reduces accidental exposure through retrieval and summarisation.
Agent Identity: Agent identity is the set of credentials and trust signals that establish which AI agent is making a request, and the anchor to which its entitlements are attached. It is distinct from the data it accesses, and in production systems it must be governed separately from retrieval policy and storage permissions.
Delegated Authority: Delegated authority is access granted to a non-human actor to act on behalf of a user, service, or system. It becomes risky when the delegation path is unclear, because teams may not know which identity owns the action or where the responsibility ends.

Deepen your knowledge

AI agent data governance and delegated access control are covered in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building controls for agents that touch regulated data, it is a useful place to start.

This post draws on content published by WorkOS: Immuta for AI Agent Security, features, pricing, and alternatives. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-11-10.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org