AI data leakage exposes the limits of legacy DLP and CASB

By NHI Mgmt Group Editorial Team | Published 2026-06-28 | Domain: Best Practices | Source: WitnessAI

TL;DR: AI prompts can now pull sensitive data from payroll files, connected systems, and shadow accounts faster than traditional DLP and CASB were built to inspect, according to WitnessAI. The real failure is not visibility alone but governance that assumes risky data moves only through legacy channels, not copilots, agents, and conversational interfaces.

At a glance

What this is: An analysis of how AI data leakage occurs through prompts, copilots, shadow accounts, and agent workflows that bypass legacy data controls.

Why it matters: IAM, NHI, and human identity teams now have to govern who can expose data to AI, not just who can access the underlying systems.

By the numbers:

Nearly 10% of employee prompts to popular large language models include sensitive information.

Context

AI data leakage is the exposure of sensitive information through chat prompts, copilots, agent workflows, and other AI interfaces that sit outside traditional data governance perimeters. For IAM teams, the problem is no longer only access to a file or database, but whether an identity can reveal governed data through an AI interaction.

Legacy DLP and CASB controls were built for email, file transfer, and structured workflows. They struggle when the risk lives in conversational meaning, model output, hidden agent behaviour, and shadow accounts that bypass single sign-on and central logging.

The result is a governance gap across both the human workforce and the digital workforce. As AI adoption accelerates, programmes need controls that connect identity, intent, and runtime enforcement rather than relying on perimeter-era assumptions.

Key questions

Q: How should security teams prevent sensitive data from leaking through AI prompts and copilots?

A: Security teams should combine discovery, intent-aware classification, entitlement review, and runtime enforcement. The critical move is to govern the AI interaction itself, not just the file or system behind it. Managed identities, policy routing, and inspection of prompts and responses give teams better control than legacy DLP alone.

Q: Why do AI tools create more data leakage risk than traditional SaaS applications?

A: AI tools create more risk because the sensitive event often happens in natural language, not in a file transfer or database query. A user can reveal information through a prompt, or an agent can surface over-permissioned data in a response. That makes context, intent, and runtime behaviour part of the access decision.

Q: What do security teams get wrong about shadow AI?

A: They treat shadow AI as only an application discovery problem. In practice, it is also an identity and governance problem because unmanaged accounts remove logging, policy enforcement, and retention controls. If the account path is not governed, the organisation cannot prove who used the AI tool or what data it exposed.

Q: Should organisations block all AI use to reduce leakage risk?

A: No. Blanket blocking often pushes users toward unmanaged tools, which increases shadow AI. A better approach is to allow approved use with policy-based routing, tokenization, and account-level governance. That keeps productivity available while reducing the chance that sensitive data leaves governed boundaries.

Technical breakdown

Why conversational AI breaks legacy data loss controls

Traditional DLP and CASB inspect files, messages, and known data patterns, but AI leakage often emerges in natural language. A prompt can contain sensitive context without matching a fingerprint, and a response can reveal material through paraphrase rather than direct copy. That means the control must understand the interaction, not just the payload. This is why keyword-based inspection misses both accidental disclosure and intentional exfiltration in multi-turn sessions. The issue is architectural: the sensitive event occurs inside the conversation, where legacy tools have limited semantic visibility.

Practical implication: add AI-aware inspection and policy enforcement before relying on legacy DLP as the primary safeguard.
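
The gap described above can be illustrated with a minimal sketch. The two rules below are hypothetical stand-ins for legacy pattern matching, not the rule set of any real DLP product, and both sample prompts are invented. The point is only that a fixed pattern catches the literal identifier but has nothing to match in a paraphrased disclosure.

```python
import re

# Hypothetical legacy-style rules: fixed patterns and keywords only.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "keyword": re.compile(r"\b(confidential|salary|payroll)\b", re.IGNORECASE),
}

def pattern_scan(text: str) -> list[str]:
    """Return the names of the patterns that match the text."""
    return [name for name, rx in PATTERNS.items() if rx.search(text)]

# Invented prompts for illustration.
direct = "Employee SSN is 123-45-6789, please add it to the letter."
paraphrased = (
    "Our finance director earns a little over two hundred thousand a year "
    "and is about to be let go. Draft the announcement."
)

direct_hits = pattern_scan(direct)            # the literal identifier is caught
paraphrased_hits = pattern_scan(paraphrased)  # nothing matches, yet the prompt is sensitive

print(direct_hits, paraphrased_hits)
```

The second prompt discloses compensation and a pending termination without triggering either rule, which is the kind of miss that semantic, interaction-level inspection is meant to address.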

Shadow AI, personal accounts, and the identity gap

Shadow AI appears when employees use unmanaged accounts, browser extensions, or consumer tools outside enterprise identity controls. In those cases, policy, retention, audit trails, and account-level governance become inconsistent or absent. This is an identity problem as much as a data problem, because the same user can move between governed and ungoverned AI contexts with very different risk postures. When the account is unmanaged, the organisation loses the evidence chain that ties activity back to a responsible identity and policy boundary.

Practical implication: discover unmanaged AI accounts and route them into governed identity and logging paths before expanding AI use.
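
A discovery pass of this kind might look like the following sketch. Everything in it is an assumption made for illustration: the corporate domain, the AI host names, and the shape of the log record are hypothetical, and a real deployment would draw them from its own proxy or identity provider data.

```python
from dataclasses import dataclass

# Hypothetical values; substitute the organisation's own domain and host list.
CORPORATE_DOMAIN = "example.com"
AI_HOSTS = {"chat.example-ai.test", "copilot.example-ai.test"}

@dataclass(frozen=True)
class AccessEvent:
    device_owner: str   # managed identity tied to the device
    host: str           # destination host seen in the log
    login_account: str  # account used to sign in to the AI tool
    via_sso: bool       # True if the session went through SSO

def is_unmanaged(event: AccessEvent) -> bool:
    """An AI session is unmanaged if it skipped SSO or used a non-corporate account."""
    corporate = event.login_account.lower().endswith("@" + CORPORATE_DOMAIN)
    return event.host in AI_HOSTS and not (event.via_sso and corporate)

def unmanaged_by_owner(events):
    """Group unmanaged AI logins under the managed identity that owns the device."""
    findings = {}
    for e in events:
        if is_unmanaged(e):
            findings.setdefault(e.device_owner, []).append(e.login_account)
    return findings

events = [
    AccessEvent("alice@example.com", "chat.example-ai.test", "alice@example.com", True),
    AccessEvent("alice@example.com", "chat.example-ai.test", "alice.home@mail.test", False),
    AccessEvent("bob@example.com", "copilot.example-ai.test", "bob@example.com", False),
    AccessEvent("bob@example.com", "intranet.example.com", "bob@example.com", False),
]

findings = unmanaged_by_owner(events)
print(findings)
```

Keying the findings on the device owner keeps the link back to a responsible identity, even when the AI login itself was a personal account.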

How over-permissioned copilots and agents leak data at runtime

Copilots and agents inherit the permissions of the systems they can query, which means overly broad access becomes visible through AI output. If an agent can retrieve live records, it can surface data to a user who should not have seen it, or follow injected instructions that turn retrieval into exfiltration. The technical risk is not just model hallucination. It is runtime access combined with excessive entitlement and instruction ambiguity. Once an agent can act across connected systems, the defence must sit between intent and execution.

Practical implication: review connected-system entitlements for AI tools and add runtime checkpoints before agent tool calls.
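
One way to picture a checkpoint between intent and execution is the sketch below. The role names, scopes, and the `origin` field are hypothetical, and reliably knowing whether an instruction came from the user or from retrieved content is itself a hard problem that this example simply assumes has been solved upstream. The sketch shows only the decision logic: the agent's own broad permissions are not enough, and the call must also fit the requesting user's role.

```python
from dataclasses import dataclass

# Hypothetical entitlements: data scopes each human role may see.
ROLE_SCOPES = {
    "hr_manager": {"payroll", "directory"},
    "engineer": {"directory", "tickets"},
}

# Hypothetical scopes granted to the agent's service account (deliberately broad).
AGENT_SCOPES = {"payroll", "directory", "tickets"}

@dataclass(frozen=True)
class ToolCall:
    tool: str
    scope: str         # data scope the call would read
    requested_by: str  # role of the human the agent is acting for
    origin: str        # "user" or "retrieved_content" (assumed to be known)

def checkpoint(call: ToolCall) -> tuple[bool, str]:
    """Decide before execution whether the agent may make this tool call."""
    if call.origin != "user":
        return False, "instruction did not come from the user"
    if call.scope not in AGENT_SCOPES:
        return False, "agent lacks this scope"
    if call.scope not in ROLE_SCOPES.get(call.requested_by, set()):
        return False, "scope exceeds the requesting user's role"
    return True, "allowed"

allowed = checkpoint(ToolCall("get_records", "payroll", "hr_manager", "user"))
over_scope = checkpoint(ToolCall("get_records", "payroll", "engineer", "user"))
injected = checkpoint(ToolCall("send_email", "directory", "engineer", "retrieved_content"))

print(allowed, over_scope, injected)
```

The second call is the over-permissioning case: the agent could read payroll, but the engineer it is acting for could not, so the checkpoint refuses it before any data is retrieved.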

NHI Mgmt Group analysis

AI data leakage is now an identity governance problem, not just a content filtering problem. The article shows that sensitive data is escaping through prompts, copilots, and agent workflows that sit outside the perimeter logic of older controls. That changes the control objective from detecting bad content to governing who can expose which data through which AI interface. For IAM and NHI teams, the practical conclusion is that identity context must move into the data protection layer.

Shadow AI creates an accountability failure before it creates a technical leak. When employees use unmanaged personal accounts or embedded AI features outside SSO, the organisation loses logging, retention, and policy enforcement at the account level. This is a governance gap because the access path no longer sits inside the programme’s normal review and certification processes. The implication is that AI usage cannot be treated as an application-class issue alone; it is a lifecycle and identity boundary issue.

Intent-aware inspection is the concept this topic now demands. Pattern matching is no longer enough when the same text can represent legitimate analysis, accidental disclosure, or deliberate exfiltration. The article makes clear that the distinguishing signal is behavioural intent across the session, not static keywords in a single prompt. Practitioner teams should recognise that conversational meaning is now part of the access decision surface.
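
The difference between judging a single prompt and judging a session can be sketched as follows. This is a deliberately simplified illustration, not an intent classifier: the category labels on each turn are assumed to come from some upstream classifier and are hard-coded here, and the risky combinations are hypothetical.

```python
# Hypothetical combinations that are sensitive together but not alone.
RISKY_COMBINATIONS = [
    {"person_identity", "compensation"},
    {"person_identity", "health"},
]

def is_risky(labels) -> bool:
    """True if the given set of labels contains any risky combination."""
    return any(combo <= set(labels) for combo in RISKY_COMBINATIONS)

def first_risky_turn(turn_labels):
    """Return the 1-based turn at which the session as a whole first
    contains a risky combination, or None if it never does."""
    seen = set()
    for index, labels in enumerate(turn_labels, start=1):
        seen |= set(labels)
        if is_risky(seen):
            return index
    return None

# Labels per turn, assumed to be produced by an upstream classifier.
session = [
    {"compensation"},     # turn 1: asks about salary bands
    set(),                # turn 2: formatting request
    {"person_identity"},  # turn 3: adds a named individual
]

per_turn_flags = [is_risky(labels) for labels in session]
session_flag = first_risky_turn(session)

print(per_turn_flags, session_flag)
```

No individual turn is flagged, but the session crosses the line at the third turn, which is the gradual disclosure pattern that single-prompt inspection cannot see.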

Runtime governance must extend to digital workforce activity as well as human prompts. The vendor’s framing around copilots and autonomous agents is directionally right, but the broader lesson is that AI-generated data exposure will increasingly come from delegated execution rather than direct human typing. That means the security model has to account for identity, action, and downstream retrieval in one chain. The conclusion for practitioners is to govern the actor, not only the text.

Legacy data protection tools are being asked to solve an AI problem they were not designed for. DLP and CASB can still help in adjacent channels, but the article shows they lack native understanding of prompts, responses, and agent behaviour. That mismatch is why enterprises can report clean control coverage while leakage continues through AI interfaces. The field should treat AI governance as a separate control plane, not a feature request for the old one.

From our research:
The average time to mitigate a leaked secret is 36 hours, highlighting the operational burden of manual remediation processes, according to The 2024 State of Secrets Management Survey.
54% of organisations are dissatisfied with their current secrets management solution because not all secrets are secured, and 43% cite lack of central management.
The same governance gap now shows up in AI workflows, so teams should pair discovery with the guidance in the Guide to the Secret Sprawl Challenge to reduce exposure at source.

What this signals

Secret leakage into AI workflows is a lifecycle problem as much as a detection problem. Once sensitive data moves into prompts or agentic retrieval paths, manual remediation timelines become too slow to support rapid AI adoption. The governance question is whether identity, secrets, and data controls can converge before users normalize unsafe AI behaviour.

Enterprises should expect AI governance to shift from tool approval to interaction control. That means policy engines need to understand managed identities, shadow accounts, and agent actions as part of the same risk chain, especially where copilots can reach into business systems.

The next control boundary will be the point where intent meets execution. Organisations that can inspect, classify, and route AI interactions in real time will have a better path to governed adoption than those trying to retrofit file-centric controls onto conversational systems.

For practitioners

Map AI interactions to identity sources: Correlate prompts, responses, and agent actions back to managed identities, including personal accounts and embedded AI features that bypass SSO. This is the only way to preserve accountability across human and digital workforce activity.
Classify prompt intent before enforcing policy: Use intent-aware controls to distinguish legitimate analysis from disclosure risk, especially in multi-turn conversations where sensitive information appears gradually. Static keyword rules are not sufficient for conversational AI.
Review entitlements behind copilots and agents: Audit the systems connected to AI tools and reduce over-broad retrieval permissions so agents cannot surface data beyond the user’s role. Treat every connected data source as part of the AI attack surface.
Add runtime checkpoints for agent tool calls: Require pre-execution inspection for actions that retrieve records, call APIs, or trigger downstream workflows. This limits indirect prompt injection and keeps agent behaviour inside governed boundaries.
Tokenize sensitive data before model exposure: Replace credentials, personal data, and other high-risk values with secure tokens before they leave the enterprise boundary, then rehydrate them after processing. This preserves workflow usability while reducing leakage exposure.
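
The tokenize-then-rehydrate step in the last item can be sketched as below. This is a minimal illustration under stated assumptions: it handles only email addresses, keeps the mapping in an in-memory dictionary rather than a secured vault, and the `Tokenizer` class, the token format, and the simulated model reply are all hypothetical.

```python
import re
import secrets

# Simple email pattern used only for this illustration.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

class Tokenizer:
    """Swap email addresses for opaque tokens and restore them later."""

    def __init__(self):
        self._token_to_value = {}  # stand-in for a secured vault
        self._value_to_token = {}

    def tokenize(self, text: str) -> str:
        def swap(match):
            value = match.group(0)
            if value not in self._value_to_token:
                token = f"<TOK_{secrets.token_hex(4)}>"
                self._token_to_value[token] = value
                self._value_to_token[value] = token
            return self._value_to_token[value]
        return EMAIL.sub(swap, text)

    def rehydrate(self, text: str) -> str:
        for token, value in self._token_to_value.items():
            text = text.replace(token, value)
        return text

tokenizer = Tokenizer()
prompt = "Email jane.doe@example.com about the payroll correction."
safe_prompt = tokenizer.tokenize(prompt)      # what would leave the boundary

# Simulated model output that echoes the token back (no real model is called).
model_reply = "Draft ready. Recipient and topic: " + safe_prompt
restored = tokenizer.rehydrate(model_reply)   # restored inside the boundary

print(safe_prompt)
print(restored)
```

Reusing the same token for a repeated value keeps references consistent within a session, so the model can still reason about "the same recipient" without seeing the address.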

Key takeaways

AI data leakage now travels through prompts, copilots, and agent actions, which makes identity context part of data protection.
Legacy DLP and CASB can miss conversational meaning, shadow accounts, and runtime retrieval, so clean reports do not equal real control.
Practitioners need discovery, intent-aware policy, tokenization, and runtime checkpoints to keep AI adoption governed at scale.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
NIST CSF 2.0	PR.AA-05 (PR.AC-4 in CSF 1.1)	AI prompts and agent actions need identity-based access control.
OWASP Non-Human Identity Top 10	NHI-01	Shadow AI and unmanaged accounts are non-human identity exposure paths.
NIST Zero Trust (SP 800-207)	AC-6 (least privilege, as defined in NIST SP 800-53)	Runtime AI routing and least-privilege access align with zero trust principles.

Map AI interaction paths to PR.AA-05 (PR.AC-4 in CSF 1.1) and reduce standing access to governed data sources.

Key terms

Shadow AI: Shadow AI is the use of AI tools, accounts, or embedded features that operate outside enterprise visibility and governance. In practice, it removes identity, logging, and policy controls from the interaction path, which makes sensitive-data exposure harder to detect and harder to prove after the fact.
Intent-aware classification: Intent-aware classification is the process of judging an AI interaction by what the user or agent is trying to do, not only by keywords or file patterns. It is useful because the same text can support legitimate work or data exfiltration, depending on session context and behaviour.
Runtime governance: Runtime governance is control applied while an AI system is processing a prompt, response, or tool call. It matters because many AI risks emerge after approval, when the model or agent retrieves data, transforms it, or triggers an action that would not be visible in static review.
Tokenization: Tokenization replaces sensitive values with protected substitutes before data leaves the enterprise boundary. The original value can be restored later for business use, which lets teams preserve workflow utility while reducing the chance that models or third parties see the raw data.

This post draws on content published by WitnessAI: AI data leakage prevention and why legacy controls miss it.
