TL;DR: AI data leaks now span prompts, coding assistants, training data, and autonomous agent workflows. According to WitnessAI, 20% of organizations that suffered a breach in 2025 said shadow AI was involved. The real issue is not just leakage but the fact that existing DLP and CASB controls were never built for intent-driven AI interactions or machine-speed data movement.
NHIMG editorial — based on content published by WitnessAI: AI data leaks and how to prevent them
By the numbers:
- In 2025, 20% of organizations that suffered a data breach said the security incidents involved shadow AI.
- The global average breach cost was $4.44 million in 2025.
- The EU AI Act allows fines for prohibited AI practices of up to €35 million or 7% of total worldwide annual turnover, whichever is higher.
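For an undertaking, the EU AI Act ceiling for prohibited practices is the higher of the two figures, so it scales with company size. The minimal Python sketch below makes that arithmetic concrete. The function name is hypothetical, turnover is assumed to be in whole euros, and the result is only the statutory maximum, not a prediction of any actual penalty.

```python
def ai_act_prohibited_practice_cap_eur(worldwide_annual_turnover_eur: int) -> int:
    """Return the maximum fine (whole euros) for a prohibited AI practice.

    Assumes the cap is the higher of EUR 35 million or 7% of total worldwide
    annual turnover. Integer arithmetic avoids floating-point rounding;
    the 7% share is rounded down to the nearest euro.
    """
    fixed_cap = 35_000_000
    turnover_cap = worldwide_annual_turnover_eur * 7 // 100
    return max(fixed_cap, turnover_cap)


for turnover in (200_000_000, 1_000_000_000, 10_000_000_000):
    cap = ai_act_prohibited_practice_cap_eur(turnover)
    print(f"turnover EUR {turnover:,} -> cap EUR {cap:,}")
```

The crossover sits at €500 million in turnover. Below it, the fixed €35 million ceiling applies. Above it, the 7% share is larger, which is why the exposure for large enterprises can far exceed the headline euro figure.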
Questions worth separating out
Q: What breaks when AI data loss controls rely only on DLP and CASB?
A: They miss the main AI leakage paths because prompts, responses, embeddings, and agent actions are not ordinary file transfers.
Q: Why do AI workflows complicate least-privilege access models?
A: AI workflows can move data across multiple systems in one session, especially when agents have API, database, and file-system access.
Q: How do security teams know if shadow AI governance is working?
A: Look for evidence that unsanctioned AI use is visible, attributable, and enforceable.
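One way to turn "visible, attributable, and enforceable" into numbers is to score a log of outbound AI requests. The sketch below is hypothetical. The `AIEvent` fields, the `SANCTIONED_AI_HOSTS` allowlist, the action labels, and the metric definitions are all assumptions, not WitnessAI's method. A request counts as unsanctioned if its destination is not on the allowlist, as attributable if it carries an identity, and as enforced if policy blocked or redacted it. The count of unsanctioned events serves as the visibility signal.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical allowlist of sanctioned AI endpoints; replace with your own inventory.
SANCTIONED_AI_HOSTS = {"copilot.internal.example", "api.approved-llm.example"}
# Hypothetical policy outcomes that count as enforcement (not merely observation).
ENFORCING_ACTIONS = {"blocked", "redacted"}


@dataclass
class AIEvent:
    identity: Optional[str]  # user, service account, or agent ID; None if unattributed
    host: str                # AI service the request was sent to
    action: str              # "allowed", "blocked", "redacted", or "observed"


def shadow_ai_metrics(events: List[AIEvent]) -> Dict[str, float]:
    """Score unsanctioned AI use on visibility (count), attribution, and enforcement."""
    unsanctioned = [e for e in events if e.host not in SANCTIONED_AI_HOSTS]
    total = len(unsanctioned)
    if total == 0:
        # Nothing unsanctioned was seen, so the ratios are vacuously 1.0.
        return {"unsanctioned_events": 0, "attributable": 1.0, "enforced": 1.0}
    attributed = sum(1 for e in unsanctioned if e.identity is not None)
    enforced = sum(1 for e in unsanctioned if e.action in ENFORCING_ACTIONS)
    return {
        "unsanctioned_events": total,
        "attributable": attributed / total,
        "enforced": enforced / total,
    }


events = [
    AIEvent("alice", "copilot.internal.example", "allowed"),
    AIEvent("bob", "chat.unknown-llm.example", "blocked"),
    AIEvent(None, "chat.unknown-llm.example", "observed"),
    AIEvent("svc-etl", "paste.ai-tool.example", "redacted"),
]
print(shadow_ai_metrics(events))
```

A count of zero unsanctioned events is not automatically good news. If employees are known to use AI tools, a zero count more likely means discovery has gaps than that shadow AI is absent.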
Practitioner guidance
- Map AI usage by identity type: inventory which human users, service accounts, copilots, and autonomous agents can reach external AI services, internal models, and MCP-connected tools.
- Replace label-only DLP with intent-aware controls: use controls that inspect prompts, responses, and workflow context instead of relying only on keywords, file labels, or network destinations.
- Constrain agent tool access to least-privilege scopes: require task-scoped credentials, explicit tool allowlists, and policy checks before agents query databases, file systems, or external services.
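The least-privilege recommendation can be sketched as a deny-by-default check that runs before every agent tool call. Everything here is hypothetical: `TaskCredential`, `issue_task_credential`, `authorize_tool_call`, the tool and resource names, and the 15-minute default lifetime are illustrative assumptions, not any specific product's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class TaskCredential:
    """Hypothetical credential scoped to one agent, one task, and a short lifetime."""
    agent_id: str
    task_id: str
    allowed_tools: FrozenSet[str]      # explicit tool allowlist, e.g. {"db.read"}
    allowed_resources: FrozenSet[str]  # systems this task may touch, e.g. {"orders_db"}
    expires_at: datetime


def issue_task_credential(agent_id: str, task_id: str, tools: Iterable[str],
                          resources: Iterable[str], ttl_minutes: int = 15) -> TaskCredential:
    """Mint a credential that expires ttl_minutes from now (UTC)."""
    expires_at = datetime.now(timezone.utc) + timedelta(minutes=ttl_minutes)
    return TaskCredential(agent_id, task_id, frozenset(tools), frozenset(resources), expires_at)


def authorize_tool_call(cred: TaskCredential, tool: str, resource: str,
                        now: Optional[datetime] = None) -> Tuple[bool, str]:
    """Deny by default: the call is allowed only if every check passes."""
    if now is None:
        now = datetime.now(timezone.utc)
    if now >= cred.expires_at:
        return False, f"credential for {cred.task_id} has expired"
    if tool not in cred.allowed_tools:
        return False, f"tool {tool!r} is not allowlisted for {cred.agent_id}"
    if resource not in cred.allowed_resources:
        return False, f"resource {resource!r} is outside the scope of {cred.task_id}"
    return True, "allowed"


cred = issue_task_credential("agent-42", "task-7", tools={"db.read"}, resources={"orders_db"})
print(authorize_tool_call(cred, "db.read", "orders_db"))   # in scope
print(authorize_tool_call(cred, "fs.write", "orders_db"))  # tool not allowlisted
print(authorize_tool_call(cred, "db.read", "hr_db"))       # resource out of scope
```

The expiry check runs first, so a stale credential is rejected even for calls that would otherwise be in scope. Returning a reason string alongside the decision makes each denial easy to write to an audit log.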
What's in the full article
WitnessAI's full guide covers the operational detail this post intentionally leaves for the source:
- A practical breakdown of which employee behaviours most often create AI leakage in real enterprises.
- The platform's network-level discovery approach across browsers, coding assistants, desktop apps, and MCP-connected services.
- Implementation detail for intent-based machine learning policies and bidirectional runtime protection.
- Examples of how tokenization and response filtering are applied before and after model interaction.
👉 Read WitnessAI's full guide on AI data leaks and shadow AI governance →
AI data leaks and shadow AI: what do IAM teams need to know?
Explore further
AI data leaks are now an identity governance problem, not just a content protection problem. The article shows that exposure can come from humans, service-style AI workflows, or autonomous agents, which means the control boundary is the actor and its access path, not only the document or field being protected. That shifts the programme question from "what data is sensitive?" to "which identities can move that data into AI systems?" Practitioners should govern AI usage as an access problem with data consequences.
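Asking "which identities can move that data into AI systems?" suggests a simple join over an access inventory: find every identity that can both read a sensitive store and send data to an AI destination. The sketch below assumes a hypothetical `Identity` record, a hypothetical `ai_exposure_paths` function, and a hard-coded `SENSITIVE_STORES` set. In practice, both the grants and the classification would come from your IAM and data-catalog tooling.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical classification; a real programme would pull this from a data catalog.
SENSITIVE_STORES = {"hr_db", "customer_pii"}


@dataclass
class Identity:
    name: str
    kind: str                                            # "human", "service_account", "copilot", or "agent"
    can_read: Set[str] = field(default_factory=set)      # data stores the identity can read
    can_reach_ai: Set[str] = field(default_factory=set)  # AI services or MCP tools it can send data to


def ai_exposure_paths(identities: List[Identity]) -> List[Tuple[str, str, List[str], List[str]]]:
    """List identities that can read sensitive data AND reach at least one AI destination."""
    paths = []
    for ident in identities:
        sensitive = ident.can_read & SENSITIVE_STORES
        if sensitive and ident.can_reach_ai:
            paths.append((ident.name, ident.kind, sorted(sensitive), sorted(ident.can_reach_ai)))
    return paths


inventory = [
    Identity("alice", "human", {"wiki"}, {"chat.approved-llm.example"}),
    Identity("svc-report", "service_account", {"customer_pii"}),
    Identity("support-agent", "agent", {"customer_pii", "tickets"}, {"mcp://summarizer"}),
]
for path in ai_exposure_paths(inventory):
    print(path)
```

Only `support-agent` is flagged. `alice` can reach an AI service but holds no sensitive data, and `svc-report` holds sensitive data but has no AI path. The actor and its access path, not the data alone, determine the exposure.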
A few things that frame the scale:
- The average organisation believes more than 1 in 5 of its non-human identities are insufficiently secured, according to The 2024 ESG Report: Managing Non-Human Identities.
- Enterprises that have experienced a compromised NHI averaged 2.7 separate incidents in the past 12 months, according to The 2024 ESG Report: Managing Non-Human Identities.
A question worth separating out:
Q: Who is accountable when an AI system leaks regulated data?
A: Accountability usually spans the business owner of the workflow, the security team responsible for policy, and the platform team that controls access and logging. When autonomous agents are involved, accountability must also cover the identity issuing the actions and the approvals that allowed tool access. That is why audit trails matter.
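An audit trail that answers the accountability question must record each party named above alongside the action itself. The schema below is a hypothetical sketch. The `AgentAuditRecord` field names, the `missing_accountability` helper, and the JSON-lines output are assumptions, chosen so that a record with no approver or owner is easy to flag.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class AgentAuditRecord:
    """Hypothetical schema linking one agent action to its accountable parties."""
    timestamp: str
    agent_identity: str    # identity that issued the action
    tool: str              # tool or API the agent invoked
    resource: str          # system or data the action touched
    decision: str          # policy outcome, e.g. "allowed" or "denied"
    approved_by: str       # who approved this agent's tool access
    workflow_owner: str    # business owner of the workflow
    policy_owner: str      # security team responsible for the policy
    platform_owner: str    # platform team that controls access and logging


def missing_accountability(record: AgentAuditRecord) -> List[str]:
    """Return the names of accountability fields that are empty."""
    required = ("agent_identity", "approved_by", "workflow_owner", "policy_owner", "platform_owner")
    return [name for name in required if not getattr(record, name)]


def to_json_line(record: AgentAuditRecord) -> str:
    """Serialize one record as a single JSON line, suitable for appending to a log."""
    return json.dumps(asdict(record), sort_keys=True)


record = AgentAuditRecord(
    timestamp=datetime.now(timezone.utc).isoformat(),
    agent_identity="agent-42",
    tool="db.read",
    resource="customer_pii",
    decision="allowed",
    approved_by="",              # no approval recorded: an accountability gap
    workflow_owner="support-ops",
    policy_owner="secops",
    platform_owner="platform-eng",
)
print(to_json_line(record))
print("missing:", missing_accountability(record))
```

Flagging empty accountability fields at write time surfaces gaps such as an unapproved tool grant before an incident, rather than during the post-incident review.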
👉 Read our full editorial: AI data leaks expose a governance gap in enterprise security