AI data leaks and shadow AI: what IAM teams need to know


(@nhi-mgmt-group)
Member Moderator
Joined: 1 year ago
Posts: 2827
Topic starter  

TL;DR: AI data leaks now span prompts, coding assistants, training data, and autonomous agent workflows. According to WitnessAI, in 2025, 20% of organizations with a breach said shadow AI was involved. The real issue is not just leakage but that existing DLP and CASB controls were never built for intent-driven AI interactions or machine-speed data movement.

NHIMG editorial — based on content published by WitnessAI: AI data leaks and how to prevent them

Questions worth separating out

Q: What breaks when AI data loss controls rely only on DLP and CASB?

A: They miss the main AI leakage paths because prompts, responses, embeddings, and agent actions are not ordinary file transfers.

Q: Why do AI workflows complicate least-privilege access models?

A: AI workflows can move data across multiple systems in one session, especially when agents have API, database, and file-system access.

Q: How do security teams know if shadow AI governance is working?

A: Look for evidence that unsanctioned AI use is visible, attributable, and enforceable.
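One rough way to put numbers on "attributable" and "enforceable" is to measure, over the unsanctioned AI events you can already see, how many are tied to an identity and how many had a policy applied. The sketch below assumes a hypothetical event format (`sanctioned`, `identity`, `policy_applied`); none of these field names come from WitnessAI or any specific product. It cannot measure visibility itself, since events that were never observed do not appear in the log.

```python
def shadow_ai_coverage(events):
    """Summarise attribution and enforcement over unsanctioned AI events.

    Each event is a dict with hypothetical keys:
      sanctioned (bool), identity (str or None), policy_applied (bool).
    """
    unsanctioned = [e for e in events if not e["sanctioned"]]
    total = len(unsanctioned)
    if total == 0:
        # Nothing unsanctioned was observed; ratios are undefined.
        return {"unsanctioned": 0, "attributed": None, "enforced": None}
    attributed = sum(1 for e in unsanctioned if e.get("identity"))
    enforced = sum(1 for e in unsanctioned if e.get("policy_applied"))
    return {
        "unsanctioned": total,
        "attributed": attributed / total,
        "enforced": enforced / total,
    }


# Illustrative data only, not real telemetry.
events = [
    {"sanctioned": True, "identity": "alice", "policy_applied": True},
    {"sanctioned": False, "identity": "bob", "policy_applied": True},
    {"sanctioned": False, "identity": "svc-build", "policy_applied": False},
    {"sanctioned": False, "identity": None, "policy_applied": False},
    {"sanctioned": False, "identity": "agent-7", "policy_applied": True},
]

print(shadow_ai_coverage(events))
```

Low attribution suggests AI traffic that cannot be tied to a human or non-human identity; low enforcement suggests policy exists on paper but is not applied at runtime.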

Practitioner guidance

  • Map AI usage by identity type. Inventory which human users, service accounts, copilots, and autonomous agents can reach external AI services, internal models, and MCP-connected tools.
  • Replace label-only DLP with intent-aware controls. Use controls that inspect prompts, responses, and workflow context instead of relying only on keywords, file labels, or network destinations.
  • Constrain agent tool access to least-privilege scopes. Require task-scoped credentials, explicit tool allowlists, and policy checks before agents query databases, file systems, or external services.
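The last point, task-scoped credentials plus an explicit tool allowlist checked before each call, can be sketched as follows. This is a minimal illustration under stated assumptions: `TaskCredential`, `authorize_tool_call`, and the tool names are hypothetical and do not correspond to any vendor API or to the MCP specification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TaskCredential:
    """Hypothetical short-lived credential bound to one agent task."""
    agent_id: str
    task_id: str
    allowed_tools: frozenset
    expires_at: datetime


def authorize_tool_call(cred, tool, now=None):
    """Return (allowed, reason) for a single tool call by an agent."""
    now = now or datetime.now(timezone.utc)
    if now >= cred.expires_at:
        return False, "credential expired"
    if tool not in cred.allowed_tools:
        return False, f"tool '{tool}' not in allowlist for task {cred.task_id}"
    return True, "ok"


# Illustrative values only.
issued = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
cred = TaskCredential(
    agent_id="agent-invoice-bot",
    task_id="task-001",
    allowed_tools=frozenset({"crm.read_contact"}),
    expires_at=issued + timedelta(minutes=15),
)

print(authorize_tool_call(cred, "crm.read_contact", now=issued))
print(authorize_tool_call(cred, "db.export_table", now=issued))
```

The check is deny-by-default: a tool that is not named in the allowlist is refused, and the credential stops working when the task window closes, so a leaked credential has limited reach.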

What's in the full article

WitnessAI's full guide covers the operational detail this post intentionally leaves for the source:

  • A practical breakdown of which employee behaviours most often create AI leakage in real enterprises.
  • The platform's network-level discovery approach across browsers, coding assistants, desktop apps, and MCP-connected services.
  • Implementation detail for intent-based machine learning policies and bidirectional runtime protection.
  • Examples of how tokenization and response filtering are applied before and after model interaction.

👉 Read WitnessAI's full guide on AI data leaks and shadow AI governance →




   
(@mr-nhi)
Member Moderator
Joined: 4 weeks ago
Posts: 1125
 

AI data leaks are now an identity governance problem, not just a content protection problem. The article shows that exposure can come from humans, service-style AI workflows, or autonomous agents, which means the control boundary is the actor and its access path, not only the document or field being protected. That shifts the programme question from "what data is sensitive?" to "which identities can move that data into AI systems?" Practitioners should govern AI usage as an access problem with data consequences.

A question worth separating out:

Q: Who is accountable when an AI system leaks regulated data?

A: Accountability usually spans the business owner of the workflow, the security team responsible for policy, and the platform team that controls access and logging. When autonomous agents are involved, accountability must also cover the identity issuing the actions and the approvals that allowed tool access. That is why audit trails matter.
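To make that concrete, an audit record for an agent action might capture each accountable party named above, and a simple check can flag records where one is missing. This is a sketch only; the `AgentAuditRecord` fields and the helper name are assumptions for illustration, not a standard schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AgentAuditRecord:
    """Hypothetical audit entry for one agent action."""
    timestamp: str            # ISO 8601, UTC
    agent_identity: str       # identity that issued the action
    tool: str                 # tool or system the agent called
    approved_by: str          # who or what granted the tool access
    workflow_owner: str       # business owner of the workflow
    data_classification: str  # e.g. "regulated", "internal"


def missing_accountability_fields(record):
    """Return the names of fields that are empty in the record."""
    return [name for name, value in asdict(record).items() if not value]


# Illustrative values only.
record = AgentAuditRecord(
    timestamp="2025-01-01T12:00:00Z",
    agent_identity="agent-invoice-bot",
    tool="crm.read_contact",
    approved_by="",
    workflow_owner="finance-ops",
    data_classification="regulated",
)

print(json.dumps(asdict(record)))
print(missing_accountability_fields(record))
```

A record with an empty `approved_by` field is exactly the case where accountability is hard to assign after a leak.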

👉 Read our full editorial: AI data leaks expose a governance gap in enterprise security



   