TL;DR: Traditional DLP was built for static artifacts like emails, files, and endpoints, but GenAI systems now paraphrase, summarize, translate, and trigger tool calls that expose sensitive data in real time, according to Lakera. Legacy pattern matching cannot keep pace with language-native leakage or agent workflows, so runtime, context-aware controls are now mandatory.
NHIMG editorial — based on content published by Lakera: From Regex to Reasoning, why your data leakage prevention doesn’t speak the language of GenAI
Questions worth separating out
Q: How should security teams stop GenAI systems from leaking sensitive data?
A: Security teams should combine runtime policy enforcement, semantic detection, and identity-aware access checks.
Q: Why do traditional DLP tools struggle with GenAI and agents?
A: Traditional DLP tools depend on static patterns and predictable content, while GenAI rewrites information in real time.
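A minimal sketch of that gap, assuming a hypothetical card-number regex of the kind a legacy DLP rule might use. The pattern, function name, and sample strings are illustrative and are not taken from Lakera's article.

```python
import re

# Hypothetical legacy-style DLP rule: 16 digits, optionally grouped in fours.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def regex_dlp_flags(text: str) -> bool:
    """Return True if the static pattern finds a card-like number."""
    return CARD_PATTERN.search(text) is not None


# The literal value, as it might sit in a file or an email.
original = "Customer card: 4111 1111 1111 1111"

# The same value after a model restates it in words (illustrative rewrite).
paraphrased = (
    "The customer's card starts with four, followed by fifteen ones, "
    "written in four groups of four."
)

print(regex_dlp_flags(original))     # True
print(regex_dlp_flags(paraphrased))  # False
```

The second string carries the same information, but nothing in it matches the pattern, which is the kind of language-native leakage the answer above describes.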
Q: What breaks when DLP only inspects prompts and outputs?
A: What breaks is the assumption that the risky event happens at the edge of the conversation.
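To illustrate, here is a rough sketch comparing edge-only inspection with inspection of every step in an agent trace. The event model, the `looks_sensitive` detector, and the sample trace are all assumptions made for this example; a real deployment would use a semantic classifier instead of a substring check.

```python
from dataclasses import dataclass


@dataclass
class AgentEvent:
    kind: str     # "prompt", "memory_read", "tool_call", or "output"
    payload: str  # text carried by this step


def looks_sensitive(text: str) -> bool:
    """Stand-in detector (assumption): flags one known account identifier."""
    return "acct-7781" in text.lower()


def edge_only_findings(trace: list[AgentEvent]) -> list[AgentEvent]:
    """Inspect only prompt and output events, as edge-focused DLP would."""
    return [
        e for e in trace
        if e.kind in {"prompt", "output"} and looks_sensitive(e.payload)
    ]


def full_trace_findings(trace: list[AgentEvent]) -> list[AgentEvent]:
    """Inspect every step, including memory reads and tool calls."""
    return [e for e in trace if looks_sensitive(e.payload)]


trace = [
    AgentEvent("prompt", "Summarise open support tickets"),
    AgentEvent("memory_read", "Ticket 12: refund for ACCT-7781"),
    AgentEvent("tool_call", "send_email(to='vendor@example.com', body='Refund ACCT-7781')"),
    AgentEvent("output", "Three tickets are open; one refund is pending."),
]

print(len(edge_only_findings(trace)))                # 0
print([e.kind for e in full_trace_findings(trace)])  # ['memory_read', 'tool_call']
```

In this trace the identifier leaves the system through the tool call, while both the prompt and the final output look clean.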
Practitioner guidance
- Map data flows across the full GenAI stack. Track prompts, embeddings, context windows, retrieval sources, memory, outputs, and downstream tool calls so you know where sensitive data can appear and reappear.
- Enforce access at the moment of generation. Allow models and agents to reach only the data the requesting identity is authorised to see at that exact interaction, not the broader repository by default.
- Inspect agent reasoning and tool use. Monitor memory access, function calls, and message passing, because many leaks happen in intermediate steps that never show up in a simple prompt or response log.
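The second point, enforcing access at the moment of generation, could look roughly like the sketch below. It assumes each retrieved chunk carries an access list and that the requester's groups are known at request time. The `Chunk`, `Requester`, and `authorised_context` names are hypothetical and do not come from any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    allowed_groups: frozenset[str]  # groups permitted to read this chunk


@dataclass(frozen=True)
class Requester:
    user_id: str
    groups: frozenset[str]


def authorised_context(requester: Requester, retrieved: list[Chunk]) -> list[str]:
    """Keep only chunks the requesting identity may see for this interaction."""
    return [c.text for c in retrieved if c.allowed_groups & requester.groups]


# Retrieval returned both chunks, but the requester is only in "staff".
retrieved = [
    Chunk("Q3 roadmap summary", frozenset({"staff"})),
    Chunk("Payroll export for March", frozenset({"hr"})),
]
analyst = Requester("u-102", frozenset({"staff"}))

context = authorised_context(analyst, retrieved)
print(context)  # ['Q3 roadmap summary']
```

Filtering before the context window is built means the model never receives the payroll chunk, so there is nothing for it to paraphrase or summarise into the response.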
What's in the full article
Lakera's full article covers the operational detail this post intentionally leaves for the source:
- How the vendor frames language-native DLP detectors and what that means for implementation choices.
- Examples of GenAI leakage patterns involving summarisation, translation, and tool-driven workflows.
- Practical checklist items for teams building guardrails into LLM applications and agent pipelines.
- The vendor's discussion of red-team exercises and where runtime monitoring fits into a broader GenAI security programme.
👉 Read Lakera's analysis of why traditional DLP fails for GenAI →
GenAI data leakage: why legacy DLP controls are falling short
Explore further
Regex-era DLP is an access-control model for a transformation problem. Traditional controls were designed to spot known strings in known places, but GenAI changes the unit of risk from file or field to meaning. Once a model can paraphrase, translate, and synthesise information, the control boundary is no longer the literal token. Practitioners should treat this as a governance mismatch, not a tuning problem.
A few things that frame the scale:
- 88% of security professionals are concerned about secrets sprawl, with 49% of those in larger organisations described as "very concerned", according to The 2024 State of Secrets Management Survey.
- 54% of organisations are dissatisfied with their current secrets management solution because not all secrets are secured, and 43% cite lack of central management.
A question worth separating out:
Q: How do organisations govern sensitive data in AI agents and LLM workflows?
A: Organisations should treat sensitive data governance as a runtime identity and context problem. That means authorising access based on who is asking, what the model can infer, and how the output will be used. The strongest controls sit inside the workflow, not around it.
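One way to picture that decision is a small policy function evaluated inside the workflow. This is a sketch under stated assumptions: the sensitivity levels, the `inferred` label (assumed to come from some upstream detector), and the per-destination ceilings are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2


@dataclass(frozen=True)
class Request:
    clearance: Sensitivity  # highest level the requester may see
    inferred: Sensitivity   # highest level the response could reveal (assumed detector output)
    destination: str        # where the output will be used


# Hypothetical ceiling on what each destination may receive.
DESTINATION_CEILING = {
    "internal_chat": Sensitivity.RESTRICTED,
    "external_email": Sensitivity.PUBLIC,
}


def decide(req: Request) -> str:
    """Return 'allow', 'redact', or 'deny' for a single interaction."""
    # Unknown destinations default to the most restrictive ceiling.
    ceiling = DESTINATION_CEILING.get(req.destination, Sensitivity.PUBLIC)
    if req.inferred > req.clearance:
        return "deny"    # the requester is not entitled to this content
    if req.inferred > ceiling:
        return "redact"  # entitled, but the destination is too exposed
    return "allow"


print(decide(Request(Sensitivity.RESTRICTED, Sensitivity.INTERNAL, "internal_chat")))   # allow
print(decide(Request(Sensitivity.RESTRICTED, Sensitivity.INTERNAL, "external_email")))  # redact
print(decide(Request(Sensitivity.INTERNAL, Sensitivity.RESTRICTED, "internal_chat")))   # deny
```

The same content yields different outcomes depending on who asks and where the output is headed, which is the sense in which the control sits inside the workflow.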
👉 Read our full editorial: GenAI data leakage exposes the limits of regex-based DLP