How should security teams stop GenAI systems from leaking sensitive data?

Security teams should combine runtime policy enforcement, semantic detection, and identity-aware access checks. The goal is not to block every model response, but to prevent the model from seeing or transforming data the requester is not authorised to use. That means guarding prompts, retrieval, memory, outputs, and tool calls together.

Why This Matters for Security Teams

GenAI data leakage is usually not a single failure. It is the result of prompts, retrieval, memory, tool calls, and outputs all handling sensitive data without the identity-aware controls that are applied to human users. Current guidance suggests the bigger risk is not model hallucination alone, but over-broad access and weak runtime enforcement. The NIST AI 600-1 GenAI Profile and NHIMG's Guide to the Secret Sprawl Challenge both point to the same pattern: sensitive data tends to move through systems that were not designed to distinguish authorised use from merely possible use.

The practical mistake is treating GenAI as a content moderation problem instead of a control-plane problem. If the model can retrieve records, call tools, or retain memory beyond the task scope, it can expose data even when the user never had direct access to the source system. That is why mature programs now pair semantic controls with workload identity, short-lived access, and policy evaluation at request time. In practice, many security teams encounter leakage only after a prompt, connector, or agent workflow has already exposed data to a model that should never have seen it.

How It Works in Practice

Effective containment starts by reducing what the model can reach. The model should not inherit broad application credentials, and it should not query entire data sets just because a user typed a question. Instead, security teams should gate access with intent-aware policy checks that evaluate the task, the requester, the data classification, and the tool being invoked. This is where runtime authorization matters more than static RBAC: the same user may be allowed to ask for one summary and denied access to the underlying source records.
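A minimal sketch of such a request-time check is shown here. The classification levels, task names, tool names, and the `TASK_POLICY` table are all hypothetical and exist only for illustration; a real deployment would load policy from its own policy engine.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical classification levels, ordered from least to most sensitive.
CLASSIFICATION_RANK = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}

# Hypothetical policy: tools each task may use and the data ceiling for that task.
TASK_POLICY = {
    "summarise": {"tools": {"search"}, "max_classification": "confidential"},
    "export_records": {"tools": {"file_export"}, "max_classification": "internal"},
}


@dataclass(frozen=True)
class AccessRequest:
    requester_clearance: str  # highest classification the requester may read
    task: str                 # declared intent of the request
    data_classification: str  # label on the data the model would touch
    tool: str                 # tool or connector the model wants to invoke


def authorize(req: AccessRequest) -> Tuple[bool, str]:
    """Evaluate task, requester, data classification, and tool together."""
    policy = TASK_POLICY.get(req.task)
    if policy is None:
        return False, "unknown task"
    if req.tool not in policy["tools"]:
        return False, "tool not permitted for task"
    data_rank = CLASSIFICATION_RANK[req.data_classification]
    if data_rank > CLASSIFICATION_RANK[req.requester_clearance]:
        return False, "requester lacks clearance"
    if data_rank > CLASSIFICATION_RANK[policy["max_classification"]]:
        return False, "data too sensitive for task"
    return True, "allowed"


# Same requester, same data, different task: the summary is allowed, the export is not.
summary = AccessRequest("confidential", "summarise", "confidential", "search")
export = AccessRequest("confidential", "export_records", "confidential", "file_export")
print(authorize(summary))
print(authorize(export))
```

The decision depends on the combination of inputs rather than on the requester's role alone, which is the difference from a static RBAC lookup.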

At implementation level, the control stack usually includes four layers:

Request filtering before prompts are sent to the model, including sensitive data detection and redaction.
Retrieval filtering so RAG systems only return documents the requester can already access.
Tool and connector authorization so the model cannot exfiltrate through email, ticketing, file systems, or APIs.
Output scanning to catch accidental disclosure, regulated data, or policy violations before the response is delivered.
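The four layers above could be sketched as small, independent functions. This is an illustrative outline only: the regex patterns, the `Document` type, and the group-based access model are assumptions, and production detectors would need to be far broader (and semantic, not just pattern-based).

```python
import re
from dataclasses import dataclass, field
from typing import List, Set

# Illustrative patterns only; not a complete sensitive-data detector.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


@dataclass
class Document:
    doc_id: str
    text: str
    allowed_groups: Set[str] = field(default_factory=set)


def redact(text: str) -> str:
    """Layer 1: replace detected sensitive values before the prompt reaches the model."""
    for label, pattern in SENSITIVE_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text


def filter_retrieval(docs: List[Document], user_groups: Set[str]) -> List[Document]:
    """Layer 2: return only documents the requester can already access."""
    return [d for d in docs if d.allowed_groups & user_groups]


def tool_allowed(tool: str, allowed_tools: Set[str]) -> bool:
    """Layer 3: deny any tool or connector not explicitly granted for this request."""
    return tool in allowed_tools


def scan_output(text: str) -> List[str]:
    """Layer 4: list the sensitive-data labels found in a response before delivery."""
    return [label for label, pattern in SENSITIVE_PATTERNS.items() if pattern.search(text)]


docs = [
    Document("d1", "HR salary bands", {"hr"}),
    Document("d2", "Build runbook", {"eng"}),
]
print(redact("Contact alice@example.com about key AKIAABCDEFGHIJKLMNOP"))
print([d.doc_id for d in filter_retrieval(docs, {"eng"})])
print(tool_allowed("send_email", {"search"}))
print(scan_output("Reply to bob@corp.example"))
```

Keeping the layers separate means a miss in one (for example, an unredacted prompt) can still be caught by another (retrieval filtering or output scanning).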

Identity also matters. For autonomous or semi-autonomous systems, workload identity is the better primitive than shared API keys because it ties each action to a cryptographic identity and a short-lived token. That approach aligns with broader NHI guidance from Ultimate Guide to NHIs — Key Research and Survey Results and the breach patterns documented in The 52 NHI breaches Report. The operational goal is to keep access ephemeral, context-bound, and revocable when the task ends. These controls tend to break down in legacy environments where shared service accounts, flat data stores, and unmanaged connectors make per-request authorization too expensive to enforce.
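As a rough illustration of ephemeral, context-bound, revocable access, the sketch below issues an HMAC-signed token tied to one workload and one task. The token format, the in-memory `REVOKED` set, and the hard-coded `SIGNING_KEY` are assumptions for demonstration; a real system would rely on its identity provider and key management service rather than this hand-rolled scheme.

```python
import base64
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"demo-only-key"  # assumption: a real key would come from a KMS
REVOKED = set()                 # assumption: stand-in for a shared revocation store


def _sign(payload: bytes) -> str:
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def issue_token(workload_id: str, task_id: str, ttl_seconds: int = 300, now=None) -> str:
    """Issue a short-lived token bound to a single workload and task."""
    now = time.time() if now is None else now
    claims = {"sub": workload_id, "task": task_id, "exp": now + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    return f"{payload.decode()}.{_sign(payload)}"


def verify_token(token: str, task_id: str, now=None) -> bool:
    """Accept the token only if it is authentic, unrevoked, unexpired, and for this task."""
    now = time.time() if now is None else now
    if token in REVOKED:
        return False
    try:
        payload, sig = token.rsplit(".", 1)
    except ValueError:
        return False
    if not hmac.compare_digest(sig.encode(), _sign(payload.encode()).encode()):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims["task"] == task_id and now < claims["exp"]


token = issue_token("agent-1", "task-42", ttl_seconds=60, now=1000.0)
print(verify_token(token, "task-42", now=1030.0))  # within TTL, matching task
print(verify_token(token, "task-99", now=1030.0))  # wrong task context
print(verify_token(token, "task-42", now=1061.0))  # past expiry
REVOKED.add(token)                                 # revoke when the task ends
print(verify_token(token, "task-42", now=1030.0))
```

Unlike a shared API key, each token here names the workload that acted, stops working when the task context changes, and can be revoked without rotating credentials for every other workload.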

Common Variations and Edge Cases

Tighter data controls often increase latency, complexity, and false positives, so organisations have to balance disclosure risk against workflow friction. There is no universal standard for this yet, especially for long-context models, agentic tool use, and cross-domain retrieval pipelines. Best practice is evolving toward policy-as-code, but the exact placement of DLP, prompt filtering, and authorization checks still depends on the architecture.

One common edge case is internal assistant deployments that appear low risk because they sit behind a corporate login. That assumption fails when the assistant can reach email, files, ticketing systems, or OAuth-connected SaaS apps with broader permissions than the human requester. Another edge case is memory: persistent conversation state can quietly retain secrets long after the original task is complete. The Anthropic report on the first AI-orchestrated cyber espionage campaign underscores how quickly autonomous workflows can chain actions once a single control fails. Security teams should treat these environments as dynamic trust zones, not static application tiers. In practice, leakage persists where connectors are over-permissioned and output review is the only line of defense.
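Both edge cases can be narrowed with fairly simple mechanics. The sketch below, which uses hypothetical scope names and a hypothetical `TaskMemory` class, caps a connector's permissions at what the human requester already holds and expires stored conversation state after a time limit or when the task ends.

```python
import time
from typing import Dict, Iterable, List, Set, Tuple


def effective_scopes(connector_scopes: Iterable[str], requester_scopes: Iterable[str]) -> Set[str]:
    """The assistant may use only scopes held by both the connector and the requester."""
    return set(connector_scopes) & set(requester_scopes)


class TaskMemory:
    """Task-scoped memory whose entries expire after ttl_seconds (hypothetical design)."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._items: Dict[str, List[Tuple[float, str]]] = {}

    def remember(self, task_id: str, value: str, now=None) -> None:
        now = time.time() if now is None else now
        self._items.setdefault(task_id, []).append((now, value))

    def recall(self, task_id: str, now=None) -> List[str]:
        now = time.time() if now is None else now
        # Drop expired entries on read so stale secrets are not retained.
        fresh = [(t, v) for t, v in self._items.get(task_id, []) if now - t < self.ttl]
        if fresh:
            self._items[task_id] = fresh
        else:
            self._items.pop(task_id, None)
        return [v for _, v in fresh]

    def end_task(self, task_id: str) -> None:
        """Purge everything stored for a task once it completes."""
        self._items.pop(task_id, None)


connector = {"mail.read", "mail.send", "files.read", "tickets.write"}
requester = {"mail.read", "files.read"}
print(sorted(effective_scopes(connector, requester)))

memory = TaskMemory(ttl_seconds=600)
memory.remember("task-1", "db password shared by user", now=0.0)
print(memory.recall("task-1", now=300.0))  # still within the TTL
print(memory.recall("task-1", now=900.0))  # expired and purged
```

The intersection addresses the over-permissioned connector; the TTL and `end_task` purge address memory that outlives its task.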

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A2	Addresses prompt injection and unsafe agent actions that can expose sensitive data.
CSA MAESTRO	T-2	Covers runtime trust decisions for agentic workflows and data exposure paths.
NIST AI RMF		Supports governance and measurement of AI-related privacy and leakage risk.

Establish AI risk controls that verify data access, monitoring, and incident response for GenAI systems.
