How should security teams prevent sensitive data from leaking through AI prompts and copilots?

Why This Matters for Security Teams

AI prompts and copilots turn sensitive data into an interaction problem, not just a storage problem. Once users can paste customer records, source code, or credentials into a chat interface, traditional boundary controls lose visibility unless they inspect the request in motion. That is why guidance is shifting toward prompt-aware classification, entitlement review, and runtime enforcement across the AI path, not only the file system behind it.

The risk is not hypothetical. NHIMG has documented how exposed secrets and AI-related mishandling compound, including in its Guide to the Secret Sprawl Challenge and its account of the DeepSeek breach; both show how fast sensitive material spreads once it enters uncontrolled systems. External threat research from Anthropic on AI-orchestrated cyber espionage also shows that copilots and agents can be manipulated into moving data far beyond the user's original intent.

In practice, many security teams discover data leakage reactively, after a user has already pasted regulated or proprietary content into an AI tool, rather than anticipating it through deliberate policy design.

How It Works in Practice

Preventing leakage requires controls that follow the prompt lifecycle. Start with discovery so teams know which copilots, plugins, APIs, and internal assistants exist. Then classify the data that may flow into them, including source code, secrets, contracts, regulated customer data, and operational logs. Current guidance suggests using intent-aware controls that evaluate what the user is trying to do at request time, because a prompt can be safe in one context and dangerous in another.
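As a rough illustration of the classification step, the Python sketch below flags a few categories of sensitive content in a prompt or response using pattern matching. The label names, the regular expressions, and the `classify` function are illustrative assumptions, not a complete or vendor-specific detector. It covers content classification only; intent-aware evaluation would sit on top of signals like these, and real deployments usually layer pattern checks with validators, entropy tests, and trained classifiers.

```python
import re

# Illustrative detectors only; real deployments combine pattern matching
# with validators, entropy checks, and trained classifiers.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
}


def luhn_valid(digits: str) -> bool:
    """Return True if a digit string passes the Luhn checksum used by card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def classify(text: str) -> set[str]:
    """Return the sensitive-data labels found in a prompt or a model response."""
    labels: set[str] = set()
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            if label == "card_number" and not luhn_valid(re.sub(r"\D", "", match.group())):
                continue  # a long digit run that is not a plausible card number
            labels.add(label)
            break  # one hit per label is enough for a policy decision
    return labels
```

For example, `classify("deploy with AKIAIOSFODNN7EXAMPLE now")` returns `{"aws_access_key_id"}`, using AWS's documented placeholder key. The Luhn check is there to cut false positives on long digit runs such as order or ticket numbers.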

In mature environments, the AI interaction is routed through policy enforcement that can inspect both prompt and response, redact fields, block risky queries, or downgrade the model’s access to approved context only. That is the practical difference between legacy DLP and AI-aware enforcement. The control plane should also verify entitlement before allowing retrieval from connected systems, since a copilot with broad connector access can leak more through retrieval than through the prompt itself. NHIMG’s 52 NHI Breaches Analysis is a useful reminder that identity misuse is often the real breach path, not the model prompt alone.
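The inspect-then-enforce step can be sketched as a single function applied to both the prompt and the response. This minimal sketch assumes a hypothetical upstream classifier has already returned labelled, non-overlapping character spans; the `POLICY` table, the three `Action` levels, and the rule that the strictest finding wins are illustrative choices, not a standard.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = 0
    REDACT = 1
    BLOCK = 2


# Hypothetical policy table: the action each sensitive-data label triggers.
POLICY = {
    "private_key": Action.BLOCK,
    "aws_access_key_id": Action.BLOCK,
    "card_number": Action.REDACT,
    "email": Action.REDACT,
}


@dataclass(frozen=True)
class Finding:
    label: str
    start: int  # character offsets reported by an upstream classifier
    end: int


def enforce(text: str, findings: list[Finding]) -> tuple[Action, str]:
    """Apply the strictest action triggered by any finding.

    The same function runs on the prompt (before it reaches the model) and on
    the response (before it reaches the user or a downstream system).
    Assumes findings do not overlap.
    """
    # Labels missing from POLICY default to REDACT so new detectors fail toward caution.
    actions = [POLICY.get(f.label, Action.REDACT) for f in findings]
    action = max(actions, key=lambda a: a.value, default=Action.ALLOW)
    if action is Action.BLOCK:
        return action, ""  # nothing is forwarded
    if action is Action.REDACT:
        # Replace spans right to left so earlier offsets stay valid.
        for f in sorted(findings, key=lambda f: f.start, reverse=True):
            text = text[: f.start] + f"[REDACTED:{f.label}]" + text[f.end :]
    return action, text
```

Running the same function on the return path is what lets a gateway scrub high-risk model output, not only user input. Failing toward REDACT for unknown labels trades some friction for safety when new detectors are added.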

Classify prompts and responses for regulated content, secrets, and proprietary data.

Apply least privilege to model connectors, retrieval tools, and downstream APIs.

Use managed identities or workload identities for copilots instead of shared secrets.

Log prompt lineage so security teams can trace what was entered, retrieved, and returned.

Block or scrub high-risk outputs before they reach users or external systems.
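Prompt lineage logging can be sketched as one structured record per interaction. This sketch assumes a hypothetical record layout: it stores SHA-256 digests of the prompt and response rather than raw text (one possible answer to the retention question, not a requirement) and chains each entry to the previous one so deleted or edited entries are detectable. The field names and both functions are illustrative.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def lineage_record(principal: str, prompt: str, retrieved_ids: list[str],
                   response: str, action: str, prev_hash: str = "") -> dict:
    """Build one prompt-lineage entry: who asked, what was retrieved, what came back."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "principal": principal,
        "prompt_sha256": sha256(prompt),      # digest only; raw text is not retained
        "retrieved": sorted(retrieved_ids),   # document IDs pulled in as context
        "response_sha256": sha256(response),
        "action": action,                     # gateway decision, e.g. allow/redact/block
        "prev": prev_hash,                    # link to the previous entry
    }
    # Hash the canonical JSON form so the next entry can chain to this one.
    record["entry_sha256"] = sha256(json.dumps(record, sort_keys=True))
    return record


def verify_chain(records: list[dict]) -> bool:
    """Return False if any entry was altered or the chain has a gap."""
    prev = ""
    for rec in records:
        body = {k: v for k, v in rec.items() if k != "entry_sha256"}
        if rec["prev"] != prev or sha256(json.dumps(body, sort_keys=True)) != rec["entry_sha256"]:
            return False
        prev = rec["entry_sha256"]
    return True
```

Digests still let investigators confirm whether a specific leaked string was entered, by hashing it and comparing, without the log itself becoming another store of sensitive data.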

For identity-driven enforcement patterns, current implementation guidance is converging on SPIFFE for workload identity and on NIST SP 800-207 Zero Trust Architecture for continuous, request-time decision-making. These controls tend to break down when copilots are embedded in unmanaged SaaS tools, because policy inspection and connector governance are no longer under the organisation’s control.
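A request-time authorization check keyed on workload identity might look like the sketch below. The `spiffe://<trust domain>/<path>` ID format comes from the SPIFFE specification; everything else, including the `example.org` trust domain, the workload paths, the connector names, and the `authorize` function, is hypothetical. The sketch assumes the SVID was already cryptographically verified by the mTLS or JWT layer and only makes the per-request allow/deny decision, in the spirit of NIST SP 800-207.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlparse

# Hypothetical authorization table: (trust domain, workload path) -> connectors it may call.
ALLOWED = {
    ("example.org", "/ns/ai/sa/support-copilot"): {"ticketing", "kb"},
    ("example.org", "/ns/ai/sa/code-copilot"): {"repo-readonly"},
}


@dataclass
class Caller:
    spiffe_id: str       # from an SVID already verified by the mTLS or JWT layer
    not_after: datetime  # SVID expiry (timezone-aware)


def authorize(caller: Caller, connector: str, now: Optional[datetime] = None) -> bool:
    """Decide, per request, whether this workload may call this connector."""
    if now is None:
        now = datetime.now(timezone.utc)
    uri = urlparse(caller.spiffe_id)
    if uri.scheme != "spiffe" or not uri.netloc:
        return False  # not a SPIFFE ID
    if now >= caller.not_after:
        return False  # expired SVID: the workload must fetch a fresh one
    # Deny by default: unknown workloads map to an empty connector set.
    return connector in ALLOWED.get((uri.netloc, uri.path), set())
```

Because the identity is short-lived and checked on every call, there is no shared secret for a user to paste into a prompt, and a copilot's connector scope is visible in one table.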

Common Variations and Edge Cases

Tighter prompt controls often increase friction, so organisations have to balance leakage prevention against developer and analyst productivity. The tradeoff is most visible when teams want strong inspection but also expect free-form natural language, broad retrieval, and low-latency responses. Best practice is evolving, and there is no universal standard for how much prompt content should be stored, redacted, or retained.

One common edge case is internal copilots that are “safe” on paper because they sit behind SSO, yet still pull from overly broad knowledge stores or run under service accounts with standing access. Another is the use of external AI services where data residency, retention, and training usage are not fully transparent. Security teams should also assume that users will paste secrets unless guidance and guardrails are explicit; operational evidence in NHIMG’s Ultimate Guide to NHIs — Key Research and Survey Results and Ultimate Guide to NHIs — Why NHI Security Matters Now shows that secret sprawl persists across environments.
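For the overly broad knowledge store case, one common mitigation is to filter retrieved documents by the requesting user's own entitlements rather than the copilot service account's. The sketch below assumes each document carries a set of group names allowed to read it; the `Doc` type, the group names, and `filter_for_user` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    acl_groups: frozenset[str]  # groups allowed to read this document


def filter_for_user(candidates: list[Doc], user_groups: set[str]) -> list[Doc]:
    """Keep only documents the end user could open directly.

    The copilot's service account may be able to read the whole store; this
    narrows retrieval to the requesting user's entitlements so the copilot does
    not become a privilege-escalation path. Documents with no ACL are dropped
    (deny by default).
    """
    return [d for d in candidates if d.acl_groups & user_groups]
```

With documents readable by `hr` and by `all-staff`, a user in `{"all-staff", "eng"}` gets back only the `all-staff` document, even though the service account could read both.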

One useful benchmark from NHIMG’s research is that the average time to mitigate a leaked secret is 36 hours, a long exposure window that makes prevention more valuable than cleanup after the fact. In environments with highly dynamic retrieval, such as coding copilots connected to multiple repositories, even well-designed controls can struggle because the risk changes with every query, permission check, and generated answer.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A2	Prompt injection and data exfiltration are core agentic AI leakage risks.
CSA MAESTRO	M1	MAESTRO addresses agent identity, tool access, and safe execution boundaries.
NIST AI RMF	GOVERN	AI RMF governance covers oversight, accountability, and risk controls for AI data use.

Define ownership, logging, and review rules for prompt handling under your AI governance program.
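Ownership, logging, and review rules can be kept as a small per-copilot policy record that is checked automatically. This is a sketch under stated assumptions: the field names, the required `principal` log field, and the `problems` check are hypothetical, and the numeric values in any real policy are an organisational choice, since no universal standard fixes how long prompt records should be retained.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PromptHandlingPolicy:
    copilot: str
    owner: str                   # accountable person or team
    log_fields: tuple[str, ...]  # lineage fields recorded for each interaction
    retention_days: int          # how long prompt records are kept
    review_every_days: int       # review cadence for this policy
    last_reviewed: date

    def problems(self, today: date) -> list[str]:
        """Return governance gaps for this copilot, in a fixed order."""
        issues = []
        if not self.owner:
            issues.append("no accountable owner")
        if "principal" not in self.log_fields:
            issues.append("lineage does not record who made the request")
        if self.retention_days <= 0:
            issues.append("retention period not set")
        if today - self.last_reviewed > timedelta(days=self.review_every_days):
            issues.append("review overdue")
        return issues
```

Running a check like this across every discovered copilot turns the governance rules into something that can fail a pipeline or open a ticket, rather than a document nobody rereads.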
