
Prompt-level data leakage

Prompt-level data leakage happens when sensitive information is pasted into an AI prompt and leaves the organisation’s secure environment. The control failure is not just application access, but the lack of rules and enforcement around what data can be submitted in the first place.

Expanded Definition

Prompt-level data leakage is a boundary control problem: the information leaves the organisation at the moment a user, analyst, or operator pastes it into an AI prompt. In NHI and Agentic AI environments, the issue is not only whether the model can access a system, but whether the prompt itself is allowed to contain secrets, customer data, source code, or regulated records.

Definitions vary across vendors: some tools treat this as a content moderation issue, while others frame it as data loss prevention, policy enforcement, or prompt firewalling. The practical distinction is that prompt-level leakage occurs before retrieval, inference, or tool execution, so it cannot be remediated by permissions alone. The NIST AI Risk Management Framework is useful here because it emphasises governability, traceability, and risk treatment around AI inputs.

For NHI programmes, this sits alongside secret handling, role design, and approved data-use policy. The most common misapplication is assuming that an enterprise AI chat interface is safe because access is authenticated, while the content users are permitted to submit goes uninspected.
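The gap between authenticated access and inspected content can be illustrated with a minimal sketch. It assumes a hypothetical `check_prompt` gate that runs before a prompt is forwarded to any model. The rule names and patterns in `DENY_RULES` are illustrative assumptions, not a complete or vendor-specific detection set.

```python
import re
from dataclasses import dataclass, field

# Hypothetical deny rules for illustration only; a real programme would
# maintain and tune these centrally.
DENY_RULES = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9\-._~+/]{20,}=*"),
}


@dataclass
class GateDecision:
    allowed: bool
    reasons: list = field(default_factory=list)


def check_prompt(prompt: str, user_authenticated: bool) -> GateDecision:
    """Decide whether a prompt may be submitted.

    Authentication is necessary but not sufficient: the prompt content
    is inspected even when the user is signed in.
    """
    if not user_authenticated:
        return GateDecision(False, ["unauthenticated"])
    hits = [name for name, pattern in DENY_RULES.items() if pattern.search(prompt)]
    return GateDecision(allowed=not hits, reasons=hits)


decision = check_prompt("Debug this: key AKIAABCDEFGHIJKLMNOP fails", True)
print(decision.allowed, decision.reasons)
```

In this sketch an authenticated user is still blocked when the prompt matches a deny rule, which is the distinction the paragraph above draws.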

Examples and Use Cases

Implementing prompt-level controls rigorously often introduces friction for legitimate work, requiring organisations to weigh faster AI assistance against stronger data handling discipline.

  • A developer pastes a live API key into a coding assistant prompt to debug an integration, creating immediate exposure outside the secure environment. That pattern echoes the broader secret-sprawl concerns discussed in Guide to the Secret Sprawl Challenge.
  • A support analyst includes a customer contract and account identifiers in a prompt to summarise a case, sending regulated data to an external model endpoint. This should be governed with content rules, not just account-based access.
  • A security engineer asks an agent to review a configuration file and accidentally includes embedded certificates and tokens, which can then be retained in logs, traces, or vendor telemetry. Anthropic's report on the first AI-orchestrated cyber espionage campaign is a reminder that prompt content can become an operational attack surface.
  • An analyst uses an internal chatbot for incident triage and pastes memory dumps or log excerpts containing session tokens, unintentionally broadening exposure beyond the original system boundary.

These examples are most relevant when prompts are accepted from browsers, IDE extensions, ticketing tools, or agent consoles without inline policy checks.
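One way to reduce friction in cases like these is to redact sensitive values inline rather than reject the whole prompt. The following is a minimal sketch under that assumption. The `redact_prompt` helper and its two patterns (PEM blocks and simple `key = value` style assignments) are hypothetical and deliberately narrow. They would miss many real secret formats.

```python
import re

# Illustrative patterns only; each entry is (compiled pattern, replacement).
REDACTIONS = [
    # Whole PEM-encoded blocks such as certificates or private keys.
    (
        re.compile(r"-----BEGIN ([A-Z ]+)-----.*?-----END \1-----", re.DOTALL),
        "[REDACTED PEM BLOCK]",
    ),
    # Assignments like "api_key = value" or "token: value"; the name is kept
    # so the prompt stays readable, the value is replaced.
    (
        re.compile(r"(?i)\b(api[_-]?key|token|secret|password)(\s*[=:]\s*)\S+"),
        r"\1\2[REDACTED]",
    ),
]


def redact_prompt(text: str) -> tuple:
    """Return the redacted text and the number of replacements made."""
    total = 0
    for pattern, replacement in REDACTIONS:
        text, count = pattern.subn(replacement, text)
        total += count
    return text, total


sample = "api_key = sk_live_123\nsession token: abc.def\nplain line"
redacted, count = redact_prompt(sample)
print(count)
print(redacted)
```

Returning the replacement count lets the calling tool (browser extension, IDE plugin, or ticketing integration) tell the user that content was changed before submission.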

Why It Matters in NHI Security

Prompt-level leakage is dangerous because it converts ordinary user convenience into an exfiltration path for secrets, service-account material, and sensitive operational context. Once a prompt leaves the organisation, downstream controls often lose visibility, especially when the same input is stored in vendor logs, reused for model improvement, or forwarded into agent workflows. That is why NHI governance must address what is allowed to be submitted, not only what is allowed to be accessed.

NHIMG research shows the scale of the problem: 79% of organisations have experienced secrets leaks, and 77% of those incidents resulted in tangible damage, as reported in Ultimate Guide to NHIs — Key Research and Survey Results. The same research also shows that 96% of organisations store secrets outside of secrets managers in vulnerable locations, which makes accidental prompt disclosure even more likely when staff copy data from code, configs, or CI/CD tools. A useful companion perspective is the Ultimate Guide to NHIs — Why NHI Security Matters Now, which frames how broad NHI exposure compounds identity risk.

Organisations typically encounter the operational cost only after a leaked token, credential, or regulated record is discovered in model logs or an external support case, at which point prompt-level controls become unavoidable.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST AI RMF and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Non-Human Identity Top 10 | NHI-02 | Covers secret exposure and improper handling of credentials in NHI workflows.
NIST AI RMF | - | Treats AI input governance as part of measurable risk management and oversight.
NIST CSF 2.0 | PR.DS-1 | Addresses protection of data at rest and in transit, including AI input exposure paths.

Classify prompt data and prevent sensitive content from leaving approved boundaries.
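As a rough illustration of classifying prompt data against approved boundaries, the sketch below assigns a sensitivity level to a prompt and compares it with a per-destination ceiling. Everything in it is an assumption made for illustration: the `Sensitivity` tiers, the detector patterns, and the destination names `external-llm` and `internal-llm`.

```python
import re
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2


# Hypothetical detectors mapping a pattern to the level it implies.
DETECTORS = [
    (Sensitivity.RESTRICTED, re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")),
    (Sensitivity.RESTRICTED, re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),  # SSN-like number
    (Sensitivity.INTERNAL, re.compile(r"(?i)\b(confidential|internal use only)\b")),
]

# Hypothetical destinations and the highest level each may receive.
MAX_ALLOWED = {
    "external-llm": Sensitivity.PUBLIC,
    "internal-llm": Sensitivity.INTERNAL,
}


def classify(prompt: str) -> Sensitivity:
    """Return the highest sensitivity level any detector assigns."""
    levels = [level for level, pattern in DETECTORS if pattern.search(prompt)]
    return max(levels, default=Sensitivity.PUBLIC)


def may_submit(prompt: str, destination: str) -> bool:
    """Allow submission only to known destinations within their ceiling."""
    ceiling = MAX_ALLOWED.get(destination)
    if ceiling is None:
        return False  # unknown destinations are denied by default
    return classify(prompt) <= ceiling


print(may_submit("Confidential roadmap draft", "internal-llm"))
print(may_submit("Confidential roadmap draft", "external-llm"))
```

Denying unknown destinations by default keeps newly added tools outside the approved boundary until someone explicitly assigns them a ceiling.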