What Is Prompt Leakage? Definition & Examples

Prompt leakage is the unintended exposure of user prompts, system prompts, or tool output from an AI runtime. In non-human identity (NHI) terms, it matters because those strings often carry sensitive instructions, credentials, or business context, and they may be stored in memory, logs, or exported artifacts.

Expanded Definition

Prompt leakage is not just a chat transcript problem. In NHI security, it includes any unintended exposure of user prompts, system prompts, tool instructions, retrieval context, and generated output that reveals sensitive operating details. The risk becomes sharper when an agent, plugin, or workflow engine stores those strings in logs, caches, analytics pipelines, or exported artifacts. Industry usage is still evolving, but the boundary is clear: if the prompt text exposes credentials, business logic, or hidden instructions, it becomes security-relevant content.

For governance teams, the issue sits at the intersection of AI runtime security, secrets handling, and access control. The same prompt that looks harmless in a test console can become an incident when it includes API keys, tenant names, internal URLs, or policy exceptions. That is why prompt leakage is often discussed alongside Ultimate Guide to NHIs — Why NHI Security Matters Now and external guidance such as Anthropic — first AI-orchestrated cyber espionage campaign report, both of which show how AI systems can amplify exposure when inputs and outputs are not tightly controlled.

The most common misapplication is treating prompt text as disposable application telemetry, which occurs when teams log full prompts without redaction or retention limits.
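As a rough illustration of redacting prompts before they reach telemetry, the sketch below masks credential-like substrings and caps how much prompt text is logged. The patterns in `REDACTION_RULES`, the `MAX_LOGGED_CHARS` limit, and the `redact_prompt` helper are all assumptions made for this example; real deployments would tune them to the credential formats they actually issue, and regex matching alone will not catch every secret.

```python
import re

# Hypothetical redaction rules: each pairs a pattern with its replacement.
# These are illustrative and will not match every credential format.
REDACTION_RULES = [
    # Bearer tokens, e.g. "Bearer eyJhbGciOi..."
    (re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]{8,}"), "Bearer [REDACTED]"),
    # key=value or key: value pairs with a sensitive-looking key name
    (
        re.compile(r"(?i)\b(api[_-]?key|token|password|secret)(\s*[:=]\s*)\S+"),
        r"\1\2[REDACTED]",
    ),
]

# Assumed cap on how much prompt text a single log entry may carry.
MAX_LOGGED_CHARS = 200


def redact_prompt(prompt: str, max_chars: int = MAX_LOGGED_CHARS) -> str:
    """Return a copy of the prompt that is safer to write to logs."""
    redacted = prompt
    for pattern, replacement in REDACTION_RULES:
        redacted = pattern.sub(replacement, redacted)
    # Truncate only after redaction so a secret is never cut in half and
    # left partially visible.
    if len(redacted) > max_chars:
        redacted = redacted[:max_chars] + "...[TRUNCATED]"
    return redacted


if __name__ == "__main__":
    print(redact_prompt("Use api_key=sk-12345 to call the billing API"))
```

Redaction of this kind addresses what is written; retention limits on the log store itself still need to be configured separately.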

Examples and Use Cases

Implementing prompt leakage controls rigorously often introduces observability tradeoffs, requiring organisations to weigh debugging value against the cost of exposing sensitive context.

A support bot echoes a system prompt that contains escalation rules and internal routing logic, giving attackers a blueprint for abuse.
An agent workflow writes full tool calls into logs, leaking tokens or signed URLs that should have been masked before export.
A retrieval-augmented assistant returns hidden source snippets from a restricted knowledge base, exposing material that was meant only for the model runtime.
A developer copies a prompt template into a ticket or repository, and the file later appears in a broad search index or shared workspace.
An AI assistant used for incident response mirrors live context from a case note, accidentally disclosing customer data to users without clearance.
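The tool-call logging case above can be sketched as a masking step applied before export. This is a minimal example using only the Python standard library; the parameter names in `SENSITIVE_PARAMS` and the helpers `mask_url` and `mask_tool_call` are assumptions for illustration, not a list of what any particular provider uses.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of query parameters that carry signatures or tokens.
SENSITIVE_PARAMS = {
    "sig",
    "signature",
    "token",
    "x-amz-signature",
    "x-amz-credential",
    "x-amz-security-token",
}

# Loose URL matcher: stops at whitespace, quotes, and angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def mask_url(url: str) -> str:
    """Replace the values of sensitive query parameters in a single URL."""
    parts = urlsplit(url)
    query = [
        (key, "REDACTED" if key.lower() in SENSITIVE_PARAMS else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


def mask_tool_call(text: str) -> str:
    """Mask every URL found in a serialized tool call before it is logged."""
    return URL_RE.sub(lambda match: mask_url(match.group(0)), text)


if __name__ == "__main__":
    call = 'fetch_file(url="https://files.example.com/a?expires=1700000000&sig=abc123")'
    print(mask_tool_call(call))
```

Because the query string is re-encoded, other parameter values may be normalized in the logged copy; the original URL passed to the tool is not modified.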

These failures are easier to understand when paired with NHI risk patterns documented in The 52 NHI breaches Report and the Guide to the Secret Sprawl Challenge. Prompt leakage often starts as a convenience decision, then spreads as prompts are reused across environments. Definitions vary across vendors, but the practical pattern is consistent: if the prompt is reusable, it is also potentially exfiltratable. That is why teams should pair prompt hygiene with controls described in the same Anthropic report, especially where autonomous agents can chain tools, memory, and network access.

Why It Matters in NHI Security

Prompt leakage matters because it turns AI runtime content into an attack surface. Once sensitive prompts or outputs escape the intended boundary, attackers can learn how an agent is instructed, what data it can reach, and which guardrails are missing. In NHI environments, that can expose service account names, secrets references, internal APIs, and operational exceptions that make lateral movement easier. NHIMG research shows the scale of the problem: Ultimate Guide to NHIs — Why NHI Security Matters Now reports that 79% of organisations have experienced secrets leaks, and 77% of those incidents caused tangible damage.

That is why prompt leakage is not a purely model-safety concern. It is a governance and remediation issue that overlaps with secrets management, role-based access control (RBAC), retention, redaction, and agent permission design. The same exposure can also reveal where an organisation stores context, how often it reuses prompts, and whether just-in-time (JIT) controls exist for privileged tool access. The lesson from 52 NHI Breaches Analysis is that weak identity hygiene rarely stays contained once content escapes the runtime. Organisations typically discover prompt leakage only after a transcript, log export, or shared screenshot appears in the wrong hands, at which point remediation can no longer be deferred.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A3	Prompt leakage often exposes agent instructions and tool context that OWASP flags as attack surface.
OWASP Non-Human Identity Top 10	NHI-02	Leaked prompts frequently reveal secrets and privileged NHI context governed by secret management controls.
NIST AI RMF	GV.1-3	AI governance requires managing harmful disclosure risks from prompts, outputs, and retained context.

Minimise exposed prompt content and restrict agent outputs to the least sensitive data needed.
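One way to restrict agent outputs is to check each response for verbatim runs of the system prompt before returning it, which targets cases like a bot echoing its own instructions. The sketch below is a simple word-window comparison; the function names, the six-word window, and the fallback message are assumptions for illustration. It catches direct echoes only, not paraphrases or translations, and a very short system prompt will trigger on any output that contains it.

```python
import re


def _words(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def leaks_system_prompt(output: str, system_prompt: str, window: int = 6) -> bool:
    """Return True if any run of `window` consecutive system-prompt words
    appears verbatim (ignoring case and punctuation) in the output."""
    prompt_words = _words(system_prompt)
    if not prompt_words:
        return False
    # For prompts shorter than the window, compare the whole prompt.
    window = min(window, len(prompt_words))
    # Pad with spaces so matches align on word boundaries.
    haystack = " " + " ".join(_words(output)) + " "
    for start in range(len(prompt_words) - window + 1):
        chunk = " " + " ".join(prompt_words[start:start + window]) + " "
        if chunk in haystack:
            return True
    return False


def guard_output(output: str, system_prompt: str,
                 fallback: str = "Sorry, I can't share that.") -> str:
    """Return the output unchanged unless it echoes the system prompt."""
    return fallback if leaks_system_prompt(output, system_prompt) else output


if __name__ == "__main__":
    system = "Escalate refund requests over 500 dollars to the tier two finance queue."
    print(guard_output("Your refund request has been received.", system))
    print(guard_output("My rules: escalate refund requests over 500 dollars to tier two.", system))
```

A check like this complements, rather than replaces, keeping secrets out of the system prompt in the first place.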
