Leakage control is the set of checks used to prevent models from exposing secrets, credentials, or personal data. In practice, it requires both prompt-side and output-side enforcement, because unsafe content can be introduced before generation and disclosed after generation.
Expanded Definition
Leakage control is the discipline of stopping models, copilots, and agent workflows from revealing secrets, credentials, or personal data in prompts, intermediate reasoning, tool calls, or final outputs. In NHI security, it sits at the boundary between data protection and identity protection because leaked API keys, tokens, and certificates often become direct paths to privileged access. The most useful way to understand it is as layered enforcement: prompt-side controls reduce the chance that unsafe material enters the model context, while output-side controls block disclosure before it reaches a user or downstream system.
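The layered enforcement described above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the patterns in `SECRET_PATTERNS`, the `redact` helper, and the `guarded_completion` wrapper are all hypothetical names, and `model` is assumed to be any callable that takes a prompt string and returns a completion string.

```python
import re

# Hypothetical patterns for illustration only; a real deployment would rely on
# a maintained secret-detection ruleset rather than three regexes.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                # AWS access key ID shape
    re.compile(r"(?i)bearer\s+[a-z0-9._\-]{20,}"),  # bearer-style token
    re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----"
    ),                                              # PEM private key block
]


def redact(text: str) -> tuple[str, int]:
    """Replace every pattern match with a placeholder; return text and hit count."""
    hits = 0
    for pattern in SECRET_PATTERNS:
        text, count = pattern.subn("[REDACTED]", text)
        hits += count
    return text, hits


def guarded_completion(prompt: str, model) -> str:
    """Apply prompt-side redaction, call the model, then apply output-side redaction."""
    safe_prompt, _ = redact(prompt)      # keep secrets out of the model context
    raw_output = model(safe_prompt)      # `model` is an assumed callable: str -> str
    safe_output, _ = redact(raw_output)  # block disclosure before it reaches the caller
    return safe_output
```

Running both passes matters because the output can contain material that never appeared in the prompt, for example a credential surfaced by retrieval or a tool call.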
Definitions vary across vendors when the term is applied to retrieval pipelines, chat interfaces, and autonomous agents. Some use it narrowly to mean redaction, while others include policy checks, content classifiers, allowlists, and post-generation filtering. NHI Management Group treats leakage control as a governance capability, not a single feature, because models can disclose sensitive material through normal completions, tool outputs, or indirect prompt injection. For related threat framing, see the NIST AI Risk Management Framework and NHIMG’s Guide to the Secret Sprawl Challenge.
The most common misapplication is treating leakage control as a chat filter only, which leaves secrets embedded in context windows, logs, and tool responses unaddressed.
Examples and Use Cases
Implementing leakage control rigorously often introduces latency and false-positive tuning effort, requiring organisations to weigh stronger prevention against user friction and operational overhead.
- Blocking a service account token when a prompt asks the model to “repeat the full deployment payload,” because the request may be a disguised exfiltration attempt.
- Redacting personal data before retrieval-augmented generation returns a customer record summary to a support agent.
- Filtering tool output from an agent that queries a secrets store, so the model can use the secret without ever displaying it to the caller.
- Preventing accidental disclosure in logs by masking credentials that appear in prompts, embeddings, or prompt traces used for debugging.
- Applying policy checks to code-generation workflows so the model does not echo API keys or certificates from an index, config file, or training artifact.
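The log-masking case in the list above can be illustrated with Python's standard `logging` module. This is a sketch under stated assumptions: `CREDENTIAL_RE` is a hypothetical pattern that only catches `name=value` or `name: value` pairs whose name suggests a credential, and `MaskCredentialsFilter` is an illustrative class name.

```python
import logging
import re

# Hypothetical pattern: pairs such as api_key=..., token: ..., password=...
CREDENTIAL_RE = re.compile(
    r"(?i)(api[_-]?key|token|password|secret)(\s*[=:]\s*)(\S+)"
)


class MaskCredentialsFilter(logging.Filter):
    """Mask credential-like values before a record reaches any handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message first so values passed as arguments are also masked.
        message = record.getMessage()
        record.msg = CREDENTIAL_RE.sub(r"\1\2***", message)
        record.args = ()  # the message is already fully formatted
        return True       # keep the record, just with masked content
```

Attaching it with `logger.addFilter(MaskCredentialsFilter())` masks records logged through that logger. Pattern-based masking of this kind misses secrets that do not follow a `name=value` shape, so it complements secret discovery rather than replacing it.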
Leakage control becomes more practical when paired with identity hygiene and secret discovery. NHIMG research shows that 96% of organisations store secrets outside of secrets managers in vulnerable locations, which makes prompt exposure much more likely, and the Ultimate Guide to NHIs — Why NHI Security Matters Now and 52 NHI Breaches Analysis show how exposed NHI credentials repeatedly become incident paths. For implementation patterns, the NIST AI Risk Management Framework supports mapping sensitive-content controls to risk treatment.
Why It Matters in NHI Security
Leakage control matters because a single exposed token can collapse multiple layers of defense at once. When a model reveals a credential, the result is not merely a privacy issue; it can become unauthorised access to cloud services, CI/CD pipelines, SaaS tenants, or internal data stores. That makes leakage control central to privileged access containment, incident response, and least-privilege design. It also supports zero trust by reducing what an agent is allowed to see and what it is allowed to repeat.
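One way to reduce what an agent is allowed to see is to filter tool responses against an allowlist before they enter the model context. The sketch below is illustrative only: `ALLOWED_FIELDS` and `minimise_tool_response` are hypothetical names, and the field names are invented for the example.

```python
from typing import Any, Mapping

# Hypothetical allowlist: the only fields this agent role may receive.
ALLOWED_FIELDS = frozenset({"service", "status", "region"})


def minimise_tool_response(response: Mapping[str, Any]) -> dict[str, Any]:
    """Return only allowlisted fields; anything not listed is dropped by default."""
    return {key: value for key, value in response.items() if key in ALLOWED_FIELDS}
```

For example, `minimise_tool_response({"service": "billing", "status": "ok", "api_token": "abc"})` keeps `service` and `status` and drops `api_token`. A default-deny allowlist is used here instead of a blocklist so that a newly added sensitive field is withheld until someone explicitly permits it.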
NHIMG research underscores the operational stakes: 79% of organisations have experienced secrets leaks, and 77% of those incidents resulted in tangible damage, while 91.6% of leaked secrets remain valid five days after notification. In practice, that means disclosure is often exploitable long after the first alert, especially when rotation and offboarding are weak. The Ultimate Guide to NHIs — Standards aligns this concern with governance expectations, and Anthropic — first AI-orchestrated cyber espionage campaign report illustrates how model-mediated workflows can be abused when controls are weak.
Organisations typically treat leakage control as a priority only after a secret has been exposed in a prompt log or an agent has echoed protected data during an incident, at which point addressing it becomes operationally unavoidable.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
The OWASP Non-Human Identity Top 10 and OWASP Agentic AI Top 10 address the attack and risk surface, while the NIST AI RMF sets the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Non-Human Identity Top 10 | NHI-02 | Secret exposure and improper handling are core NHI leakage risks. |
| OWASP Agentic AI Top 10 | A1 | Prompt injection and unsafe output pathways drive leakage in agentic systems. |
| NIST AI RMF | | Risk governance covers harmful disclosure from AI systems and workflows. |
Detect, restrict, and rotate exposed secrets before they can be replayed by models or agents.