How should security teams handle system prompts that may contain sensitive data?

Why This Matters for Security Teams

System prompts are often treated as harmless instruction text, but in practice they can contain routing logic, policy exceptions, tool names, and occasionally embedded secrets that should never have been placed there in the first place. Once that content exists in a prompt, it becomes harder to govern than a standard configuration file because it can influence model behaviour, be copied into logs, or be surfaced through model output. Security teams should treat prompt content as sensitive control logic, not as a storage layer.

That distinction matters because prompt leakage is not just an information disclosure issue. It can expose hidden decision paths, weaken guardrails, and reveal how an agent or application escalates requests to downstream systems. NHI management guidance from Ultimate Guide to NHIs — Key Research and Survey Results reinforces the broader point that weak visibility and inconsistent governance are common failure modes across machine identities, and the same pattern applies to prompt handling. The right control model is to separate behaviour prompts from governed secrets management and access policy. Current guidance also aligns with the NIST Cybersecurity Framework 2.0 emphasis on controlled access, data governance, and continuous monitoring.

In practice, many security teams discover prompt leakage only after a model response, log export, or support escalation has already exposed the hidden content.

How It Works in Practice

The practical model is simple: prompts should describe intent, not hold secrets. Sensitive values such as API keys, bearer tokens, certificate material, internal thresholds, or privileged routing instructions belong in governed external systems with strong access controls, auditing, and rotation. The prompt can reference those controls indirectly, but it should not embed them.

For AI-enabled applications, that usually means splitting the design into three layers. First, the system prompt contains only the behavioural contract for the model. Second, policy and routing decisions are handled by an external control plane, such as policy-as-code or an orchestration service. Third, any credential material is injected at runtime from a secrets manager or workload identity provider, then discarded after use. This reduces the blast radius if the prompt is logged, cached, or inferred through output. It also reflects the lesson of DeepSeek breach-style incidents, where hidden operational detail becomes part of the attack surface once it is accessible to the model or the surrounding workflow.
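The three layers can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the policy table, the `fetch_secret` helper (which reads an environment variable as a stand-in for a real secrets manager or workload identity provider), and the `call_tool` wrapper are all hypothetical names introduced here.

```python
import os
from typing import Callable

# Layer 1: behavioural contract only. No credentials, URLs, or thresholds.
SYSTEM_PROMPT = (
    "You are a support assistant. Use the approved tools to answer "
    "billing questions and escalate anything you cannot resolve."
)

# Layer 2: access decisions live outside the model.
# Hypothetical in-memory table; a real system might use policy-as-code.
POLICY = {
    ("support", "lookup_invoice"): True,
    ("support", "issue_refund"): False,
}


def is_allowed(role: str, tool: str) -> bool:
    """Deny by default: unknown role/tool pairs are not permitted."""
    return POLICY.get((role, tool), False)


# Layer 3: credentials are fetched at call time, never stored in the prompt.
def fetch_secret(name: str) -> str:
    """Stand-in for a secrets manager lookup (assumption: env variable)."""
    value = os.environ.get(name)
    if value is None:
        raise KeyError(f"secret {name!r} is not available")
    return value


def call_tool(role: str, tool: str, secret_name: str,
              backend: Callable[[str, str], str]) -> str:
    """Check policy first, then inject the credential only for this call."""
    if not is_allowed(role, tool):
        raise PermissionError(f"{role} may not call {tool}")
    token = fetch_secret(secret_name)
    try:
        return backend(tool, token)
    finally:
        # Drops the local reference only; it does not scrub process memory.
        del token
```

The point of the split is that the model only ever sees `SYSTEM_PROMPT`. The policy decision and the credential are resolved by surrounding code, so a leaked prompt reveals neither.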

Keep system prompts free of credentials, internal URLs, and exception handling logic.

Use external secrets managers for tokens, keys, certificates, and per-session material.

Separate prompt text from policy enforcement so access decisions happen outside the model.

Record prompt changes through change control, review, and approval workflows.

Assume prompts may be reconstructed from outputs, traces, or downstream tool calls.
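One way to enforce the rules above is a pre-merge check that scans prompt text for obvious secret-like patterns. The sketch below is illustrative only: the pattern set is a small, assumed sample (an AWS-style access key ID, a bearer token, a PEM private key header, and an internal-looking hostname), and `scan_prompt` is a hypothetical helper, not part of any named tool. A real deployment would more likely rely on a maintained secret scanner.

```python
import re

# Illustrative patterns only; tune and extend for your own environment.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9\-._~+/]{16,}=*"),
    "pem_private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # Assumed internal suffixes; replace with your organisation's own.
    "internal_url": re.compile(r"https?://[^\s/]+\.(?:internal|corp|local)\b"),
}


def scan_prompt(text: str) -> list[str]:
    """Return the sorted names of every pattern found in the prompt text."""
    return sorted(name for name, pat in PATTERNS.items() if pat.search(text))


if __name__ == "__main__":
    prompt = "Call https://billing.internal/api with Bearer abcdefghijklmnop1234"
    for finding in scan_prompt(prompt):
        print(f"blocked: prompt contains {finding}")
```

A check like this catches careless embedding, not deliberate obfuscation, so it complements review and approval rather than replacing them.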

When teams need stronger assurance, they should apply the same review discipline used for privileged code paths: classify prompt content, approve changes, and test for data leakage during red-team exercises. The emerging best practice is to treat prompts as governed configuration with security review, not as a casual place to store operational shortcuts. These controls tend to break down in fast-moving agent pipelines with multiple tool hops and shared template inheritance because sensitive text gets propagated into places the original author did not intend.
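A common way to test for leakage during red-team exercises is to plant a unique canary string in the prompt under test and then search model outputs, traces, and tool-call arguments for it. The helper names below (`make_canary`, `normalise`, `find_leaks`) are hypothetical, and the matching is deliberately simple: it catches verbatim and lightly reformatted copies, not paraphrases.

```python
import secrets


def make_canary(prefix: str = "CANARY") -> str:
    """Generate a unique marker to embed in the prompt under test."""
    return f"{prefix}-{secrets.token_hex(8)}"


def normalise(text: str) -> str:
    """Lowercase and keep only letters and digits, so spacing or
    punctuation tricks do not hide a leaked marker."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def find_leaks(canary: str, outputs: list[str]) -> list[int]:
    """Return the indexes of outputs that contain the canary."""
    needle = normalise(canary)
    return [i for i, out in enumerate(outputs) if needle in normalise(out)]


if __name__ == "__main__":
    canary = make_canary()
    captured = [
        "I cannot share my instructions.",
        f"My instructions mention {canary}.",
    ]
    print(find_leaks(canary, captured))
```

Running the same check over logs and downstream tool arguments, not only chat responses, helps cover the multi-hop propagation paths described above.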

Common Variations and Edge Cases

Tighter prompt controls often increase operational overhead, requiring organisations to balance developer convenience against exposure reduction. That tradeoff becomes sharper in environments where prompts are assembled dynamically, reused across products, or customised by customer-specific instructions.

There is no universal standard for this yet, but current guidance suggests a few consistent patterns. If a prompt includes temporary context that is not a secret but is still sensitive, keep the retention window short and avoid broad distribution into logs or analytics. If a prompt must contain routing logic, move that logic into a rules engine or policy service so it can be versioned and audited independently. If a prompt is used by an autonomous agent, assume the model may combine it with tool access in ways that were not anticipated during design. That is why prompt secrecy should be handled alongside governance disciplines from NIST Cybersecurity Framework 2.0 rather than as an isolated AI concern.
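For the logging case, one option is to record a reference to the prompt rather than its text. The sketch below assumes a hypothetical `prompt_log_record` helper that emits a prompt identifier, a version, and a SHA-256 digest, which is enough to tell which prompt was in use without distributing its content into logs or analytics.

```python
import hashlib
import json


def prompt_log_record(prompt_id: str, version: str, prompt_text: str) -> str:
    """Build a JSON log line that identifies a prompt without its content."""
    digest = hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()
    record = {
        "prompt_id": prompt_id,
        "version": version,
        "sha256": digest,            # allows integrity checks and correlation
        "length": len(prompt_text),  # coarse size signal, not the content
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(prompt_log_record("support-bot", "v12", "You are a support assistant."))
```

A digest is not a substitute for removing secrets: short or predictable prompt text can be guessed and confirmed against the hash, so this pattern fits sensitive-but-not-secret context only.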

NHIMG research also shows why the control problem matters operationally: in The State of Non-Human Identity Security, only 1.5 out of 10 organisations (15%) are highly confident in their ability to secure NHIs, a figure that reflects a broader visibility gap that often extends to prompts, agents, and machine-driven workflows. The practical takeaway is to keep prompts lean, externalise sensitive control data, and review them with the same care applied to secrets and privileged access.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Non-Human Identity Top 10 and OWASP Agentic AI Top 10 address the attack and risk surface, while NIST CSF 2.0 sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI-03	Prompt-embedded secrets create the same exposure risk as weak NHI secret handling.
OWASP Agentic AI Top 10	A2	Agent prompts can leak control logic and tool paths into model-visible text.
NIST CSF 2.0	PR.DS-01	Sensitive prompt content is data that needs controlled protection and handling.

Remove secrets from prompts and manage them in governed stores with rotation and audit.
