The system prompt is the core instruction set that defines an LLM’s behaviour, boundaries, and response style during a session. When attackers influence or override it, they are not merely changing text. They are trying to change the model’s governing authority.
Expanded Definition
The system prompt is the governing instruction layer for an LLM or AI agent, shaping behaviour, tone, safety boundaries, tool use, and refusal logic across a session. In agentic systems, it can function as a policy anchor rather than just a hidden prompt. That distinction matters because the prompt may direct the model to protect secrets, constrain action scope, or prioritise certain sources when executing tasks. Definitions vary across vendors, and no single standard governs this yet, so NHI teams should treat the system prompt as security-relevant configuration, not casual text. This aligns with the governance emphasis in the NIST Cybersecurity Framework 2.0, where protecting the integrity of control data is part of resilient operations. The most common misapplication is treating the system prompt as a harmless product copy field, which occurs when teams let multiple tools edit it without version control or approval.
Examples and Use Cases
Implementing system prompts rigorously often introduces a tradeoff between tighter behavioural control and faster iteration, requiring organisations to weigh model consistency against experimentation speed.
- An AI support agent uses the system prompt to refuse requests for API keys and route users to approved verification steps, reducing accidental disclosure risk.
- A code-generation assistant is instructed to avoid executing tool calls unless a human approves the action, limiting unsafe autonomous behaviour.
- A finance workflow agent receives a prompt that constrains it to approved ledger actions only, with any exception logged for review; this mirrors the control mindset described in the Ultimate Guide to NHIs.
- An enterprise search assistant is told to ignore user-supplied instructions that conflict with policy, a pattern often discussed alongside prompt injection defenses in the NIST Cybersecurity Framework 2.0.
- A customer-facing agent receives separate prompts for style, safety, and escalation so security teams can patch one layer without rewriting the entire application.
Why It Matters in NHI Security
System prompt compromise is an NHI security issue because it can change how an AI agent interprets authority, handles secrets, and decides whether to act. When attackers inject or override instructions, they may cause the model to reveal sensitive data, ignore guardrails, or misuse connected tools that already have execution authority. That makes the prompt part of the effective control plane for agent identity and privilege. The risk is amplified in environments where NHIs already suffer from weak governance: NHI Mgmt Group reports that only 5.7% of organisations have full visibility into their service accounts, which means prompt-driven agents may inherit opaque access patterns before security teams notice. The right mental model is that prompt integrity supports least privilege, while prompt drift expands it. Organisations typically encounter the operational impact only after an agent has leaked data, executed an unauthorised tool action, or followed a malicious instruction, at which point system prompt governance becomes unavoidable to address.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 address the attack and risk surface, while NIST CSF 2.0 and NIST AI RMF set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | A1 | Prompt injection and instruction hijacking are core agentic AI threats. |
| NIST CSF 2.0 | PR.DS | Prompt integrity protects sensitive instructions and control data. |
| NIST AI RMF | AI governance requires controls for manipulation, misuse, and unsafe outputs. |
Lock system prompts, separate trusted instructions, and block user content from overriding policy.