Prompt Security

Prompt security is the set of controls that protect AI interactions from malicious, malformed, or overbroad requests. It includes sanitisation, policy checks, anomaly detection, and action gating. The goal is to stop unsafe prompts from becoming unsafe model behaviour or privileged system actions.

Expanded Definition

Prompt security is the control layer that evaluates, constrains, and monitors inputs before they can influence an AI model or trigger downstream actions. In agentic systems, it is not just about text cleaning; it also covers policy enforcement, tool-use gating, and detection of prompt injection or jailbreak patterns. Guidance varies across vendors, but the practical definition is converging around one idea: untrusted instructions should never inherit privileged execution. That aligns with the broader discipline of identity and access governance described in the NIST Cybersecurity Framework 2.0, especially where access control, monitoring, and response overlap. In NHI-heavy environments, prompt security matters because prompts often arrive through agents, copilots, chat interfaces, API calls, or MCP-connected tools that can reach secrets or production systems. It is therefore a runtime safeguard, not a model-training feature. The most common misapplication is treating prompt security as simple content filtering, which occurs when teams block keywords but fail to gate tool calls, identity context, or high-risk actions.
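The difference between content filtering and gating can be sketched in a few lines of Python. The example below is a minimal illustration, not a reference implementation: the `Source` categories, the `POLICY` table, and the tool names `read_ticket` and `reset_credentials` are all hypothetical. It shows one way to ensure that an instruction arriving from untrusted content never inherits privileged execution, regardless of what the text says.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """Where the instruction behind a tool call came from (assumed categories)."""
    OPERATOR = "operator"    # system prompt or signed configuration
    USER = "user"            # authenticated end user
    RETRIEVED = "retrieved"  # web pages, tickets, tool output: untrusted


@dataclass(frozen=True)
class ToolCall:
    tool: str                # hypothetical tool name
    source: Source           # provenance of the instruction
    caller_roles: frozenset  # roles of the identity the agent acts for


# Hypothetical policy: which sources and which role may invoke each tool.
POLICY = {
    "read_ticket": {"sources": {Source.OPERATOR, Source.USER}, "role": "support"},
    "reset_credentials": {"sources": {Source.OPERATOR}, "role": "admin"},
}


def gate(call: ToolCall) -> bool:
    """Return True only if policy explicitly allows the call (default deny)."""
    rule = POLICY.get(call.tool)
    if rule is None:
        return False  # unknown tools are never executed
    if call.source not in rule["sources"]:
        return False  # untrusted provenance cannot reach this tool
    return rule["role"] in call.caller_roles
```

In this sketch the decision depends on provenance and identity context rather than on keywords, so a credential reset requested by text found in a retrieved document is refused even when the acting identity holds the admin role.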

Examples and Use Cases

Implementing prompt security rigorously often introduces latency and workflow friction, requiring organisations to weigh faster model responses against tighter control over action-bearing requests.

  • An internal support agent receives a request to reset credentials. Prompt security checks whether the request is allowed for that user, then blocks any attempt to expose secrets or bypass approval steps.
  • A developer uses an AI coding assistant. The system sanitises instructions that try to reveal environment variables, and it cross-checks any proposed changes against policy before they reach a pipeline.
  • An MCP-connected assistant tries to query a ticketing system and then send a message on behalf of a team. Prompt security limits tool scope so the agent can read context but not escalate into unauthorised execution.
  • A customer-facing chatbot is targeted with prompt injection designed to override guardrails. Detection logic flags the pattern, quarantines the session, and logs the event for review, consistent with the Zero Trust thinking in NIST SP 800-207.
  • An organisation rolls out AI agents before defining tool permissions. The result is ad hoc prompt filtering, while the more durable fix is to anchor agent behaviour to identity, policy, and lifecycle controls discussed in the Ultimate Guide to NHIs.
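The flag, quarantine, and log sequence from the chatbot example can be sketched as follows. This is an assumption-laden illustration: the `Session` class, the `screen_prompt` function, and the two regular expressions are hypothetical, and pattern matching alone is a weak signal that a real deployment would likely combine with other detectors and with tool gating.

```python
import logging
import re
from dataclasses import dataclass, field

logger = logging.getLogger("prompt_security")

# Illustrative heuristics only; these patterns are assumptions, not a vetted list.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your )?(system prompt|environment variables)", re.IGNORECASE),
]


@dataclass
class Session:
    session_id: str
    quarantined: bool = False
    events: list = field(default_factory=list)  # review trail for this session


def screen_prompt(session: Session, prompt: str) -> bool:
    """Return True if the prompt may proceed; quarantine the session otherwise."""
    if session.quarantined:
        return False  # a quarantined session stays blocked until reviewed
    for pattern in INJECTION_PATTERNS:
        if pattern.search(prompt):
            session.quarantined = True
            session.events.append(
                {"reason": "injection_pattern", "pattern": pattern.pattern}
            )
            logger.warning("session %s quarantined", session.session_id)
            return False
    return True
```

Once a session is quarantined in this sketch, later prompts are refused even if they look benign, which keeps a partially successful injection from being retried with different wording in the same session.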

Why It Matters in NHI Security

Prompt security becomes decisive when an AI system can act on behalf of a person, team, or service account. If the prompt layer is weak, a malicious instruction can become a privileged action, turning an ordinary conversation into a pathway for data exposure, ticket abuse, or secret retrieval. That is why prompt controls belong alongside NHI governance, not outside it. NHI programs already struggle with visibility, rotation, and over-privilege; the same control gaps are amplified when an agent is allowed to call tools, read context, or invoke automation without strong gating. Research cited in the Ultimate Guide to NHIs indicates that 97% of NHIs carry excessive privileges, which makes any prompt-to-action path especially sensitive. In practice, prompt security is part of Zero Trust for agents: verify request intent, constrain execution, and record every decision. Organisations typically recognise the need for prompt security only after an agent leaks a secret, executes an unsafe tool call, or passes an injected request into production, at which point the control can no longer be deferred.
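The "verify, constrain, record" pattern can be illustrated with a small wrapper that checks the scopes granted to a non-human identity and writes an audit record for every decision, whether allowed or denied. Everything here is a hypothetical sketch: `execute_with_audit`, the scope strings, and the in-memory `AUDIT_LOG` (a stand-in for an append-only audit store) are assumptions made for illustration.

```python
import json
import time
from typing import Callable, List, Optional, Set

AUDIT_LOG: List[str] = []  # stand-in for an append-only audit store


def execute_with_audit(
    identity: str,
    granted_scopes: Set[str],
    required_scope: str,
    action: Callable[[], str],
) -> Optional[str]:
    """Run `action` only if the identity holds `required_scope`.

    A decision record is written before anything executes, so denied
    attempts are as visible to reviewers as allowed ones.
    """
    allowed = required_scope in granted_scopes
    record = {
        "timestamp": time.time(),
        "identity": identity,
        "required_scope": required_scope,
        "decision": "allow" if allowed else "deny",
    }
    AUDIT_LOG.append(json.dumps(record))
    if not allowed:
        return None  # constrain execution: the action is never invoked
    return action()
```

Because an over-privileged identity would pass this check for almost any scope, a wrapper like this is only as strong as the scope assignments behind it, which is the reason the excessive-privilege problem makes prompt-to-action paths so sensitive.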

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Agentic AI Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | A01 | Agentic apps must resist prompt injection and unsafe tool use.
NIST CSF 2.0 | PR.AA-05 | Prompt security supports access control by restricting who can invoke sensitive actions.
NIST Zero Trust (SP 800-207) | Section 2.1, tenet 3 | Zero Trust requires each request be evaluated before access or execution is granted.

Gate agent instructions and tools so untrusted prompts cannot trigger privileged actions.