TL;DR: Prompt injection can override weak guardrails, while well-scoped system instructions, runtime prompt security, and user prompt hygiene reduce unsafe or misleading model behaviour, according to Noma Security. The practical question is no longer whether to use layered controls, but how to govern them as enforceable policy across AI deployments.
At a glance
What this is: An explanatory post on the security difference between system instructions and prompt engineering, and why both must be governed as layered controls.
Why it matters: For IAM and NHI practitioners, it shows how AI access paths need policy, logging, and least privilege just like any other non-human identity surface.
Context
AI prompt security is not a single control problem. System instructions shape the model’s baseline behaviour, while user prompts can still introduce unsafe requests, override attempts, or data-exfiltration pressure if runtime controls are weak. That makes the issue directly relevant to NHI governance, because AI agents and model interfaces behave like identities with scoped access, not just chat interfaces.
The governance gap is familiar to IAM teams: policy exists at design time, but enforcement must hold at runtime. When prompts can trigger code execution, database queries, or tool access, the control question becomes who or what is authorised to do that action, under what conditions, and with what monitoring. For practitioners, the article’s starting point is typical of real-world AI deployments: the technical controls are often present in fragments, but the governance model is still immature.
Key questions
Q: How should security teams handle prompt injection in AI systems?
A: Treat prompt injection as an authorisation problem, not only a content problem. Validate user input, monitor for override patterns, and block any prompt that tries to change model policy, exfiltrate data, or trigger sensitive actions. The safest design is one where untrusted prompts cannot directly reach code execution, production data, or privileged tools without policy checks.
Q: What is the difference between system instructions and user prompts in AI security?
A: System instructions are persistent policy rules that define how the model should behave across sessions. User prompts are the task requests entered at runtime. Security teams should govern the first as controlled policy and treat the second as untrusted input that can still try to bypass guardrails or steer the model into unsafe actions.
Q: When do AI agents create NHI risk?
A: AI agents become NHI risk when they can act with delegated authority, such as calling APIs, querying data, or executing workflows. At that point, they need the same discipline used for service accounts and tokens: least privilege, audit logging, scoped permissions, and periodic review of what they can actually do.
Q: How can organisations reduce unsafe AI outputs without over-restricting users?
A: Use layered controls. Set safe system instructions, enforce runtime validation, and give users approved prompt templates that reflect policy constraints. That approach reduces the chance of harmful or inaccurate outputs without relying on users to write perfect prompts every time.
Technical breakdown
System instructions as persistent policy
System instructions are the durable rules a model follows across sessions. In security terms, they function like baseline policy, defining what the model should refuse, what it should escalate, and which tools or topics are out of scope. The important point is that these instructions are not cosmetic. If they are vague, inconsistent, or editable without controls, they become a governance weakness rather than a protection layer. Treat them as versioned policy objects with clear ownership, review, and rollback. That makes them closer to IAM policy than to ordinary prompt text.
Practical implication: version, review, and audit system instructions as governed policy artefacts.
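One way to picture system instructions as versioned policy objects is the minimal sketch below. It is an illustration under stated assumptions, not a reference implementation: the class names `InstructionVersion` and `InstructionPolicy`, the in-memory history, and the rule that the approver must differ from the owner are all hypothetical choices made for the example.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class InstructionVersion:
    """One immutable, approved revision of a system instruction."""
    version: int
    text: str
    approved_by: str
    created_at: str
    sha256: str


@dataclass
class InstructionPolicy:
    """Hypothetical in-memory store; a real one would sit behind change control."""
    owner: str
    history: list = field(default_factory=list)
    active_index: int = -1

    def publish(self, text: str, approved_by: str) -> InstructionVersion:
        # Assumed rule: the approver must be someone other than the owner.
        if not approved_by or approved_by == self.owner:
            raise PermissionError("a separate approver is required")
        revision = InstructionVersion(
            version=len(self.history) + 1,
            text=text,
            approved_by=approved_by,
            created_at=datetime.now(timezone.utc).isoformat(),
            sha256=hashlib.sha256(text.encode("utf-8")).hexdigest(),
        )
        self.history.append(revision)
        self.active_index = len(self.history) - 1
        return revision

    def rollback(self, version: int) -> InstructionVersion:
        # Rollback re-activates an earlier revision; history is never rewritten.
        for index, revision in enumerate(self.history):
            if revision.version == version:
                self.active_index = index
                return revision
        raise LookupError(f"no such version: {version}")

    @property
    def active(self) -> InstructionVersion:
        if self.active_index < 0:
            raise LookupError("no instruction has been published")
        return self.history[self.active_index]
```

Each revision carries a content hash and a named approver, so an auditor can check which text was active and who signed it off. Rollback only moves a pointer, which keeps the full change history available for review.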
Prompt injection and runtime prompt security
Prompt injection works when a malicious or malformed user prompt attempts to override the model’s intended behaviour. The risk increases when the model can call tools, fetch data, or take action based on untrusted input. Runtime prompt security therefore needs sanitisation, pattern detection, and policy enforcement in front of model actions. This is not just content moderation. It is access control for AI behaviour. If a prompt can reach production systems, the environment should treat that prompt as an untrusted request that must be validated before execution.
Practical implication: enforce input validation and action gates before prompts can trigger sensitive tool use.
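The idea of screening prompts and gating actions can be sketched as follows. This is a simplified illustration: the regular expressions, the action names in `SENSITIVE_ACTIONS`, and the `approved` flag are assumptions made for the example, and a handful of patterns is nowhere near sufficient detection on its own.

```python
import re
from dataclasses import dataclass

# Illustrative override patterns only; real detection needs far broader coverage.
OVERRIDE_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.IGNORECASE),
    re.compile(r"reveal\s+(your\s+)?system\s+prompt", re.IGNORECASE),
    re.compile(r"disregard\s+(your\s+)?(rules|guardrails|policy)", re.IGNORECASE),
]

# Hypothetical action names that count as sensitive in this sketch.
SENSITIVE_ACTIONS = {"execute_code", "query_production_db", "modify_system"}


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def screen_prompt(prompt: str) -> list:
    """Return the override patterns that the prompt matches."""
    return [p.pattern for p in OVERRIDE_PATTERNS if p.search(prompt)]


def gate_action(prompt: str, action: str, approved: bool = False) -> Decision:
    """Decide whether a model-requested action may proceed."""
    if screen_prompt(prompt):
        return Decision(False, "prompt matched an override pattern")
    if action in SENSITIVE_ACTIONS and not approved:
        return Decision(False, "sensitive action requires explicit authorisation")
    return Decision(True, "allowed by policy")
```

The gate sits outside the model, so a prompt that slips past pattern detection still cannot reach a sensitive action without an explicit authorisation decision.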
Scoped actions for AI agents and least privilege
When AI systems can run code, query databases, or interact with business tools, they behave like agents with non-human identities. That means least privilege must apply at the action level, not only at the user level. A prompt should not automatically inherit broad environmental access. Instead, capabilities should be bound to the task, the role, and the approved context. This is where NHI governance and agentic AI security intersect: the model may generate the request, but the environment still decides whether the action is authorised. That distinction is what keeps AI from becoming an uncontrolled privilege amplifier.
Practical implication: bind AI tool permissions to task-scoped policy, not to the broad model identity.
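A task-scoped permission check might look like the sketch below. The task names, tool names, and the `SCOPES` table are hypothetical placeholders; the point is only to show deny-by-default authorisation keyed on the task rather than on the model identity.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    READ = "read"
    WRITE = "write"
    EXECUTE = "execute"


@dataclass(frozen=True)
class TaskScope:
    """Permissions granted for one approved task, not for the model as a whole."""
    task: str
    grants: frozenset  # set of (tool, Access) pairs


# Hypothetical task scopes; tool names are placeholders.
SCOPES = {
    "ticket_summary": TaskScope(
        "ticket_summary",
        frozenset({("ticket_api", Access.READ)}),
    ),
    "report_refresh": TaskScope(
        "report_refresh",
        frozenset({("warehouse", Access.READ), ("report_store", Access.WRITE)}),
    ),
}


def is_authorised(task: str, tool: str, access: Access) -> bool:
    """Deny by default: unknown tasks and ungranted pairs are refused."""
    scope = SCOPES.get(task)
    return scope is not None and (tool, access) in scope.grants
```

Separating read, write, and execute grants per tool means a summarisation task cannot quietly gain write access just because the same model also handles report refreshes.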
Threat narrative
Attacker objective: The attacker aims to turn the AI interface into an execution path, making the model act outside policy so that it reveals sensitive information or performs unsafe operational actions.
- Entry occurs through a malicious or carefully crafted prompt that attempts to override guardrails or induce unsafe behaviour.
- Escalation happens when the model has unchecked access to tools, code execution, or data sources that extend beyond the request’s intended scope.
- Impact is unauthorised disclosure, unsafe execution, or misuse of privileged AI-connected systems.
Breaches seen in the wild
- Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
- Cisco DevHub NHI breach — IntelBroker exploited exposed Cisco credentials, API tokens and keys in DevHub.
Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.
NHI Mgmt Group analysis
System instructions are policy, not copy. AI governance fails when teams treat instructions as text snippets instead of enforceable controls. Persistent instructions should define refusal rules, escalation paths, and tool boundaries, then be versioned and audited like any other policy object. Practitioners should manage them as part of identity governance, because model behaviour is now part of the access surface.
Prompt injection is an authorisation problem disguised as a language problem. The core failure is not that a model can be persuaded to say something, but that a model can be persuaded to take actions it should not take. That shifts the control objective from content filtering to policy enforcement at the point of execution. Practitioners should place decision gates between user input and any sensitive model action.
AI agents create a new class of non-human identity risk. Once a model can call tools, query systems, or trigger workflows, it behaves like an identity with delegated authority. That authority can be over-broad, poorly logged, or difficult to review if organisations copy human-centric IAM patterns without adaptation. Practitioners should classify agent access as NHI access and govern it accordingly.
Runtime controls matter more than perfect prompts. The post is right to emphasise prompt hygiene, but no prompt discipline can compensate for excessive privileges or missing monitoring. Security teams need layered control because the model is non-deterministic and the surrounding system is the real enforcement point. Practitioners should assume prompts will fail and build controls that still hold.
From our research:
- 85% of organisations lack full visibility into third-party vendors connected via OAuth apps, according to The State of Non-Human Identity Security.
- The same research finds that only 15% of organisations (1.5 in 10) are highly confident in their ability to secure NHIs.
- For a broader control lens, compare that visibility gap with NIST Cybersecurity Framework 2.0 and map prompt security to its Detect and Protect functions.
What this signals
Prompt security will increasingly be managed as an access-control problem. As AI systems take on more operational work, the distinction between a malicious prompt and an unauthorised action will continue to blur. Teams should prepare to treat prompt pathways as governed access paths, with logging, policy checks, and reviewable exception handling.
Ephemeral AI behaviour creates ephemeral trust debt. The more dynamic the model interaction, the easier it is to assume a prompt was safe because it succeeded once. That assumption will not scale. Security programmes should build controls that do not depend on the last prompt looking benign, especially when tools or data are in reach.
With 85% of organisations lacking full visibility into third-party vendors connected via OAuth apps, the same visibility problem is now being repeated in AI integrations. The reader’s programme should assume that model-connected tools will expand faster than governance unless access review and logging are designed in early.
For practitioners
- Version and approve system instructions: Store system instructions as controlled policy artefacts with change approval, rollback, and audit logging. Review them for unsafe defaults such as broad disclosure, unbounded tool use, or ambiguous escalation behaviour.
- Gate model actions behind runtime policy: Place validation and decision controls between user prompts and any action that can query production data, execute code, or modify systems. Treat every such request as an untrusted input requiring explicit authorisation.
- Apply least privilege to AI-connected tools: Scope model tool access to the minimum task context, then separate read, write, and execution permissions. Reassess whether the model identity can reach production databases or privileged APIs at all.
- Monitor for prompt injection patterns: Log and alert on override phrases, exfiltration attempts, repeated guardrail-hopping, and anomalous prompt sequences. Combine detection with block or step-up review for high-risk actions.
- Publish approved prompt templates: Create a library of sanctioned prompts for common workflows, with explicit scope and data-handling language. This reduces user error while also making monitoring easier because safe patterns are known in advance.
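As an illustration of the approved-template idea, the sketch below uses Python's standard `string.Template` to render only sanctioned prompts. The template identifiers, wording, and field names are hypothetical, and values supplied for the fields should still be treated as untrusted input by the runtime controls described above.

```python
from string import Template

# Hypothetical template library; names, wording, and fields are placeholders.
APPROVED_TEMPLATES = {
    "summarise_ticket": Template(
        "Summarise support ticket $ticket_id for an internal audience. "
        "Do not include customer contact details."
    ),
    "draft_release_note": Template(
        "Draft a release note for version $version "
        "using only the changes listed: $changes"
    ),
}


def render_prompt(template_id: str, **fields: str) -> str:
    """Render a sanctioned template, refusing unknown templates or missing fields."""
    template = APPROVED_TEMPLATES.get(template_id)
    if template is None:
        raise KeyError(f"template not approved: {template_id}")
    # substitute() raises KeyError when a required field is missing.
    return template.substitute(**fields)
```

Because every sanctioned prompt starts from a known template identifier, monitoring can flag any request that arrives without one or that deviates from the expected shape.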
Key takeaways
- AI prompt engineering becomes a security control when system instructions and runtime policy are treated as enforceable governance, not text guidance.
- Prompt injection matters because it can turn untrusted language into unauthorised tool use, data access, or unsafe execution.
- The practical response is layered control: versioned instructions, runtime validation, least privilege, and monitoring for anomalous prompt behaviour.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
The OWASP Agentic AI Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | Not specified | Prompt injection and tool misuse are central agentic AI risks in this post. |
| NIST CSF 2.0 | PR.AA-05 (PR.AC-4 in CSF 1.1) | Least privilege and access validation apply directly to AI-connected actions. |
| NIST Zero Trust (SP 800-207) | Not specified | Continuous verification fits runtime prompt validation and tool gating. |
Bind model actions to least-privilege controls and review them as part of access governance.
Key terms
- System Instructions: Persistent directives that shape how an AI model should behave across sessions and tasks. In security practice, they act like baseline policy, defining boundaries, refusals, escalation rules, and tool restrictions. They should be versioned, reviewed, and audited because they influence every downstream interaction.
- Prompt Injection: A technique where an attacker uses crafted input to steer a model away from its intended instructions. The goal is often to bypass guardrails, expose data, or make the model take an action it should not take. It is best treated as an authorisation and validation failure, not only a content issue.
- Non-Human Identity: A non-human identity is any machine- or software-based identity that can authenticate and act in a system, including service accounts, tokens, certificates, bots, workloads, and AI agents. These identities need governance because they can hold privilege, move data, and trigger actions without a person directly present.
- Prompt Security: Prompt security is the set of controls that protect AI interactions from malicious, malformed, or overbroad requests. It includes sanitisation, policy checks, anomaly detection, and action gating. The goal is to stop unsafe prompts from becoming unsafe model behaviour or privileged system actions.
Deepen your knowledge
AI prompt engineering security and system instruction governance are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building controls for AI agents and model-connected tools, it is worth exploring.
This post draws on content published by Noma Security: That’s a Great Question! prompt engineering and system instructions in AI security. Read the original.
Published by the NHIMG editorial team on 2025-10-17.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org