Prompt text is editable input, not a credential or a signed delegation. If security teams treat instructions in the prompt as proof of authority, prompt injection can turn untrusted content into real actions. The result is confused-deputy behaviour, where a legitimate agent uses valid permissions on behalf of the attacker.
Why This Matters for Security Teams
Prompt text is often treated as if it were a trusted control plane, but it is only input. Once teams let instructions inside a prompt stand in for identity or authority, they create a direct path from untrusted text to privileged action. That is how prompt injection becomes a security issue rather than a content-quality issue. The risk is familiar to non-human identity (NHI) teams: confused-deputy behaviour, over-scoped execution, and weak separation between instruction and entitlement.
This is especially dangerous in agentic workflows where an AI agent can call tools, chain tasks, and reuse existing access. The problem is not merely that the model can be manipulated. The bigger issue is that downstream systems may accept the manipulated output as if it were an authenticated delegation. NHI Management Group’s Ultimate Guide to NHIs shows how common identity weaknesses already are, and the NIST Cybersecurity Framework 2.0 reinforces the need to separate identity, access, and verification. In practice, many security teams discover prompt-as-identity failures only after an agent has already used legitimate permissions to do the wrong thing.
How It Works in Practice
Correct design starts with a simple rule: prompt text can influence behaviour, but it cannot confer authority. Identity should come from a workload or agent identity primitive, not from words embedded in a user prompt, web page, document, or retrieved context. For autonomous systems, that usually means cryptographic workload identity, short-lived credentials, and policy decisions made at request time.
Current guidance suggests separating three layers:
- Instruction layer: natural-language prompts, system messages, and retrieved content, all treated as untrusted input.
- Identity layer: the cryptographic identity of the agent, service account, or workload, issued and verified independently of the prompt.
- Authorisation layer: runtime policy that decides whether a specific action is allowed in the current context.
That model is consistent with the direction of the NIST Cybersecurity Framework 2.0 and with the broader non-human identity lifecycle described in Ultimate Guide to NHIs. In agentic environments, policy checks should run at the moment the agent requests a specific action, evaluated against that action, rather than being settled in advance by a static role assumption. That is why best practice is evolving toward intent-aware controls, short-lived tokens, and explicit approval gates for high-risk actions. NHI Management Group’s Top 10 NHI Issues also reflects how excessive privilege and poor visibility magnify the impact when a prompt is mistaken for proof.
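The three layers can be illustrated with a minimal Python sketch. Everything here is hypothetical and chosen for illustration: the `POLICY` table, the `VerifiedIdentity` and `ProposedAction` types, and the agent and action names are assumptions, not part of any framework cited above. The point is only that the authorisation decision reads the verified identity and the requested action, and ignores whatever authority the prompt claimed.

```python
from dataclasses import dataclass

# Hypothetical policy table: (verified subject, action) pairs that are allowed.
POLICY = {
    ("agent:report-bot", "read_ticket"),
    ("agent:report-bot", "post_summary"),
}


@dataclass(frozen=True)
class VerifiedIdentity:
    """Identity layer: assumed to be built only after credential verification."""
    subject: str


@dataclass(frozen=True)
class ProposedAction:
    """Instruction layer output: what the model wants to do. Untrusted."""
    name: str
    claimed_role: str = ""  # authority asserted in prompt text; never consulted


def authorize(identity: VerifiedIdentity, action: ProposedAction) -> bool:
    """Authorisation layer: decide at request time, per action."""
    # Only the verified subject and the concrete action are inputs.
    return (identity.subject, action.name) in POLICY


identity = VerifiedIdentity(subject="agent:report-bot")
benign = ProposedAction(name="post_summary")
# An injected instruction that claims admin rights in the prompt text.
injected = ProposedAction(name="delete_repository", claimed_role="admin")

print(authorize(identity, benign))    # True
print(authorize(identity, injected))  # False
```

Because `claimed_role` is never read by `authorize`, an injected claim of authority has no effect on the decision. A real deployment would replace the in-memory set with a policy engine, but the separation of inputs is the property that matters.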
In practical terms, teams should ensure that prompt content cannot write directly to privileged APIs, cannot mint credentials, and cannot change policy state without an independent trust decision. These controls tend to break down when the agent can reach multiple tools in a single workflow and the platform reuses a long-lived token across steps, because a single injected instruction can then steer a valid identity into unauthorized actions.
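One way to limit the reuse problem described above is to issue a separate short-lived token per step, scoped to a single tool. The sketch below uses only Python's standard `hmac` and `hashlib` modules. The token layout, the `mint_token` and `verify_token` helpers, and the demo key are assumptions for illustration; a production system would more likely rely on an established token format and a broker that holds the key outside the agent runtime.

```python
import hashlib
import hmac

# Assumption: demo key only. In practice the broker holds it, not the agent.
SIGNING_KEY = b"demo-only-key"


def mint_token(subject: str, tool: str, ttl_seconds: int, now: float) -> str:
    """Issue a token bound to one subject, one tool, and an expiry time."""
    expires = int(now) + ttl_seconds
    payload = f"{subject}|{tool}|{expires}"
    sig = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"


def verify_token(token: str, tool: str, now: float) -> bool:
    """Accept the token only if it is authentic, unexpired, and for this tool."""
    try:
        subject, token_tool, expires, sig = token.split("|")
    except ValueError:
        return False  # malformed token
    payload = f"{subject}|{token_tool}|{expires}"
    expected = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False  # tampered or forged
    if token_tool != tool:
        return False  # token was minted for a different tool
    return now < int(expires)


t0 = 1_000_000.0  # fixed clock value so the example is deterministic
token = mint_token("agent:report-bot", "ticket_api", ttl_seconds=60, now=t0)

print(verify_token(token, "ticket_api", now=t0 + 10))    # True
print(verify_token(token, "payments_api", now=t0 + 10))  # False
print(verify_token(token, "ticket_api", now=t0 + 120))   # False
```

With this shape, an injected instruction that redirects the agent to a different tool mid-workflow finds that the token it holds is not valid there, and a stolen token stops working once the short lifetime ends.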
Common Variations and Edge Cases
Tighter separation between prompt and identity often increases implementation overhead, requiring organisations to balance safety against latency, developer convenience, and workflow complexity. There is no universal standard for every agent stack yet, so the exact control design depends on whether the system is a chat assistant, a tool-using agent, or a multi-agent pipeline.
One common edge case is retrieval-augmented generation. Retrieved text may be authoritative for facts, but it is still untrusted for instructions. Another is human-in-the-loop approval, where a reviewer may approve an action described in a prompt, but the approval must attach to a specific operation, not to the text that described it. A third edge case is shared service accounts: if one identity can act on behalf of many prompts or users, prompt injection becomes much easier to weaponise.
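The human-in-the-loop case can be made concrete by binding an approval to a digest of the exact operation rather than to its description. The following is a minimal sketch under stated assumptions: the `ApprovalStore` class, the `operation_digest` helper, and the operation fields are hypothetical, and approvals are treated as single-use.

```python
import hashlib
import json


def operation_digest(operation: dict) -> str:
    """Hash a canonical JSON form so equal operations give equal digests."""
    canonical = json.dumps(operation, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


class ApprovalStore:
    """Holds approvals keyed by operation digest, not by prompt text."""

    def __init__(self) -> None:
        self._approved = set()

    def approve(self, operation: dict) -> None:
        # The reviewer approves these exact parameters.
        self._approved.add(operation_digest(operation))

    def consume(self, operation: dict) -> bool:
        """Return True once for an approved operation, then revoke it."""
        digest = operation_digest(operation)
        if digest in self._approved:
            self._approved.remove(digest)
            return True
        return False


store = ApprovalStore()
approved_op = {"tool": "payments_api", "action": "refund",
               "amount": 25, "account": "A-100"}
store.approve(approved_op)

# Same description, different parameters: the approval does not carry over.
tampered_op = dict(approved_op, amount=2500)

print(store.consume(tampered_op))  # False
print(store.consume(approved_op))  # True
print(store.consume(approved_op))  # False
```

Any change to the parameters after review produces a different digest, so the approval cannot be replayed against an operation the reviewer never saw.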
For that reason, current guidance suggests treating prompt text as evidence, not authority. Security teams should pair content filtering with workload identity, least privilege, and runtime enforcement such as policy-as-code. This guidance is hardest to apply on legacy orchestration platforms that expose broad, reusable credentials to the agent runtime, because the platform itself becomes the deputy that attackers can steer through injected text.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | A02 | Prompt injection and tool abuse are central to this identity confusion risk. |
| CSA MAESTRO | | MAESTRO covers agent trust boundaries, authorization, and control-plane separation. |
| NIST AI RMF | | AI RMF addresses governance and trustworthiness for autonomous AI behaviour. |
Separate agent instructions from authority and enforce independent checks before execution.