By NHI Mgmt Group Editorial TeamPublished 2026-04-27Domain: Agentic AI & NHIsSource: Scramble ID

TL;DR: Prompt injection cannot be fully mitigated by better prompts because current LLMs cannot reliably separate instruction from data, so consequential authority must move behind cryptographic authorization boundaries, according to Scramble ID. Scope-per-tool tokens, step-up approval, dual control, and chain-aware delegation make harmful actions fail at the resource boundary instead of in the model.


At a glance

What this is: This is a structural defence analysis showing that prompt injection is an authorization problem, not a prompt-tuning problem.

Why it matters: It matters because IAM, PAM, and NHI teams need to place enforcement where the agent cannot override it, especially as autonomous and delegated AI workflows expand.

👉 Read Scramble ID's analysis of prompt injection defence through identity controls


Context

Prompt injection is an identity governance problem when an LLM or agent can reach tools, data, and actions that it should not be able to authorise on its own. The key failure is not that the model is clever enough to be tricked, but that it is allowed to sit too close to consequential authority.

The practical question for IAM and NHI programmes is where to place the control boundary. If the model can still make or influence the final decision, prompt filtering only reduces risk. If the decision sits behind scope, step-up, or dual control, compromised reasoning stops at the boundary.


Key questions

Q: How should security teams stop prompt injection from turning into tool misuse?

A: They should enforce authorization at the tool or resource boundary, not inside the model. Scope each tool to the minimum necessary permission, then deny any call that falls outside that scope. Prompt filtering still has value, but it cannot be the last line of defence because the model is exactly what the attacker is trying to influence.

Q: Why do prompt filters fail against indirect prompt injection?

A: Indirect injection hides malicious instructions inside content the agent is meant to process, such as documents, web pages, or emails. The model then sees both data and instruction in the same input stream and cannot reliably tell which is which. That is why the failure is structural, not just a matter of bad prompting.

Q: What breaks when an AI agent has more tool access than it needs?

A: Prompt injection can convert excess privilege into action. If an agent can read records, send messages, or change settings with the same identity, a malicious instruction may redirect that scope to harmful use. Least privilege only works when each tool is separately bounded and the resource server enforces those bounds.

Q: Who should approve high-risk actions taken by AI agents?

A: A human approver independent of the initiating path should approve high-risk actions such as payments, bulk deletions, or privilege grants. For the highest-risk actions, dual control is better than single review because it adds segregation of duties and makes silent misuse harder to complete.


Technical breakdown

Why prompt injection succeeds in shared input channels

LLMs process system instructions, user prompts, retrieved content, and embedded instructions through the same context window. That means the model has to infer intent from token patterns rather than from a trusted data-versus-instruction boundary. Direct injection arrives through the user channel, but indirect injection is harder because the malicious instruction is hidden inside content the agent legitimately retrieves. Once the model treats that content as actionable instruction, the downstream risk becomes an authorization failure, not a language failure.

Practical implication: move enforcement away from prompt content and into tool and resource authorization boundaries.

Scope-per-tool tokens at the MCP boundary

Scope-per-tool tokens bind each tool call to an explicit authorization claim. If the agent only has permission to summarise documents, an injected instruction to send email or read customer records fails because the MCP server or resource server checks token scope before executing the request. This is the cleanest way to break prompt-injection-driven scope escalation. The model may still produce the bad intent, but it cannot convert that intent into an out-of-scope action when the relying party enforces the claim cryptographically.

Practical implication: map every agent tool to the minimum possible scope and reject calls outside that scope at the server boundary.

Dual control and chain-aware delegation for consequential actions

High-impact actions need stronger controls than single-token scope checks. Dual control forces a second approved human ceremony before an irreversible action completes, while chain-aware delegation preserves attribution across multi-hop agent flows using token exchange claims such as subject_token and act. That prevents a compromised agent from silently extending authority through downstream delegation. The important architectural point is that the resource sees the delegation chain, not just the immediate caller, so policy can follow the original human intent and the full path of delegated authority.

Practical implication: require step-up or dual control for destructive, public, or financial actions and preserve delegation provenance end to end.



NHI Mgmt Group analysis

Prompt injection is an authorization problem disguised as a language problem. The model can be manipulated only because it is sitting too close to consequential power. Better prompts may reduce obvious failures, but they do not create a trustworthy instruction boundary. Practitioners should treat the model as an untrusted decision surface and move the real decision into cryptographic enforcement.

Scope-per-tool authorisation is the minimum viable control for AI agents. If an agent can summarise documents, that does not mean it can email data, modify records, or trigger payments. The security boundary has to live at the tool or resource server, where a token claim can be checked before the action executes. Anything looser leaves prompt injection with a usable path to abuse.

Chain-aware delegation is the governance layer most teams are missing. Once agents start calling other agents or services, the question is no longer only what the first agent may do. It is what each hop may delegate and how much authority survives the chain. Identity governance has to track the full delegation path, or scope creep will reappear as legitimate downstream execution.

Dual control belongs wherever prompt-injected misuse would be materially irreversible. The point is not to make every agent slower. The point is to separate low-risk automation from high-consequence action so that a compromised model cannot complete the most damaging steps alone. Practitioners should classify agent actions by consequence, then place cryptographic approval only where the blast radius justifies it.

Identity controls beat prompt controls because they change the failure mode. Prompt controls try to detect bad intent after it has already entered the model. Identity controls prevent the model from turning bad intent into authority in the first place. That is the sharper control model for NHI, agentic AI, and delegated human workflows alike.

From our research:

  • 72% of organisations have experienced or suspect they have experienced a breach of non-human identities, according to The 2024 ESG Report: Managing Non-Human Identities.
  • Two-thirds of enterprises have endured a successful cyberattack resulting from compromised non-human identities, with a quarter encountering multiple attacks, according to the same report.
  • That pattern makes Ultimate Guide to NHIs the right next step for teams mapping governance, lifecycle, and privilege boundaries.

What this signals

Prompt-injection defence now belongs in identity architecture reviews, not just AI safety reviews. If the model can still influence a tool call, the real control is missing. Teams should expect more scrutiny of scope design, delegation chains, and approval boundaries as agentic workflows move from pilots into production, especially where the action has financial, customer, or regulatory impact.

Identity teams should treat prompt injection as a standing test case for privilege design. The question is no longer whether a model can be tricked, but whether a compromised instruction can ever become an authorised action. That shifts programme focus toward MCP boundary enforcement, token exchange provenance, and cryptographic step-up for consequential operations.

Scope-per-tool tokens: this is the control pattern that keeps authority smaller than the model's reasoning. When every tool call carries its own claim and every high-impact action requires an external approval step, the agent can still fail safely instead of failing authoritatively.


For practitioners

  • Map every agent tool to a distinct scope Inventory each tool an agent can invoke, then assign the minimum scope needed for that one function. Reject any call that arrives with broader or mismatched claims, and treat insufficient-scope failures as expected security events rather than noise.
  • Place approval gates on irreversible actions Require human step-up or dual control before outbound payments, mass deletions, public posts, privilege grants, or security-policy changes can execute. Keep the approval context specific enough that a reviewer can understand what the agent is trying to do.
  • Preserve delegation provenance across agent chains Use chain-aware token exchange so each hop carries subject and actor context end to end. That lets downstream services evaluate what the original identity could legitimately delegate instead of trusting only the immediate caller.
  • Test prompt injection against real authorization paths Run red-team exercises that try to turn injected instructions into tool misuse, then verify where the boundary stops the action. Focus on MCP servers, API gateways, and resource servers, not just model output filters.

Key takeaways

  • Prompt injection is best understood as an authorization failure, because the model cannot reliably separate trusted instructions from malicious data once both share the same input path.
  • Scope-per-tool tokens, step-up approval, dual control, and chain-aware delegation are the control patterns that matter when AI agents can reach real systems.
  • Identity governance for agentic AI should focus on where authority is enforced, because that is what determines whether compromised reasoning becomes an actual incident.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

FrameworkControl / ReferenceRelevance
OWASP Agentic AI Top 10Prompt injection and tool misuse are core agentic AI threats.
OWASP Non-Human Identity Top 10NHI-01Scope and privilege controls govern non-human identities behind agent actions.
NIST Zero Trust (SP 800-207)PR.AC-4Zero Trust requires continuous authorization at the point of access.

Bind agent actions to least-privilege tools and externalize approval for consequential operations.


Key terms

  • Prompt Injection: An attack method that embeds malicious instructions into content an LLM or agent processes. The system then follows attacker-controlled directions instead of the intended task because the model cannot reliably distinguish instruction from data at the input layer.
  • Scope-Per-Tool Token: An access token that grants permission to one specific tool or action set rather than broad agent authority. It limits blast radius by making every call subject to server-side authorization checks that the model cannot override.
  • Dual Control: A governance pattern where a second authorised human must approve a high-impact action before it executes. In identity security, it is used to reduce the risk of irreversible actions being triggered by compromised automation or misleading agent output.
  • Chain-Aware Delegation: A delegation model that preserves the full identity path across multiple agent or service hops. Each hop carries provenance so downstream systems can judge what the original actor could legitimately delegate and where authority should stop.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or NHI governance in your organisation, it is worth exploring.

This post draws on content published by Scramble ID: Prompt Injection Defense Through Identity Controls. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-04-27.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org