Indirect prompt injection changes the AI agent security problem

By NHI Mgmt Group Editorial TeamPublished 2025-09-29Domain: Agentic AI & NHIsSource: Noma Security

TL;DR: Noma Security’s analysis of ForcedLeak argues that indirect prompt injection against Salesforce Agentforce shows why agentic AI cannot be governed like deterministic software: the same input can produce different outputs, and those outputs can trigger actions across connected tools and data sources. The security problem is broader than XSS because the blast radius grows when agents are trusted to execute instructions.

At a glance

What this is: This analysis explains why indirect prompt injection is not equivalent to XSS and why non-deterministic LLM behaviour creates a different security and governance problem for agentic AI.

Why it matters: IAM and NHI teams need to treat AI agents as privileged actors whose inputs, outputs, and tool use require controls beyond classic web application validation.

👉 Read Noma Security's analysis of ForcedLeak and indirect prompt injection

Context

Indirect prompt injection is a content-to-action problem, not a browser scripting problem. In agentic AI systems, the model can turn untrusted text into instructions that downstream tools may execute, which makes NHI governance central to the risk rather than peripheral to it.

That matters because the same access patterns that were tolerable in traditional software become unstable when an AI agent can read mail, parse calendar data, or call external tools. The article’s core point is that security teams should not map prompt injection to XSS and assume the mitigation playbook transfers cleanly.

The distinction is especially relevant for organisations wiring large language models into workflows through agents, connectors, and tool servers. That starting position is increasingly common, not unusual, which is why the control gap is now a mainstream IAM concern.

Key questions

Q: How should security teams govern AI agents that can act on untrusted content?

A: Treat the agent as a privileged non-human identity and apply the same discipline you would use for any system that can execute on behalf of a user. Separate model output from execution, limit tool scope, require approvals for high-impact actions, and log the full decision path so you can reconstruct how a prompt became an action.

Q: Why is indirect prompt injection harder to defend than XSS?

A: XSS is usually defeated by deterministic controls such as sanitisation and output encoding. Indirect prompt injection is harder because the model interprets natural language non-deterministically and may turn untrusted text into different actions depending on context, so the security boundary must sit around the model, not inside it.

Q: What is the difference between prompt injection and traditional injection attacks?

A: Traditional injection attacks exploit a known parser or interpreter and often produce predictable results when the payload lands. Prompt injection exploits a model’s language interpretation and can steer downstream tools through trusted output, which makes the attack path less predictable and the blast radius much broader.

Q: When should organisations require human approval for AI agent actions?

A: Use human approval whenever an agent can change state, expose data, or expand access beyond the original task. If the action would be hard to reverse, hard to detect, or high impact if misused, it should not execute solely because the model suggested it.

Technical breakdown

Why prompt injection behaves differently from XSS

XSS is deterministic once the vulnerable page is reached. A malicious script executes in a browser context, and the attacker can usually predict what the payload will do if sanitisation and output encoding fail. Prompt injection is different because the model does not execute instructions in a fixed way. It interprets language, weighs context, and may produce different outputs for the same input. That non-determinism makes simple validation rules far less reliable, especially when the output can become an action in another system.

Practical implication: Treat prompt injection as an instruction integrity problem, not just an input filtering problem.

How indirect prompt injection reaches connected tools and agents

Indirect prompt injection works when malicious content is embedded in data the model is expected to process, such as email, documents, or calendar events. The model then generates instructions that a connected application trusts and executes. In agentic systems, this can extend across multiple tools, because the agent is allowed to call services, retrieve data, and chain actions. The risk is amplified when one agent can trigger another through A2A and when tool access is mediated by MCP servers, because trust can propagate faster than humans can inspect it.

Practical implication: Map every model-to-tool and agent-to-agent trust boundary before allowing execution authority.

Why deterministic controls still matter in agentic systems

The article’s strongest technical point is not that LLMs cannot be secured. It is that security must be layered around them with deterministic controls. That means limiting reachable tools, constraining actions by policy, validating outputs before execution, and separating read from write privileges where possible. Defence in depth matters because the model itself is not a reliable security boundary. The more autonomy an agent has, the more the surrounding control plane has to behave predictably.

Practical implication: Use policy, segmentation, and human approval gates to contain agent actions even when model behaviour varies.

Threat narrative

Attacker objective: The attacker aims to turn trusted AI-mediated workflows into an execution path for data exfiltration, unauthorised actions, or wider system abuse.

Entry via indirect prompt injection hidden in content the AI agent is designed to process, such as email, calendar text, or document bodies.
Escalation occurs when the model converts the malicious content into instructions that a connected tool or workflow accepts as legitimate.
Impact follows when the agent performs unintended actions, discloses data, or triggers downstream systems with elevated trust.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach — attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Indirect prompt injection should be treated as an identity and authority problem, not a content moderation problem. The real issue is not whether the model can recognise malicious language. It is whether a trusted agent can be induced to act on untrusted content with the authority of the user or system behind it. That moves the topic squarely into NHI governance, because the agent is functioning as a non-human identity with execution power.

Non-determinism creates trust debt for AI agents. The more freedom a model has to interpret natural language, the more difficult it becomes to guarantee stable, repeatable security outcomes. That makes legacy validation patterns insufficient on their own and increases the importance of deterministic policy layers around tool use, data access, and command execution. Practitioners should assume that every unconstrained agent accumulates trust debt over time.

Blast radius is the right control metric for agentic AI security. The article correctly points to the way agent chains, A2A interactions, and MCP-connected tools can widen the impact of a single prompt injection. The security question is no longer whether one payload succeeds, but how far it can travel once a model begins to act on it. Teams should design for containment first.

Prompt injection validates the need for zero-standing privilege in AI workflows. If an agent can read, reason, and execute with persistent access, then one compromised interaction can become a durable control failure. Ephemeral, task-scoped permissions reduce the window in which an injected instruction can do damage. Practitioners should push AI access toward just-in-time authority wherever execution is involved.

From our research:
1 in 4 organisations are already investing in dedicated NHI security capabilities, with an additional 60% planning to do so within the next twelve months, according to The State of Non-Human Identity Security.
Only 1.5 out of 10 organisations are highly confident in their ability to secure NHIs, compared to nearly 1 in 4 for securing human identities, according to the same research.
For a broader breach lens, 52 NHI Breaches Analysis shows how identity abuse turns small access flaws into repeatable compromise patterns.

What this signals

Indirect prompt injection turns AI governance into an access-control problem. Once an agent can interpret untrusted text and call tools, the security model has to assume that content may become command. That means teams should align their AI controls with NIST AI Risk Management Framework governance expectations and use the OWASP Top 10 for Agentic Applications 2026 to stress-test tool abuse, context poisoning, and autonomous action paths.

Identity blast radius will become the deciding metric for agentic AI programmes. When access is persistent, the question is not only whether an agent is authenticated, but how far a single compromised instruction can travel across systems. The post implicitly points to a future where least privilege, execution boundaries, and task-scoped permissions matter more than model accuracy as operational controls.

AI teams should expect security reviews to move upstream into architecture decisions about tool wiring, not just prompt filtering. If agents can reach files, mailboxes, and administration interfaces, every connector becomes part of the control plane and every permission becomes a potential failure path.

For practitioners

Classify AI agents as non-human identities Inventory every agent, connector, and automation path as an identity with defined permissions, not as a feature of an application. That inventory should include read, write, and execution rights so ownership and review are explicit.
Separate model interpretation from system execution Require a deterministic policy layer between LLM output and any action that changes state, sends messages, or retrieves sensitive data. The model can propose, but a controlled workflow must decide what runs.
Restrict tool access by task scope Grant only the minimum tools and data sources an agent needs for a single workflow, then revoke access when the task ends. This reduces the damage window if malicious content influences the model.
Add human approval for high-impact actions Insert review gates before actions such as data export, account changes, external messaging, or privilege escalation. The approval step should be mandatory when the agent crosses from analysis into execution.

Key takeaways

Indirect prompt injection is an authority problem because trusted AI agents can be manipulated into executing unintended actions.
Non-deterministic model behaviour makes classic web injection controls insufficient on their own for agentic AI environments.
Teams need deterministic policy layers, tight tool scoping, and human approval gates to contain the blast radius of compromised agents.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 address the attack and risk surface, while NIST AI RMF and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		Prompt injection and tool misuse are central agentic AI risks in this article.
NIST AI RMF		AI governance and accountability are needed when model outputs drive actions.
NIST Zero Trust (SP 800-207)	PR.AC-4	Least privilege and continuous verification reduce agent blast radius.

Establish governance, oversight, and incident response for AI actions that cross into execution.

Key terms

Indirect Prompt Injection: A form of attack where malicious instructions are hidden inside content an AI system is expected to process, such as email, documents, or calendar entries. The model may transform that content into actions or outputs that the attacker intended, even though the text did not come from the user.
Agentic AI: AI systems that can reason, choose actions, and invoke tools on behalf of a user or workflow. In security terms, these systems behave like non-human identities because they can hold permissions, access data, and execute operations beyond simple text generation.
Blast Radius: The amount of damage a compromised identity or workflow can cause before containment. For agentic AI, blast radius includes the number of tools, systems, and data stores an injected instruction can reach once a model begins acting on it.
Tool Trust Boundary: The point where model output becomes a system action. This boundary matters because an LLM is not a security control, and any tool that accepts its output must treat that output as untrusted until policy, validation, or approval confirms it is safe.

Deepen your knowledge

Indirect prompt injection and AI agent authority are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are designing controls for agentic workflows with tool access, it is worth exploring.

This post draws on content published by Noma Security: ForcedLeak and the problem of indirect prompt injection in agentic AI. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-09-29.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org