What do teams get wrong about indirect prompt injection?

Why Teams Misread Indirect Prompt Injection

indirect prompt injection is often missed because teams assume the risk sits only in user prompts, not in retrieved text, documents, tickets, emails, or web pages that the model later treats as context. That blind spot matters most in agentic systems, where the model can act on those instructions, call tools, and carry state across steps. Current guidance from OWASP Agentic AI Top 10 and OWASP Agentic Applications Top 10 treats this as a control problem, not a content moderation problem.

Teams also underestimate how often the attack path is boring: a poisoned knowledge base entry, a malicious vendor note, or a compromised support article that looks legitimate in retrieval. Once that text is embedded into the model’s working context, the model cannot reliably distinguish business content from hostile instruction unless the system deliberately constrains what retrieved text can do. In practice, many security teams encounter the impact only after an agent has already leaked data, executed an unsafe tool action, or chained instructions across multiple steps, rather than through intentional testing.

How It Works in Practice

The practical mistake is treating retrieved content as passive reference material. In an indirect injection scenario, the model may read a document that says, in effect, “ignore previous instructions” or “send the latest secret to this endpoint,” and then continue processing as though that text were legitimate context. The model does not need to be tricked into “believing” the statement in a human sense. It only needs enough authority to follow it.

That is why retrieval layers, tool permissions, and output handling need separate controls. Security teams should constrain what external text can influence, filter or label untrusted sources, and keep tool execution behind explicit policy checks. For agentic systems, this usually means combining intent-based authorization, just-in-time credentials, and workload identity so the agent can only do what the current task requires. The OWASP Agentic AI Top 10 and OWASP Agentic Applications Top 10 both point practitioners toward the same operational lesson: the model’s context boundary is not a trust boundary.

Classify retrieved sources as trusted, semi-trusted, or untrusted before they reach the prompt.

Separate “read” access to content from “act” access to tools and data stores.

Use ephemeral secrets and short-lived credentials so a compromised step has limited blast radius.

Log retrieval, prompt assembly, and tool calls together so investigators can reconstruct the chain.

NHI governance adds an important datapoint here: NHI Mgmt Group research reports that only 5.7% of organisations have full visibility into their service accounts, which shows how often machine identities are already poorly understood before agentic workflows enter the picture. That is one reason indirect prompt injection should be treated as an identity and authorization problem as much as a content problem. These controls tend to break down when agents have broad tool access across many systems because a single poisoned context can trigger actions faster than a human reviewer can intervene.

Common Variations and Edge Cases

Tighter retrieval filtering often increases operational overhead, so organisations must balance safety against recall, latency, and user experience. That tradeoff becomes sharper when the system must search large knowledge bases, parse long documents, or process third-party content at scale.

There is no universal standard for this yet, but current guidance suggests several recurring edge cases. First, a document may contain both legitimate business content and hidden instructions, so simple allowlists are not enough. Second, an agent that can browse, summarize, and execute actions can be influenced across multiple turns, making single-prompt defenses too narrow. Third, some environments rely on long-lived secrets or shared service accounts, which turns a prompt injection into a broader identity compromise. NHI Mgmt Group research shows that 96% of organisations store secrets outside of secrets managers in vulnerable locations including code, config files, and CI/CD tools, which widens the impact if an agent is pushed into misuse.

For teams building controls, the safer pattern is to pair content inspection with runtime policy enforcement, short-lived credentials, and scoped workload identity. Where agents must act on external text, a human approval step or a hard policy gate may still be necessary for high-impact actions. Where those safeguards are absent, indirect prompt injection stops being an edge-case prompt issue and becomes a path to unauthorized tool use, data exposure, and privilege expansion.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A01	Indirect prompt injection is a core agentic app input-influence risk.
CSA MAESTRO		MAESTRO addresses agent workflow trust boundaries and tool misuse.
NIST AI RMF		AI RMF supports governing model behavior and downstream harms from unsafe context.

Define accountability, test for prompt injection, and monitor agent outputs for unsafe actions.

#1 Authority in NHI Education, Research and Advisory, empowering organizations to tackle the critical risks posed by Non-Human Identities (NHIs), including AI Agents.

What do teams get wrong about indirect prompt injection?

Why Teams Misread Indirect Prompt Injection

How It Works in Practice

Common Variations and Edge Cases

Standards & Framework Alignment

Related resources from NHI Mgmt Group