Subscribe to the Non-Human & AI Identity Journal
Home FAQ Agentic AI & Autonomous Identity What breaks when indirect prompt injection is not…
Agentic AI & Autonomous Identity

What breaks when indirect prompt injection is not controlled in AI systems?

← Back to all FAQ
By NHI Mgmt Group Editorial Team Updated June 7, 2026 Domain: Agentic AI & Autonomous Identity

Indirect prompt injection breaks the assumption that retrieved content is safe to use as instruction material. Once malicious text enters the model context, the system may alter responses, leak data, or trigger tools with delegated permissions. The core failure is boundary collapse between data and directive, which turns ordinary content ingestion into an execution risk.

Why This Matters for Security Teams

indirect prompt injection is not a prompt-tuning nuisance. It is a control failure that lets untrusted text behave like operator intent once it enters an LLM’s working context. That matters most when the system can retrieve documents, read tickets, browse pages, or invoke tools with delegated authority. The moment retrieved content can steer action, the boundary between data and directive is gone.

For security teams, the risk is broader than incorrect answers. A poisoned knowledge base entry, email, or web page can cause the model to expose secrets, alter records, or call a tool that should have required explicit human approval. The OWASP Agentic AI Top 10 treats this as a core agentic risk because model context is not a trusted boundary. NHI guidance from OWASP Agentic Applications Top 10 reinforces the same practical point: if the system cannot separate content from instruction, it cannot safely delegate work.

In practice, many security teams discover this only after a retrieval pipeline or tool chain has already been used to execute attacker-shaped instructions.

How It Works in Practice

Indirect prompt injection usually enters through a retrieval layer, browser automation, uploaded file, support ticket, or any other content source the model is expected to summarise. The malicious payload is not always obvious. It may look like a policy note, a formatting instruction, or a hidden block of text. Once the model ingests it, the text competes with the system prompt and user intent for control of the next action.

This becomes dangerous when the assistant has delegated permissions. If the agent can access mail, cloud consoles, CRM records, or code repositories, the injected text can trigger tool use, data exfiltration, or privilege misuse. Current guidance suggests treating every retrieved item as untrusted input, not as instruction material. The practical translation is simple: isolate content handling from action handling, and require explicit policy checks before any tool call that changes state.

Controls that usually help include content sanitisation, retrieval allowlists, instruction hierarchy enforcement, human approval for sensitive actions, and runtime policy evaluation. For agentic systems, the relevant question is not only “what did the model read?” but “what authority could the model exercise after reading it?” That is why DeepSeek breach is often cited in discussions of data exposure at model scale, while the OWASP Agentic Applications Top 10 and the OWASP Agentic AI Top 10 both emphasise context-bound execution risk.

  • Classify all retrieved content as untrusted unless it is cryptographically signed and policy-approved.
  • Separate summarisation from execution, so the model cannot convert text into action without checks.
  • Use just-in-time credentials for tools, with short TTLs and revocation after task completion.
  • Require intent-based authorisation for state-changing actions, not static role grants alone.
  • Log prompt, retrieval, and tool traces so injected instructions can be reconstructed quickly.

These controls tend to break down when the agent can chain multiple tools across disconnected systems because each step appears low risk in isolation.

Common Variations and Edge Cases

Tighter retrieval and tool controls often increase latency and operator overhead, so organisations have to balance resilience against speed and user experience. There is no universal standard for this yet, especially in multi-agent systems where one agent’s output becomes another agent’s input.

One common edge case is a workflow that appears read-only but still has indirect write impact, such as drafting an email, generating a ticket update, or preparing a change request that another system auto-commits. Another is when the model can read confidential context but cannot directly invoke tools; even then, the injected text can shape subsequent human decisions or contaminate downstream automation. Best practice is evolving toward runtime policy checks and workload identity rather than trusting static RBAC alone, because autonomous behaviour is dynamic and goal-driven.

The Schneider Electric credentials breach is relevant here as a reminder that exposed credentials magnify the damage when model-driven workflows are already context-contaminated. For implementation detail, teams should align with the Ultimate Guide to NHIs — Standards and the OWASP Agentic AI Top 10, then map those controls to runtime authorisation and short-lived secrets. The hardest cases are agentic environments with broad delegated access, because once a prompt injection lands, the system can reinterpret ordinary content as a command path.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.

FrameworkControl / ReferenceRelevance
OWASP Agentic AI Top 10LLM01Prompt injection is a core agentic application attack path.
CSA MAESTROTRUST-03MAESTRO addresses agent trust boundaries and tool execution risk.
NIST AI RMFAI RMF covers governance for unsafe model behaviour and misuse.

Document prompt-injection scenarios, assign owners, and monitor for harmful outputs.

NHIMG Editorial Note
Reviewed and updated by the NHIMG editorial team on June 7, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org