TL;DR: Indirect prompt injection succeeds when malicious instructions are embedded in trusted data and LLMs can act on them across sensitive workflows, according to Pillar Security’s analysis. The real risk is not the payload alone but the combination of private data access, untrusted inputs, and external communication that turns prompt attacks into operational exploits.
NHIMG editorial — based on content published by Pillar Security: Anatomy of an Indirect Prompt Injection
By the numbers:
- 98% of companies plan to deploy even more AI agents within the next 12 months, despite documented rogue behaviour in 80% of current deployments.
Questions worth separating out
Q: How should security teams reduce indirect prompt injection risk in LLM workflows?
A: Start by separating untrusted content from system instructions, then limit what the model can do with sensitive data.
Q: Why do private-data access and outbound tools make prompt injection worse?
A: Because prompt injection becomes operational when the model can read something valuable and send it somewhere useful.
Q: What do teams get wrong about indirect prompt injection?
A: They often focus only on the prompt text and ignore the surrounding workflow.
Practitioner guidance
- Separate instruction channels from data channels Keep system instructions, user prompts, and untrusted content in distinct processing paths.
- Restrict outbound capability on high-risk LLM workflows Remove or tightly mediate external communication paths where models process private data.
- Test for CFS exposure in real workflows Red-team the exact content types your teams use most, including HTML, JSON, code comments, and ticket text.
What's in the full article
Pillar Security's full research covers the operational detail this post intentionally leaves for the source:
- Side-by-side examples of successful and failed indirect prompt injection payloads across email, tickets, and code.
- Detailed breakdown of how the CFS model changes with content format, placement, and instruction phrasing.
- Workflow-specific attacker patterns that show how context fit changes between assistants, coding tools, and ticketing systems.
- Additional examples of how defenders can recognise high-salience payloads before they reach tool execution.
👉 Read Pillar Security's analysis of indirect prompt injection and the CFS model →
Indirect prompt injection and the governance gap teams are missing?
Explore further
Indirect prompt injection is an instruction-boundary problem before it is an AI problem. The failure begins when systems collapse data and directives into one processing stream, then ask the model to decide what is authoritative. That makes the control gap broader than prompt hygiene. Practitioners need to treat content ingestion, tool invocation, and output generation as separate trust zones.
A few things that frame the scale:
- 98% of companies plan to deploy even more AI agents within the next 12 months, despite documented rogue behaviour in 80% of current deployments, according to AI Agents: The New Attack Surface report.
- Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
A question worth separating out:
Q: Who is accountable when an LLM leaks data after following malicious instructions?
A: Accountability sits with the organisation that granted the model access, connected the tools, and allowed untrusted content into the same decision path. That makes this a governance issue across IAM, security engineering, and application ownership, not a defect that belongs to the model alone.
👉 Read our full editorial: Indirect prompt injection is becoming an operational exploit