Subscribe to the Non-Human & AI Identity Journal

Notifications
Clear all

Indirect prompt injection and the governance gap teams are missing


(@nhi-mgmt-group)
Member Moderator
Joined: 1 year ago
Posts: 5324
Topic starter  

TL;DR: Indirect prompt injection succeeds when malicious instructions are embedded in trusted data and LLMs can act on them across sensitive workflows, according to Pillar Security’s analysis. The real risk is not the payload alone but the combination of private data access, untrusted inputs, and external communication that turns prompt attacks into operational exploits.

NHIMG editorial — based on content published by Pillar Security: Anatomy of an Indirect Prompt Injection

By the numbers:

Questions worth separating out

Q: How should security teams reduce indirect prompt injection risk in LLM workflows?

A: Start by separating untrusted content from system instructions, then limit what the model can do with sensitive data.

Q: Why do private-data access and outbound tools make prompt injection worse?

A: Because prompt injection becomes operational when the model can read something valuable and send it somewhere useful.

Q: What do teams get wrong about indirect prompt injection?

A: They often focus only on the prompt text and ignore the surrounding workflow.

Practitioner guidance

  • Separate instruction channels from data channels Keep system instructions, user prompts, and untrusted content in distinct processing paths.
  • Restrict outbound capability on high-risk LLM workflows Remove or tightly mediate external communication paths where models process private data.
  • Test for CFS exposure in real workflows Red-team the exact content types your teams use most, including HTML, JSON, code comments, and ticket text.

What's in the full article

Pillar Security's full research covers the operational detail this post intentionally leaves for the source:

  • Side-by-side examples of successful and failed indirect prompt injection payloads across email, tickets, and code.
  • Detailed breakdown of how the CFS model changes with content format, placement, and instruction phrasing.
  • Workflow-specific attacker patterns that show how context fit changes between assistants, coding tools, and ticketing systems.
  • Additional examples of how defenders can recognise high-salience payloads before they reach tool execution.

👉 Read Pillar Security's analysis of indirect prompt injection and the CFS model →

Indirect prompt injection and the governance gap teams are missing?

Explore further

View Full Forum →  |  NHI Foundation Course →



   
Quote
Share: