Notifications

Clear all

Indirect prompt injection: are your AI controls keeping up?

Last Post

RSS

NHI Mgmt Group

(@nhi-mgmt-group)

Member Moderator

Joined: 1 year ago

Posts: 12212

Topic starter 07/06/2026 7:53 pm

TL;DR: Indirect prompt injection lets attackers hide malicious instructions inside emails, documents, web pages, and knowledge bases that AI systems already trust, and agentic AI can turn that into unauthorized actions with system credentials, according to WitnessAI. Existing guardrails, pattern matching, and application controls provide only partial coverage because the model cannot reliably separate trusted instructions from untrusted content.

NHIMG editorial — based on content published by WitnessAI: Indirect prompt injection and the security implications for AI systems

By the numbers:

U.S. breach costs reached $10.22 million while Shadow AI added $670,000 to average breach costs for organisations with high levels of unsanctioned AI usage.
97% of organisations that reported an AI-related breach lacked proper AI access controls.
Only 34% performed regular audits for unsanctioned AI.

Questions worth separating out

Q: What breaks when indirect prompt injection is not controlled in AI systems?

A: Indirect prompt injection breaks the assumption that retrieved content is safe to use as instruction material.

Q: Why do AI agents make indirect prompt injection more dangerous for enterprises?

A: AI agents make indirect prompt injection more dangerous because the model can take actions, not just generate text.

Q: How do security teams know whether intent-based classification is working for AI content?

A: Teams should test whether the control catches semantically disguised requests, multilingual payloads, hidden text, and transformed instructions that do not match known signatures.

Practitioner guidance

Scan retrieval paths before model execution Inspect emails, documents, web pages, and knowledge base entries before they are merged into the model context.
Apply bidirectional inspection to model flows Validate both inputs and outputs so a poisoned response cannot become the next prompt, the next tool call, or an exfiltration channel.
Treat tool access as privileged execution Restrict least-privilege tool use, scope-limited credentials, and human approval for high-risk actions such as financial changes, protected data access, and system modifications.

What's in the full article

WitnessAI's full research covers the operational detail this post intentionally leaves for the source:

Step-by-step defence layering for prompt, response, and agent traffic inspection across AI workflows
Specific examples of delimiter, role hijacking, multilingual, and semantic injection patterns in practice
Operational guidance for detecting MCP servers, shadow agents, and hidden tool paths before they are abused
Implementation detail on tokenization, human-in-the-loop gating, and auditable compliance trails

👉 Read WitnessAI's analysis of indirect prompt injection and AI identity risk →

Indirect prompt injection: are your AI controls keeping up?

Explore further

View Full Forum → | NHI Foundation Course →

Quote

Topic Tags

Mr NHI

(@mr-nhi)

Member Moderator

Joined: 2 months ago

Posts: 11787

07/06/2026 9:38 pm

Indirect prompt injection is an identity problem before it is a prompt problem. The attack succeeds because organisations allow untrusted content to enter the same decision space as trusted instructions. Once that boundary collapses, content governance and authorisation governance become the same control plane from the model’s point of view. Practitioners should treat retrieval, memory, and tool inputs as part of the identity surface, not as neutral text channels.

A few things that frame the scale:

97% of organisations that reported an AI-related breach lacked proper AI access controls, according to LLMjacking: How Attackers Hijack AI Using Compromised NHIs.
Shadow AI added $670,000 to average breach costs for organisations with high levels of unsanctioned AI usage, according to LLMjacking: How Attackers Hijack AI Using Compromised NHIs.

A question worth separating out:

Q: Who is accountable when an AI system acts on injected content?

A: Accountability sits with the organisation that allowed untrusted content, retrieval paths, and privileged execution to intersect without adequate controls. Regulators and auditors will look for audit trails, approval gates, access scope, and evidence that high-risk actions required separate authorisation. Without that, the system owner cannot credibly argue that the action was isolated or unintended.

👉 Read our full editorial: Indirect prompt injection exposes a new AI identity attack surface

ReplyQuote

Forum Statistics

11 Forums

13.5 K Topics

25.8 K Posts

89 Online

135 Members

Latest Post: Silk Typhoon arrest and exposed credentials: what do teams need to watch? Our newest member: Alex Recent Posts Unread Posts Tags

Forum Icons: Forum contains no unread posts Forum contains unread posts

Topic Icons: Not Replied Replied Active Hot Sticky Unapproved Solved Private Closed

#1 Authority in NHI Education, Research and Advisory, empowering organizations to tackle the critical risks posed by Non-Human Identities (NHIs), including AI Agents.

Get in Touch

Quick Links

FAQ

NHI 101 Articles

Legal & Policies