Agentic prompt injection: are your controls containing the blast radius?

NHI Mgmt Group (@nhi-mgmt-group), topic starter, 23/06/2026 9:07 pm

TL;DR: Prompt injection in agentic systems is an action problem, not just an output problem. Attackers can steer AI agents to query internal data, perform unauthorized actions, or propagate malicious instructions to other agents. According to WorkOS, a January 2026 meta-analysis found that adaptive attacks succeed against state-of-the-art defenses more than 85% of the time. The governing question has shifted from “can the model be tricked?” to “what can the agent do if it is tricked?”

NHIMG editorial — based on content published by WorkOS: Securing agentic apps, with a focus on containing AI agent prompt injection

By the numbers:

A meta-analysis of 78 studies published in January 2026 found that adaptive attack success rates against state-of-the-art defenses exceed 85%.
In April 2026, researchers at Pillar Security demonstrated that a prompt injection in Google's Antigravity, an AI developer tool for filesystem operations, could be combined with the tool's permitted file-creation capability to achieve remote code execution.
In April 2026, a Cursor AI coding agent running Claude deleted a startup's entire production database and backups in a single API call, nine seconds after receiving an instruction the agent interpreted as legitimate.

Questions worth separating out

Q: How should security teams contain prompt injection in agentic systems?

A: Containment should start with delegated identity, not prompt wording.

Q: Why do agentic apps make prompt injection more dangerous than chatbots?

A: Agentic apps can turn manipulated text into real action.

Q: What breaks when prompt injection reaches a tool-using AI agent?

A: What breaks is the assumption that the model's output is low impact.

Practitioner guidance

Scope agent credentials to the minimum actionable set: Give each agent only the permissions required for its narrow workflow, and separate read-only from state-changing entitlements so a hijacked prompt cannot expand into unrelated systems.
Treat untrusted content as adversarial input: Tag emails, documents, web pages, and tool outputs by source before they enter the context window.
Enforce invocation policy at the tool boundary: Validate arguments, inspect call sequences, and block dangerous combinations such as read-then-send exfiltration or filesystem writes outside the workspace.
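
As a rough illustration of the tool-boundary point above, here is a minimal Python sketch of a policy check that runs before each tool call. It is not taken from the WorkOS article. The workspace path, the tool names (write_file, read_secret, query_customer_db, send_email, http_post), and the ToolPolicy class are all hypothetical, chosen only to show argument validation and call-sequence inspection side by side.

```python
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical workspace root; a real deployment would take this from config.
WORKSPACE = Path("/srv/agent-workspace")

# Hypothetical tool names, grouped by the risk they carry.
SENSITIVE_READS = {"read_secret", "query_customer_db"}
OUTBOUND_SENDS = {"send_email", "http_post"}


@dataclass
class ToolPolicy:
    """Checks each tool call before it runs and remembers which tools ran."""
    history: list[str] = field(default_factory=list)

    def check(self, tool: str, args: dict) -> tuple[bool, str]:
        # Argument validation: file writes must resolve inside the workspace.
        if tool == "write_file":
            target = (WORKSPACE / args["path"]).resolve()
            if not target.is_relative_to(WORKSPACE.resolve()):
                return (False, "write outside workspace")
        # Sequence inspection: refuse an outbound send once a sensitive
        # read has happened earlier in the same session.
        if tool in OUTBOUND_SENDS and any(t in SENSITIVE_READS for t in self.history):
            return (False, "read-then-send sequence")
        # Only allowed calls are recorded in the session history.
        self.history.append(tool)
        return (True, "ok")
```

The check lives outside the model, so it holds even when the prompt has been hijacked: a call such as policy.check("write_file", {"path": "../etc/passwd"}) is refused regardless of how the instruction reached the agent. A production version would likely also need per-session state, an approval path for legitimate read-then-send workflows, and logging of every denial.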

What's in the full article

WorkOS' full article covers the operational detail this post intentionally leaves for the source:

Detailed examples of argument validation, chain analysis, and circuit breaker patterns for agent tool calls
Code samples for validating generated code before execution in filesystem and deployment workflows
Practical prompt structure guidance for separating system instructions, retrieved content, and user input
A closer walkthrough of how scoped credentials and RBAC bound the blast radius of a hijacked agent

👉 Read WorkOS' analysis of securing agentic apps against prompt injection →


Mr NHI (@mr-nhi), 25/06/2026 2:14 am

Prompt injection becomes an identity governance issue the moment an agent can act. The article is correct to treat this as more than a model safety problem because the harmful unit is the agent identity, not the text response. Once the runtime can browse, write, send, or execute, the question becomes which actions that identity can take under hijacked intent. Practitioner conclusion: agent security has to be governed as delegated identity with bounded authority, not as chat moderation.
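
To make "delegated identity with bounded authority" concrete, here is a minimal Python sketch, offered as an assumption-laden illustration rather than any vendor's API. The AgentGrant class, the Mode enum, and the resource names (tickets, kb, billing) are hypothetical. The idea it shows is deny-by-default grants in which read access never implies state-changing access.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    READ = "read"
    WRITE = "write"  # stands in for any state-changing action


@dataclass(frozen=True)
class AgentGrant:
    """A delegated identity: which agent, and exactly what it may touch."""
    agent_id: str
    allowed: frozenset  # set of (resource, Mode) pairs

    def permits(self, resource: str, mode: Mode) -> bool:
        # Deny by default; a READ grant does not imply WRITE.
        return (resource, mode) in self.allowed


# Hypothetical grant for a narrow ticket-triage workflow.
triage_agent = AgentGrant(
    agent_id="ticket-triage",
    allowed=frozenset({
        ("tickets", Mode.READ),
        ("tickets", Mode.WRITE),
        ("kb", Mode.READ),
    }),
)
```

With a grant like this, a hijacked triage agent can still misuse the tickets resource, but triage_agent.permits("billing", Mode.READ) is false, so the injected intent cannot expand into systems the identity was never given.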

A few things that frame the scale:

80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%), according to the AI Agents: The New Attack Surface report.
Only 44% of organisations have implemented any policies to govern AI agents, despite 92% saying that governing them is critical to enterprise security, according to the same report.

A question worth separating out:

Q: Who is accountable when an AI agent performs an unauthorized action after injection?

A: Accountability follows the governance model that granted the agent its permissions and execution rights. The owner of the agent workflow, the approver of its tool scope, and the team operating the control plane all share responsibility. Frameworks such as OWASP-NHI and zero trust expect those boundaries to be explicit.

👉 Read our full editorial: Agentic prompt injection turns text into actions, not just outputs
