
Agentic prompt injection: are your controls containing the blast radius?


(@nhi-mgmt-group)
Member Moderator
Joined: 1 year ago
Posts: 6713
Topic starter  

TL;DR: Prompt injection in agentic systems is an action problem, not just an output problem. Attackers can steer AI agents to query internal data, perform unauthorized actions, or propagate malicious instructions to other agents. According to WorkOS, a January 2026 meta-analysis found that adaptive attacks succeed against state-of-the-art defenses more than 85% of the time. The governing question has shifted from “can the model be tricked?” to “what can the agent do if it is tricked?”

NHIMG editorial — based on content published by WorkOS: Securing agentic apps, with a focus on containing AI agent prompt injection

By the numbers:

  • A meta-analysis of 78 studies published in January 2026 found that adaptive attack success rates against state-of-the-art defenses exceed 85%.
  • In April 2026, researchers at Pillar Security demonstrated that a prompt injection in Google's Antigravity, an AI developer tool for filesystem operations, could be combined with the tool's permitted file-creation capability to achieve remote code execution.
  • In April 2026, a Cursor AI coding agent running Claude deleted a startup's entire production database and backups in a single API call, nine seconds after receiving an instruction the agent interpreted as legitimate.

Questions worth separating out

Q: How should security teams contain prompt injection in agentic systems?

A: Containment should start with delegated identity, not prompt wording.

Q: Why do agentic apps make prompt injection more dangerous than chatbots?

A: Agentic apps can turn manipulated text into real action.

Q: What breaks when prompt injection reaches a tool-using AI agent?

A: What breaks is the assumption that the model's output is low impact.

Practitioner guidance

  • Scope agent credentials to the minimum actionable set: Give each agent only the permissions required for its narrow workflow, and separate read-only from state-changing entitlements so a hijacked prompt cannot expand into unrelated systems.
  • Treat untrusted content as adversarial input: Tag emails, documents, web pages, and tool outputs by source before they enter the context window.
  • Enforce invocation policy at the tool boundary: Validate arguments, inspect call sequences, and block dangerous combinations such as read-then-send exfiltration or filesystem writes outside the workspace.
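
To make the guidance above concrete, here is a minimal Python sketch of a guard that sits between the agent and its tools. It is an illustration under stated assumptions, not WorkOS' implementation: the tool names (read_file, write_file, send_email), the scope strings, the tag format, and the ToolGuard class are all hypothetical. It checks three things before a call runs: the agent holds the scope the tool requires, file writes stay inside the workspace, and an outbound send does not follow a sensitive read in the same session.

```python
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical tool names mapped to the scope each one requires.
REQUIRED_SCOPE = {
    "read_file": "files:read",
    "write_file": "files:write",
    "send_email": "email:send",
}
SENSITIVE_READS = {"read_file"}   # assumed: reads that may surface internal data
OUTBOUND_TOOLS = {"send_email"}   # assumed: tools that can move data out


class PolicyViolation(Exception):
    """Raised when a tool call is blocked at the tool boundary."""


def tag_by_source(source: str, text: str) -> str:
    # Label untrusted content with its origin before it enters the context
    # window. The tag format here is an assumption, not a standard.
    return f"<untrusted source={source!r}>\n{text}\n</untrusted>"


@dataclass
class ToolGuard:
    workspace: Path
    granted_scopes: frozenset
    history: list = field(default_factory=list)

    def check(self, tool: str, args: dict) -> None:
        # 1. Least privilege: unknown tools and missing scopes are denied.
        scope = REQUIRED_SCOPE.get(tool)
        if scope is None or scope not in self.granted_scopes:
            raise PolicyViolation(f"{tool}: required scope not granted")

        # 2. Argument validation: writes must resolve inside the workspace.
        if tool == "write_file":
            root = self.workspace.resolve()
            target = (root / args["path"]).resolve()
            if not target.is_relative_to(root):  # Python 3.9+
                raise PolicyViolation(f"write outside workspace: {target}")

        # 3. Chain analysis: block read-then-send within one session.
        if tool in OUTBOUND_TOOLS and any(t in SENSITIVE_READS for t in self.history):
            raise PolicyViolation("read-then-send sequence blocked")

        # Record only calls that passed every check.
        self.history.append(tool)
```

In use, the agent runtime would call guard.check(tool, args) before dispatching each tool call and treat PolicyViolation as a hard stop. The read-then-send rule is deliberately coarse: it blocks any outbound call after any sensitive read, which a real deployment would likely refine with per-resource sensitivity labels or a human approval step.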

What's in the full article

WorkOS' full article covers the operational detail this post intentionally leaves for the source:

  • Detailed examples of argument validation, chain analysis, and circuit breaker patterns for agent tool calls
  • Code samples for validating generated code before execution in filesystem and deployment workflows
  • Practical prompt structure guidance for separating system instructions, retrieved content, and user input
  • A closer walkthrough of how scoped credentials and RBAC bound the blast radius of a hijacked agent

👉 Read WorkOS' analysis of securing agentic apps against prompt injection →
