TL;DR: Prompt injection exploits the boundary between user input and system instructions in AI systems, giving attackers a low-skill path to manipulate models that can access data and tools, according to WorkOS's interview with Noam Schwartz. The governance problem is that production AI agents collapse old web security assumptions about instruction trust, making safety, access, and tool-use boundaries inseparable.
NHIMG editorial — based on content published by WorkOS: AI is both weapon and target, Noam Schwartz on the new threat landscape
Questions worth separating out
Q: How should security teams reduce prompt injection risk in AI agents?
A: Security teams should separate instructions from untrusted content, minimise the agent’s tool permissions, and block direct action paths where possible.
Q: Why do AI agents create new identity governance problems?
A: AI agents create identity governance problems because they can access data, choose tools, and trigger actions inside a single runtime session.
Q: What do organisations get wrong about AI safety and access control?
A: Organisations often focus on model outputs while ignoring the privileges behind the model.
Practitioner guidance
- Separate instruction channels from data channels: Do not allow user-supplied content, retrieved content, and system instructions to share the same trust assumptions.
- Constrain tool access at the agent boundary: Limit each agent to the minimum tool set required for its task, and require explicit policy checks before any action that touches sensitive data, writes records, or invokes downstream systems.
- Classify AI agents as governed non-human identities: Assign ownership, review cadence, and approval rules to every production agent that can access data or call tools.
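The first two points above can be sketched in a few lines of Python. This is a minimal illustration, not a method described in the WorkOS interview: the names `Trust`, `Segment`, `AgentPolicy`, `build_context`, and `authorize_tool_call` are all hypothetical. It assumes that tagging content by trust level does not, by itself, stop a model from following injected text, so the tool gate is treated as the actual enforcement point.

```python
from dataclasses import dataclass, field
from enum import Enum


class Trust(Enum):
    SYSTEM = "system"        # operator-authored instructions
    UNTRUSTED = "untrusted"  # user input and retrieved content


@dataclass(frozen=True)
class Segment:
    trust: Trust
    text: str


@dataclass
class AgentPolicy:
    # Hypothetical policy record: one per production agent.
    owner: str
    allowed_tools: frozenset
    sensitive_tools: frozenset = field(default_factory=frozenset)


def build_context(segments):
    """Keep untrusted text in a separate data channel, never in the instruction channel."""
    instructions = [s.text for s in segments if s.trust is Trust.SYSTEM]
    data = [s.text for s in segments if s.trust is Trust.UNTRUSTED]
    return {"instructions": instructions, "data": data}


def authorize_tool_call(policy, tool, approved=False):
    """Allow a call only if the tool is allowlisted and, when sensitive, explicitly approved."""
    if tool not in policy.allowed_tools:
        return False
    if tool in policy.sensitive_tools and not approved:
        return False
    return True
```

In this sketch, a model that has been talked into requesting `write_record` still cannot execute it unless the policy lists the tool and an explicit approval has been recorded outside the model's context.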
What's in the full article
WorkOS's full interview covers the operational detail this post intentionally leaves for the source:
- The full conversation on how enterprise teams should evaluate prompt injection risk when AI systems can read untrusted content and invoke tools.
- Practical commentary on production AI deployment gaps, including where demo-era assumptions fail in real security and compliance reviews.
- Schwartz's forward-looking view on multi-model architecture, model routing, and why teams are moving away from single-provider dependency.
- The interview context from HumanX 2026, including how the market is thinking about AI safety and trust at scale.
👉 Read WorkOS's interview on prompt injection, AI agents, and trust boundaries →
Prompt injection and AI agent trust: what security teams need now
Explore further
Prompt injection is the AI equivalent of an identity boundary failure, not just an input-validation bug. The article correctly frames the problem as a new class of trust compromise because the model cannot reliably distinguish instructions from content once they share a context window. That makes the failure systemic, not cosmetic. In identity terms, the dangerous assumption is that untrusted input remains inert after ingestion. Practitioners should treat this as a control-plane issue, not a prompt-quality issue.
A few things that frame the scale:
- 98% of companies plan to deploy even more AI agents within the next 12 months, despite documented rogue behaviour in 80% of current deployments, according to the AI Agents: The New Attack Surface report.
- Only 52% of companies can track and audit the data their AI agents access, which leaves the remaining 48% with a blind spot for compliance and breach investigation.
A question worth separating out:
Q: How can teams tell whether an AI agent is safely governed?
A: A governed AI agent has explicit ownership, narrowly defined tool access, visible decision paths, and tested failure modes under adversarial input. If the organisation cannot explain who approves its scope, what it can reach, and how it is monitored, the agent is operating outside acceptable control boundaries.
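The criteria in this answer could be turned into a simple inventory check. The sketch below is illustrative only: `AgentRecord`, its fields, and `governance_gaps` are hypothetical names, and a real review would need evidence behind each flag rather than a boolean.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AgentRecord:
    # Hypothetical inventory entry for one production agent.
    name: str
    owner: Optional[str] = None
    scope_approver: Optional[str] = None
    allowed_tools: Optional[Tuple[str, ...]] = None  # None means never declared
    audit_logging: bool = False
    adversarial_tests_passed: bool = False


def governance_gaps(agent):
    """Return the list of governance criteria this agent fails to meet."""
    gaps = []
    if not agent.owner:
        gaps.append("no explicit owner")
    if not agent.scope_approver:
        gaps.append("no scope approver")
    if agent.allowed_tools is None:
        gaps.append("tool access not declared")
    if not agent.audit_logging:
        gaps.append("decision paths not logged")
    if not agent.adversarial_tests_passed:
        gaps.append("no adversarial testing")
    return gaps
```

An empty list means the record meets every criterion this sketch checks. Any remaining entry names a question the organisation cannot yet answer about the agent.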
👉 Read our full editorial: Prompt injection is the new SQL injection for AI agents