TL;DR: Prompt injection exploits the boundary between user input and system instructions in AI systems, giving attackers a low-skill path to manipulate models that can access data and tools, according to WorkOS's interview with Noam Schwartz. The governance problem is that production AI agents collapse old web security assumptions about instruction trust, making safety, access, and tool-use boundaries inseparable.
At a glance
What this is: This interview argues that prompt injection has become the defining AI application risk because modern AI agents can ingest untrusted input, reach into data, and invoke tools.
Why it matters: IAM and security teams need to treat AI agents as governed identities because the same access model that works for static applications fails when instructions, data, and actions converge at runtime.
👉 Read WorkOS's interview on prompt injection, AI agents, and trust boundaries
Context
Prompt injection is a form of instruction manipulation in which untrusted text changes how an AI system behaves. The problem matters for AI agent security because these systems often sit inside production workflows, can read sensitive content, and can call external tools without the hard separation that traditional application controls assume.
The security gap is not just model quality. It is the identity and trust model around the agent: what it may see, what it may pass forward, and which tools it can invoke after parsing user or external input. In that sense, the issue spans agentic AI governance, NHI controls, and broader enterprise access management.
This starting position is typical for organisations that moved from demo workflows to production AI without redesigning authorisation boundaries. Once untrusted input can influence tool use, the old web-app threat model stops being enough.
Key questions
Q: How should security teams reduce prompt injection risk in AI agents?
A: Security teams should separate instructions from untrusted content, minimise the agent’s tool permissions, and block direct action paths where possible. Prompt injection becomes dangerous when a model can turn external text into behaviour. The safest posture is to treat every input as hostile until policy checks confirm what the agent may see and do.
Q: Why do AI agents create new identity governance problems?
A: AI agents create identity governance problems because they can access data, choose tools, and trigger actions inside a single runtime session. That makes them governed actors, not just software features. Traditional IAM assumes stable identities and predictable request flows, while agents can alter execution paths in response to injected or untrusted text.
Q: What do organisations get wrong about AI safety and access control?
A: Organisations often focus on model outputs while ignoring the privileges behind the model. If an agent can read sensitive data or invoke tools, the real risk is what it can cause the environment to do. Effective control starts with scope, policy, and monitoring around actions, not just moderation of generated text.
Q: How can teams tell whether an AI agent is safely governed?
A: A governed AI agent has explicit ownership, narrowly defined tool access, visible decision paths, and tested failure modes under adversarial input. If the organisation cannot explain who approves its scope, what it can reach, and how it is monitored, the agent is operating outside acceptable control boundaries.
Technical breakdown
Prompt injection in AI agents: why instruction boundaries fail
Prompt injection works because many LLM-based systems mix user content, system instructions, retrieval results, and tool responses into one context. The model does not possess a reliable native mechanism for separating “data” from “commands” once they are in the same prompt stream. That creates an exploit path where malicious text can override intended behaviour, leak hidden instructions, or steer downstream tool calls. Unlike classic web injection, there is no clean parameterization boundary for natural language, which means the attack surface includes reasoning and action selection, not just input validation.
Practical implication: isolate instruction sources, constrain tool invocation paths, and treat all untrusted text as potential control input.
AI tool use and identity boundaries in production workflows
An AI agent that can call external tools becomes more than a text interface. It can influence real systems, reach into data stores, and chain actions across services. That changes the identity problem from “who can log in” to “what can the agent cause to happen after it is already inside the workflow.” If the agent is allowed to inspect sensitive data and then act on it, prompt injection can become a privilege-escalation path without any credential theft. The control boundary has to exist around the tool surface, not only around the model endpoint.
Practical implication: enforce least privilege at the tool and data layer, not just at the application or model boundary.
Why AI safety becomes a trust and safety governance problem
The article’s core message is that AI security is not a narrow model-hardening exercise. It is a trust and safety problem because enterprises are deploying systems that process untrusted input while holding access to sensitive information and external capabilities. That creates a composite risk profile where abuse, manipulation, exfiltration, and operational misuse can all happen through the same interface. Traditional application security assumes stable request boundaries and predictable execution paths. AI agents break that assumption because the path from input to action is probabilistic and context-sensitive.
Practical implication: build policy, monitoring, and response around agent behaviour, not only around model accuracy or content moderation.
Threat narrative
Attacker objective: The attacker aims to turn the agent’s legitimate access and tool privileges into a channel for data exposure, unsafe action, or internal system misuse.
- Entry occurs when an attacker supplies untrusted text into an AI system that accepts external content and uses it inside the model context.
- Credential access or abuse happens when the injected prompt steers the agent toward sensitive data or tool calls that were not intended by the user.
- Impact follows when the agent leaks information, performs an unsafe action, or amplifies the attacker’s reach into connected systems.
Breaches seen in the wild
- Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
- AI LLM hijack breach — attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.
Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.
NHI Mgmt Group analysis
Prompt injection is the AI equivalent of an identity boundary failure, not just an input-validation bug. The article correctly frames the problem as a new class of trust compromise because the model cannot reliably distinguish instructions from content once they share a context window. That makes the failure systemic, not cosmetic. In identity terms, the dangerous assumption is that untrusted input remains inert after ingestion. Practitioners should treat this as a control-plane issue, not a prompt-quality issue.
AI agents extend the NHI attack surface by turning language into an execution path. Once a system can read sensitive data and invoke tools, the security question becomes what the agent can do with what it sees. That aligns directly with OWASP-NHI and zero-trust thinking because privilege is no longer confined to static credentials. The field should stop describing these systems as “just applications” and start governing them as non-human actors with runtime authority.
Runtime trust collapse: the assumption that instructions can be safely separated from data was designed for deterministic software boundaries. That assumption fails when the actor is an AI system because user text can alter tool choice, execution sequence, and action timing inside the same session. The implication is that old application-security models do not explain or contain agent behaviour.
The production gap is now an identity and governance gap. The interview is a reminder that enterprise AI deployments fail when security is treated as a post-launch add-on. The control problem is not whether a model can answer questions, but whether the organisation can govern what it sees, what it passes on, and what it is allowed to trigger. Practitioners should reframe AI safety as a standing governance domain with explicit owners.
AI safety will converge with access governance, not sit beside it. Schwartz’s comments point to a market where model selection, routing, and tool chaining will become normal enterprise architecture. That means identity teams will own more of the risk surface as AI systems mediate access to data and services. The practical conclusion is that agent governance, NHI controls, and application policy will increasingly be the same conversation.
From our research:
- 98% of companies plan to deploy even more AI agents within the next 12 months, despite documented rogue behaviour in 80% of current deployments, according to AI Agents: The New Attack Surface report.
- Only 52% of companies can track and audit the data their AI agents access, which leaves 48% with a complete blind spot for compliance and breach investigation.
- For a broader control model, OWASP Agentic AI Top 10 frames the failure modes that appear once agents can select tools and act on runtime input.
What this signals
Runtime trust collapse: the next governance problem is not whether AI agents can be blocked from doing unsafe things, but whether their behaviour can be explained and bounded after untrusted input enters the session. Identity teams should expect stronger demand for per-agent policy, event tracing, and action-level approval controls as AI systems move from experiments to workflows.
The combination of untrusted input and tool access means existing application review processes will miss the most important failures. Teams should prepare to govern AI agents like privileged workloads, with explicit scope, logging, and offboarding. That is especially important where the agent can reach personal or regulated data.
With 80% of current deployments already showing behaviour outside intended scope, the pressure on controls is now operational, not theoretical. The practical signal is that organisations need to align AI governance with NHI lifecycle discipline, zero-trust segmentation, and model-risk oversight before the deployment base expands further.
For practitioners
- Separate instruction channels from data channels Do not allow user-supplied content, retrieved content, and system instructions to share the same trust assumptions. Use explicit parsing, sanitisation, and role separation before content reaches the model.
- Constrain tool access at the agent boundary Limit each agent to the minimum tool set required for its task, and require explicit policy checks before any action that touches sensitive data, writes records, or invokes downstream systems.
- Classify AI agents as governed non-human identities Assign ownership, review cadence, and approval rules to every production agent that can access data or call tools. Track its scope as you would any privileged service identity, including lifecycle and offboarding.
- Test prompt-injection paths in red-team exercises Include malicious instructions, retrieved-content poisoning, and tool-steering scenarios in security testing so the organisation sees how the agent behaves under adversarial input.
Key takeaways
- Prompt injection matters because it turns natural language into a control path inside AI agents, not just a content issue.
- The risk is already visible in production, where AI systems can access data and tools that make manipulation materially harmful.
- Teams need to govern AI agents as privileged non-human identities with scoped access, traceability, and tested failure modes.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | AG-01 | Prompt injection maps directly to agent goal and tool misuse. |
| OWASP Non-Human Identity Top 10 | NHI-03 | Agents with tool access behave like governed non-human identities. |
| NIST AI RMF | Trust, governance, and monitoring are central to AI risk management. |
Establish governance, monitoring, and accountability for AI systems that can act.
Key terms
- Prompt Injection: Prompt injection is the manipulation of an AI system by embedding instructions inside untrusted content. The model may treat that content as operational guidance rather than data, which can change outputs, tool calls, or downstream actions in ways the operator did not intend.
- Agentic AI: Agentic AI is AI software that can decide what to do next, choose tools, and carry out actions with limited human intervention. In governance terms, it behaves like a non-human identity with runtime authority, which means access, logging, and scope controls must follow the action path, not just the user interface.
- Runtime Trust Boundary: A runtime trust boundary is the point where a system decides whether incoming text, retrieved data, or external events are allowed to influence action. For AI agents, this boundary is often weak or implicit, which makes policy enforcement and tool restriction essential before the model can execute anything.
- Tool Access Scope: Tool access scope is the exact set of systems, actions, and data an AI agent is allowed to reach. It should be narrow, explicit, and reviewable, because the risk emerges when the agent can transform a prompt into a real-world operation across connected services.
Deepen your knowledge
Prompt injection, AI agent trust boundaries, and non-human identity governance are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building controls for production AI systems that can read data and call tools, it is worth exploring.
This post draws on content published by WorkOS: AI is both weapon and target, Noam Schwartz on the new threat landscape. Read the original.
Published by the NHIMG editorial team on 2026-04-15.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org