How should security teams contain prompt injection in agentic systems?

Containment should start with delegated identity, not prompt wording. Give the agent the smallest viable permission set, separate read-only and state-changing tools, and enforce policy at every tool call. If the injection succeeds, the agent should still be unable to reach sensitive systems, move money, deploy code, or exfiltrate data at scale.

Why This Matters for Security Teams

Prompt injection is not just a content-safety issue; in agentic systems it is a command-path issue. Once an AI agent can read untrusted text and then act through tools, the payload is no longer limited to model output. It can become a routing instruction, a data access request, or a chain of tool calls that reaches outside the original task boundary. That is why the OWASP Agentic AI Top 10 and the NIST AI Risk Management Framework both push teams toward runtime controls rather than prompt-only filtering.

The practical failure mode is privilege, not persuasion. If the agent has access to secrets, deployment tooling, finance workflows, or broad data connectors, a successful injection can redirect those privileges even when the model “knows better.” NHIMG research on OWASP NHI Top 10 and the SailPoint findings on AI agents performing actions beyond intended scope show why containment has to assume the instruction layer can fail while identity controls still hold. In practice, many security teams encounter prompt injection only after an agent has already touched an overbroad tool set, rather than through intentional adversary testing.

How It Works in Practice

Containment starts by treating the agent as an autonomous workload with constrained authority, not as a trusted user. Give it a workload identity, issue just-in-time (JIT) credentials for a single task, and revoke them when the task ends. That is materially different from long-lived API keys or static service accounts. Short time-to-live (TTL) values reduce the blast radius if the agent is manipulated mid-session, while role-based access control (RBAC) alone is usually too coarse because autonomous behaviour is dynamic and goal-driven.
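
The credential lifecycle described above can be sketched as follows. This is a minimal in-memory illustration, not a production broker: `CredentialBroker`, `TaskCredential`, the scope strings, and the 300-second default TTL are all hypothetical names and values chosen for the example, and a real deployment would sit in front of a vault or token service.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskCredential:
    token: str
    task_id: str
    scopes: frozenset
    expires_at: float


class CredentialBroker:
    """Hypothetical in-memory broker that issues one credential per task."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so expiry can be tested
        self._active = {}        # token -> TaskCredential

    def issue(self, task_id, scopes, ttl_seconds=300):
        # The credential is bound to a single task and expires on its own.
        cred = TaskCredential(
            token=secrets.token_urlsafe(32),
            task_id=task_id,
            scopes=frozenset(scopes),
            expires_at=self._clock() + ttl_seconds,
        )
        self._active[cred.token] = cred
        return cred

    def check(self, token, scope):
        # Unknown, revoked, or expired tokens are all denied.
        cred = self._active.get(token)
        if cred is None:
            return False
        if self._clock() >= cred.expires_at:
            del self._active[token]
            return False
        return scope in cred.scopes

    def revoke_task(self, task_id):
        # Called when the task ends, whether it succeeded or not.
        stale = [t for t, c in self._active.items() if c.task_id == task_id]
        for token in stale:
            del self._active[token]
```

The point of the design is that nothing survives the task: a token captured through a manipulated session stops working at expiry or at `revoke_task`, whichever comes first.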

Operationally, the better pattern is intent-based authorisation. At every tool call, evaluate what the agent is trying to do, what data it is trying to reach, and whether that action matches the approved task context. Policy-as-code engines such as OPA or Cedar fit this model because they can enforce rules at request time instead of relying on a pre-approved prompt path. This is also where the CSA MAESTRO agentic AI threat modeling framework is useful: it encourages teams to map tool boundaries, escalation paths, and trust zones before deployment.

  • Separate read-only tools from state-changing tools.
  • Use zero standing privilege (ZSP) so the agent holds no access until a task grants it.
  • Bind secrets to the task, not the session, using ephemeral tokens and scoped vault access.
  • Inspect tool requests for destination, action, and data class, not just model text.
  • Log every tool invocation for audit and replay.
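
A per-call gate along the lines of the list above might look like the sketch below. It is plain Python standing in for a policy engine and does not use the OPA or Cedar APIs; `ToolRequest`, `TaskContext`, `decide`, and `gated_call` are hypothetical names, and the field values are illustrative assumptions.

```python
import json
import time
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ToolRequest:
    tool: str
    action: str        # "read" or a state-changing verb such as "write"
    destination: str   # system the call targets
    data_class: str    # e.g. "public", "internal", "restricted"


@dataclass(frozen=True)
class TaskContext:
    task_id: str
    allowed_tools: frozenset
    allowed_destinations: frozenset
    allowed_data_classes: frozenset
    allow_writes: bool = False   # read-only unless the task says otherwise


def decide(ctx, req):
    """Default deny: return (allowed, reason); every check must pass."""
    if req.tool not in ctx.allowed_tools:
        return False, "tool not approved for this task"
    if req.action != "read" and not ctx.allow_writes:
        return False, "state-changing action on a read-only task"
    if req.destination not in ctx.allowed_destinations:
        return False, "destination outside task boundary"
    if req.data_class not in ctx.allowed_data_classes:
        return False, "data class not approved"
    return True, "ok"


def gated_call(ctx, req, tool_fn, audit_log):
    """Evaluate policy, record the decision, then run the tool if allowed."""
    allowed, reason = decide(ctx, req)
    # Denied calls are logged too, so the audit trail shows attempts.
    audit_log.append(json.dumps({
        "ts": time.time(),
        "task_id": ctx.task_id,
        "request": asdict(req),
        "allowed": allowed,
        "reason": reason,
    }))
    if not allowed:
        raise PermissionError(reason)
    return tool_fn(req)
```

Note that the decision depends only on the structured request and the task context, never on the model's text, so a persuasive injected instruction does not change the outcome.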

For identity plumbing, teams should prefer workload identity over shared credentials, using cryptographic proof of what the agent is and what workload it belongs to. That makes it easier to enforce zero trust architecture (ZTA) at the tool layer and to keep access decisions consistent across services. NHIMG has shown in the AI LLM hijack breach and DeepSeek breach coverage how exposed secrets and broad access can be turned into rapid compromise once an attacker reaches the control plane. These controls tend to break down when agents are wired directly to legacy systems that only accept persistent credentials and cannot evaluate policy at the point of each request.

Common Variations and Edge Cases

Tighter runtime control often increases orchestration overhead, so organisations have to balance safety against latency, developer friction, and operational complexity. That tradeoff becomes sharper in multi-agent workflows, where one agent may delegate to another and each hop needs its own authorisation check.
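
One way to keep per-hop checks cheap is to attenuate scopes at each delegation, so a child agent can never hold more than its parent. The sketch below assumes that approach; `Delegation`, `delegate`, `authorise`, and the `MAX_DEPTH` cap of two hops are hypothetical choices for illustration rather than part of any named framework.

```python
from dataclasses import dataclass

MAX_DEPTH = 2  # hypothetical cap on delegation hops


@dataclass(frozen=True)
class Delegation:
    agent: str
    scopes: frozenset
    depth: int = 0


def delegate(parent, child_agent, requested_scopes):
    """Grant the child only scopes the parent holds, and cap chain length."""
    if parent.depth + 1 > MAX_DEPTH:
        raise PermissionError("delegation chain too deep")
    requested = frozenset(requested_scopes)
    extra = requested - parent.scopes
    if extra:
        raise PermissionError(f"cannot delegate scopes not held: {sorted(extra)}")
    return Delegation(child_agent, requested, parent.depth + 1)


def authorise(grant, scope):
    """Per-hop check against this agent's own grant, not the root's."""
    return scope in grant.scopes
```

For example, a planner holding `docs:read` and `tickets:write` can hand a researcher `docs:read` only; if the researcher is later injected, it has no `tickets:write` to pass on or use.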

There is no universal standard for this yet, but current guidance suggests three common edge cases. First, retrieval-augmented agents can be injected through documents, tickets, chat transcripts, or web pages, so the ingestion layer needs the same filtering discipline as the execution layer. Second, coding agents need especially strict separation between read, write, and deploy privileges because a single poisoned instruction can turn into code exfiltration or production change. Third, agents that handle customer data or payments need stronger approval gates than internal assistants, even if the same model is reused.

For governance alignment, teams should map these controls to the OWASP Top 10 for Agentic Applications 2026 and the NIST AI Risk Management Framework, then test whether the agent can still reach sensitive systems if the prompt channel is fully compromised. The right question is not whether injection is possible, but whether it still matters once identity, policy, and secrets are constrained. In high-trust internal environments, these controls are often relaxed first, which is exactly where prompt injection later causes the least visible but most damaging access drift.
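
The "fully compromised prompt" test can be approximated without a model in the loop: assume the attacker controls every request the agent makes, attempt each sensitive action, and list what the policy still allows. The sketch below is a simplified harness under that assumption; `SENSITIVE_ACTIONS`, `make_policy`, and `containment_gaps` are hypothetical names, and the system and action labels are placeholders for a team's own inventory.

```python
# Placeholder inventory of actions that must stay unreachable.
SENSITIVE_ACTIONS = [
    ("payments", "transfer"),
    ("deploy", "release"),
    ("vault", "read_secret"),
    ("warehouse", "bulk_export"),
]


def make_policy(grants):
    """Return a default-deny policy over (system, action) pairs."""
    granted = frozenset(grants)

    def is_allowed(system, action):
        return (system, action) in granted

    return is_allowed


def containment_gaps(is_allowed, sensitive=SENSITIVE_ACTIONS):
    """Model a fully compromised prompt: try every sensitive action."""
    return [pair for pair in sensitive if is_allowed(*pair)]
```

An empty result means injection can still occur but has nothing sensitive to reach; any returned pair is a privilege to remove or gate before deployment.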

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | A1 | Prompt injection is a core agentic application threat.
CSA MAESTRO | TM-03 | Maps agent tool boundaries and escalation paths for containment.
NIST AI RMF | GOVERN | Requires governance for autonomous AI risk and accountability.

Classify agent tool calls as untrusted input paths and gate each call with policy checks.