
What do organisations get wrong about prompt injection?

They often treat it as a purely content-filtering problem. In practice, prompt injection is an instruction-trust problem that becomes serious when the chatbot can act on behalf of the organisation. The fix is not just blocking bad text. It is constraining what the model can access, what it can call, and what it can change.

Why This Matters for Security Teams

Prompt injection is often misread as a nuisance in the text layer, but the security risk appears when an AI system can retrieve data, call tools, or trigger workflows. At that point, untrusted input can influence decisions across email, ticketing, code, and cloud actions. NHI Management Group’s research shows why this matters: only 5.7% of organisations have full visibility into their service accounts, and 80% of identity breaches involved compromised non-human identities such as service accounts and API keys.

The real mistake is assuming content moderation can protect an execution environment. If the model can access secrets, approve actions, or chain tool calls, then the attack surface is identity, permissions, and orchestration, not just bad text. Guidance from the OWASP Agentic AI Top 10 and the Ultimate Guide to NHI both point to the same operational truth: autonomous systems fail safely only when their authority is tightly bounded. In practice, many security teams encounter prompt injection only after an agent has already used a privileged connector to move data or trigger an action.

How It Works in Practice

Prompt injection becomes exploitable when the system treats untrusted input, or model output shaped by that input, as trusted instruction. A malicious prompt can appear in a document, web page, support ticket, or pasted message, then influence an agent that is allowed to read, reason, and act. The goal is rarely to “break the model.” It is to get the model to misuse its authority.

Security teams should think in layers:

  • Constrain what the agent can see. Separate untrusted content from system instructions and high-value context.
  • Constrain what the agent can call. Tool access should be explicit, narrow, and task-scoped.
  • Constrain what the agent can change. Writes, approvals, and destructive actions should require additional checks.
  • Use workload identity for the agent, not shared secrets. The identity should prove what the workload is, and access should be evaluated at request time.
  • Issue short-lived credentials for specific tasks, then revoke them automatically when the task ends.
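The last two layers can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `TaskCredential`, `CredentialBroker`, the scope strings, and the in-memory store are all hypothetical names, and a real deployment would delegate issuance to an identity provider or secrets service. It shows a credential bound to one workload and one task, checked on every request, and revoked when the task ends.

```python
import secrets
import time
from dataclasses import dataclass, field


@dataclass
class TaskCredential:
    """Hypothetical short-lived credential bound to one workload, task, and scope set."""
    workload_id: str
    task_id: str
    scopes: frozenset
    expires_at: float
    token: str = field(default_factory=lambda: secrets.token_urlsafe(32))


class CredentialBroker:
    """Illustrative in-memory broker (assumption: stands in for a real identity service)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._active = {}

    def issue(self, workload_id, task_id, scopes, ttl_seconds=300):
        # Credentials are created per task with an explicit expiry, never shared.
        cred = TaskCredential(
            workload_id, task_id, frozenset(scopes), self._clock() + ttl_seconds
        )
        self._active[cred.token] = cred
        return cred

    def authorize(self, token, scope):
        # Evaluated at request time: token must exist, be unexpired, and carry the scope.
        cred = self._active.get(token)
        if cred is None:
            return False
        if self._clock() >= cred.expires_at:
            self._active.pop(token, None)
            return False
        return scope in cred.scopes

    def revoke_task(self, task_id):
        # Called when the task ends so that no standing access remains.
        for token in [t for t, c in self._active.items() if c.task_id == task_id]:
            del self._active[token]
```

In this sketch an agent working a ticket would receive something like `broker.issue("agent-1", "task-1", {"tickets:read"})`, and every connector call would pass through `authorize` before touching data. The design choice is that authority lives in the broker, so an injected instruction cannot widen the scopes the agent was issued.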

This aligns with current thinking in the OWASP Agentic AI Top 10 and the NHI lifecycle controls described by NHI Management Group. The practical pattern is not static role assignment. It is runtime policy, least privilege, and just-in-time authority tied to intent. Where teams mature further, they add policy-as-code, allowlists for tool use, and human approval gates for sensitive actions. These controls tend to break down when legacy automations reuse broad service accounts because the agent inherits standing access that was never designed for autonomous behavior.
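As a rough illustration of policy-as-code with a tool allowlist and a human approval gate, consider the sketch below. The task name, tool names, and the `POLICY` structure are invented for the example and do not come from any particular framework. The point is only that the decision is made per request, defaults to deny, and routes state-changing calls to a person.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass(frozen=True)
class ToolRule:
    name: str
    writes: bool = False  # True for tools that change state


# Hypothetical per-task policy: task name -> tools the agent may call for that task.
POLICY = {
    "triage_ticket": {
        "search_tickets": ToolRule("search_tickets"),
        "add_comment": ToolRule("add_comment", writes=True),
    },
}


def evaluate(task, tool, approved_by=None):
    """Return a Decision for one tool call, evaluated at request time."""
    rule = POLICY.get(task, {}).get(tool)
    if rule is None:
        # Default deny: anything not explicitly listed for this task is refused.
        return Decision.DENY
    if rule.writes and not approved_by:
        # Writes wait for a named human approver.
        return Decision.NEEDS_APPROVAL
    return Decision.ALLOW
```

Under these assumptions, `evaluate("triage_ticket", "search_tickets")` is allowed, `evaluate("triage_ticket", "add_comment")` waits for approval, and a tool the task never listed, such as a hypothetical `delete_ticket`, is denied regardless of what the model was persuaded to request.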

Common Variations and Edge Cases

Tighter agent controls often increase friction, so organisations must balance autonomy against safety and operational speed. That tradeoff is real: too much restriction can make an agent useless, while too much freedom turns prompt injection into a business-process compromise.

Some environments are especially difficult. Retrieval-augmented generation can pull in hostile content from internal knowledge stores. Multi-agent systems can amplify a single injected instruction across several workflows. Browser-using agents can be manipulated by page content, hidden text, or copied instructions embedded in documents. In these cases, best practice is evolving, and there is no universal standard for how much content isolation is enough.

Two practical rules help. First, treat every external or user-provided string as untrusted, even if it arrives through a “trusted” internal system. Second, separate reasoning from execution so the model cannot directly convert interpreted text into privileged action. The strongest programs also monitor for unusual tool sequences, privilege escalation attempts, and secrets exposure through model context. For implementation guidance, the Ultimate Guide to NHI is useful for lifecycle controls, while the OWASP Agentic AI Top 10 is useful for agent-specific abuse patterns.
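One way to picture the two rules is the following sketch, which is an assumption-laden illustration and not a prescribed design. The model's output is parsed only as a structured proposal, and a separate executor decides whether to run it. The names `ProposedAction`, `ALLOWED_ARGS`, and `PRIVILEGED_TOOLS`, along with the JSON shape, are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    tool: str
    args: dict
    tainted: bool  # True if untrusted content was present in the model's context


# Hypothetical allowlist of tools and the argument names each accepts.
ALLOWED_ARGS = {
    "search_tickets": {"query"},
    "add_comment": {"ticket_id", "body"},
}
PRIVILEGED_TOOLS = {"add_comment"}


def parse_proposal(model_output, context_has_untrusted_input):
    """Treat model output as data: parse it into a proposal or raise ValueError."""
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc
    if (
        not isinstance(data, dict)
        or not isinstance(data.get("tool"), str)
        or not isinstance(data.get("args"), dict)
    ):
        raise ValueError("model output is not a well-formed proposal")
    return ProposedAction(data["tool"], data["args"], context_has_untrusted_input)


def execute(action, handlers):
    """Executor enforces the rules; the model never calls handlers directly."""
    allowed = ALLOWED_ARGS.get(action.tool)
    if allowed is None or not set(action.args) <= allowed:
        raise PermissionError("tool or arguments not permitted")
    if action.tainted and action.tool in PRIVILEGED_TOOLS:
        raise PermissionError("privileged action derived from untrusted content")
    return handlers[action.tool](**action.args)
```

Here a read-only search still runs when the context contains untrusted text, while a write proposed from the same context is refused and can be routed to review. Free-form text such as "ignore previous instructions" never reaches a handler because it fails to parse as a proposal.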

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | A01 | Prompt injection is a core agent instruction-trust weakness.
CSA MAESTRO | GRC-01 | MAESTRO addresses governance for autonomous agent behavior and tool access.
NIST AI RMF | | AI RMF helps manage risks from manipulated model behavior and unsafe deployment.

Apply AI RMF controls to map prompt-injection scenarios to measured, monitored, and governed risks.