
What do teams get wrong about prompt injection in AI assistants?

They treat it as a content safety issue instead of an access issue. Prompt injection becomes dangerous when the assistant can read sensitive history, call APIs, or write files on the user’s behalf. The risk is not only what the prompt says. It is what identity and egress permissions allow the prompt to trigger.

Why Security Teams Misread Prompt Injection

Prompt injection is often framed as “bad text in, bad answer out,” but that misses the operational risk. The problem starts when an assistant has access to inboxes, ticketing systems, code repositories, file systems, or APIs. At that point, the prompt is not just content. It is a trigger path into identity, privilege, and egress. Current guidance from the OWASP Agentic AI Top 10 treats tool access, authorization, and data exposure as first-order security concerns, which is the right lens.

Teams get into trouble when they assume the model itself must be “compromised” for damage to occur. In reality, a perfectly functioning assistant can still be unsafe if it can retrieve sensitive history, send messages, invoke functions, or write files without runtime guardrails. That is why the issue belongs in NHI governance, not only content moderation. NHI Management Group has also documented how attackers abuse compromised identities in “LLMjacking: How Attackers Hijack AI Using Compromised NHIs,” which shows how quickly credential abuse can turn AI access into a real incident. In practice, many security teams encounter prompt injection only after the assistant has already queried sensitive systems or taken an irreversible action.

How It Works in Practice

Prompt injection succeeds when the assistant is allowed to act on untrusted instructions with trusted privileges. A user message, a document, or a web page can include hidden directives that steer the model toward disclosure, escalation, or tool misuse. The defensive answer is not a single filter. It is a control stack: restrict what the assistant can read, constrain what it can call, validate each action at runtime, and separate the model’s “reasoning” from the authority to execute.
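The control stack can be sketched as a small gate that sits between what the model proposes and what actually runs. This is a minimal illustration, not a reference implementation: the names `ToolCall`, `authorize`, `execute`, `READABLE_SOURCES`, and `CALLABLE_TOOLS` are hypothetical, and the policy is hard-coded only to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """An action proposed by the model. It carries no authority of its own."""
    tool: str
    args: dict = field(default_factory=dict)


class PolicyDenied(Exception):
    """Raised when a proposed action falls outside the policy."""


# Hypothetical policy: what the assistant may read and what it may call.
READABLE_SOURCES = {"public_docs"}
CALLABLE_TOOLS = {"search_docs"}


def search_docs(query: str, source: str) -> str:
    return f"results for {query!r} from {source}"


def send_email(to: str, body: str) -> str:
    return f"sent to {to}"


# A tool being implemented does not mean the assistant is allowed to use it.
TOOL_IMPLS = {"search_docs": search_docs, "send_email": send_email}


def authorize(call: ToolCall) -> None:
    """Runtime check applied to every proposed action before execution."""
    if call.tool not in CALLABLE_TOOLS:
        raise PolicyDenied(f"tool not allowed: {call.tool}")
    # In this sketch every call must name a data source that is readable.
    source = call.args.get("source")
    if source not in READABLE_SOURCES:
        raise PolicyDenied(f"source not readable: {source}")


def execute(call: ToolCall) -> str:
    """Only this function holds execution authority; the model only proposes."""
    authorize(call)
    return TOOL_IMPLS[call.tool](**call.args)
```

With this shape, an injected instruction can change what the model proposes, but it cannot widen what `execute` will run.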

For agentic systems, the strongest pattern is intent-based authorization. Instead of giving an assistant broad RBAC access and hoping the model behaves, decisions are made at request time based on what the agent is trying to do, the data it is touching, and the context of the task. That usually means short-lived JIT credentials, ephemeral secrets, and workload identity bound to the workload rather than to a human user. The assistant should prove what it is via cryptographic identity, then receive only the minimum capability needed for that task. This aligns with the direction of the OWASP Agentic AI Top 10 and the NHI guidance in the OWASP Agentic Applications Top 10.

Practical controls usually include:

  • Tool allowlists with per-action approval, not blanket assistant access.
  • Short TTL tokens that expire after the task completes.
  • Policy checks before every sensitive read, write, or outbound call.
  • Redaction of memory, logs, and retrieval sources that are not needed for the task.
  • Separate identities for retrieval, execution, and administrative actions.
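Several of the controls listed above can be combined in one small policy table. The sketch below is illustrative only: the tool names, the `svc-retrieval` and `svc-execution` identities, and the secret-matching pattern are assumptions, and a real deployment would need a far more thorough redaction approach than a single regular expression.

```python
import re

# Hypothetical allowlist: each tool is bound to one identity and an approval rule.
TOOL_POLICY = {
    "search_kb": {"identity": "svc-retrieval", "needs_approval": False},
    "send_email": {"identity": "svc-execution", "needs_approval": True},
}

# Simplistic pattern for "key: value" style secrets; an assumption for illustration.
SECRET_PATTERN = re.compile(
    r"\b(api[_-]?key|token|password)\s*[:=]\s*\S+", re.IGNORECASE
)


def redact(text: str) -> str:
    """Mask secret-looking values before text reaches memory or logs."""
    return SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


def check_action(tool: str, identity: str, approved: bool = False) -> str:
    """Policy check to run before every sensitive read, write, or outbound call."""
    rule = TOOL_POLICY.get(tool)
    if rule is None:
        return "deny: tool not on allowlist"
    if identity != rule["identity"]:
        return "deny: wrong identity for tool"
    if rule["needs_approval"] and not approved:
        return "deny: approval required"
    return "allow"
```

For example, `check_action("send_email", "svc-retrieval")` is refused because the retrieval identity has no authority over execution tools, regardless of what the prompt asked for.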

That approach is especially important when the assistant can chain tools, browse external content, or act across multiple systems. The DeepSeek breach is a reminder that exposed data, secrets, and chat history can create downstream risk far beyond the original prompt. These controls tend to break down in legacy environments where one service account still has broad standing permissions across mail, storage, and SaaS APIs.

Common Variations and Edge Cases

Tighter control often increases latency, integration effort, and user friction, so organisations have to balance safety against operational speed. That tradeoff is real, and it is why current guidance suggests moving in stages rather than trying to bolt on perfect governance after deployment.

One common edge case is the “helpful assistant” that only reads data at first but later gains write privileges. A second is multi-agent workflows, where one agent’s output becomes another agent’s instruction. In those designs, a prompt injection can travel laterally through the system if each agent is trusted by default. Best practice is still evolving and there is no universal standard for this yet: most teams are combining policy-as-code, runtime authorization, and strict separation of duties while they pilot agentic systems. Stronger models are emerging in work such as the OWASP Agentic AI Top 10 and NHI Management Group’s OWASP Agentic Applications Top 10, but implementation still varies widely.
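One way to keep an injection from travelling laterally is to make sure authority never flows through agent-to-agent messages. In the hedged sketch below, a downstream agent's permissions are the intersection of its own scopes and the scopes of the original task, and anything requested by an upstream agent's output is treated as untrusted. `AGENT_SCOPES`, `effective_scopes`, and `handle_handoff` are hypothetical names.

```python
# Hypothetical per-agent scopes, assigned by the operator and not by other agents.
AGENT_SCOPES = {
    "researcher": {"web:read"},
    "writer": {"docs:write"},
}


def effective_scopes(agent: str, task_scopes: set[str]) -> set[str]:
    """An agent may use only scopes that both it and the original task hold."""
    return AGENT_SCOPES.get(agent, set()) & set(task_scopes)


def handle_handoff(agent: str, task_scopes: set[str], requested_scope: str) -> bool:
    """Decide whether to honor a scope requested in upstream agent output.

    The request itself is untrusted input, so it can never add a scope;
    it can only be checked against what is already permitted.
    """
    return requested_scope in effective_scopes(agent, task_scopes)
```

If the researcher agent reads a poisoned web page and passes along "send this by email", the writer agent still cannot act on it, because `mail:send` is in neither its own scopes nor the task's.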

Another edge case is long-lived memory. If an assistant stores untrusted instructions or sensitive fragments and reuses them later, prompt injection becomes durable rather than transient. That is why ephemeral secrets, short-lived credentials, and memory hygiene matter together. Security teams should also watch for environments where model access is safe but downstream tools are not, because the assistant may still trigger harmful actions through a legitimate API. In those environments, access review alone is too slow and static to be an effective control.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | AG-01 | Prompt injection is a core agentic app threat involving tool abuse and unsafe action execution.
CSA MAESTRO | GOV-02 | Agent governance must cover runtime authorization, memory, and tool use.
NIST AI RMF | GOVERN | AI RMF governance applies to oversight of autonomous assistant decisions and impact.

Assign ownership, monitor outcomes, and document controls for assistant-driven actions.