Indirect prompt injection exposes the governance gap in AI systems

By NHI Mgmt Group Editorial TeamPublished 2026-05-22Domain: Agentic AI & NHIsSource: CrowdStrike

TL;DR: Indirect prompt injection lets attackers hide malicious instructions in content that AI systems read, turning ordinary documents, emails, and web pages into control channels for exfiltration and misuse, according to CrowdStrike's analysis. The risk is not just model abuse; it is a governance failure when agents can read widely and act with too much privilege.

At a glance

What this is: This is an analysis of indirect prompt injection and how hidden instructions in external content can steer AI systems into unsafe actions.

Why it matters: It matters because AI agents and other NHI often consume untrusted content while holding access to sensitive tools, data, and workflows.

By the numbers:

45%, rly half of employees surveyed, 45%, report using AI tools like email clients, document processors, and code assistants without IT's knowledge.
45%, ecent study by Gusto shows that nearly half of employees surveyed, 45%, report using AI tools like email clients, document processors, and code assistants, without IT's knowledge.

👉 Read CrowdStrike's analysis of indirect prompt injection attacks and AI risk

Context

Indirect prompt injection is a control problem, not just a model-safety problem. It happens when malicious instructions are embedded in content that an AI system can read, then those instructions influence tool use, data access, or downstream actions. For IAM and NHI practitioners, the issue is that autonomous or semi-autonomous agents do not only consume data, they also inherit authority from the identities, sessions, and secrets attached to them.

The governance gap appears when AI systems are allowed broad read access and meaningful write access without strong separation of duties. That combination makes ordinary content into a delivery path for abuse. CrowdStrike's article frames the risk through security operations, but the underlying issue is common to any environment where agents, service accounts, and sessions are expected to act on behalf of users or teams. That starting point is increasingly typical in modern AI adoption.

Security teams need a way to reason about the prompt layer as part of the identity plane. If an AI agent can ingest external content, use enterprise tools, and trigger actions, then its authority must be treated like any other high-risk NHI lifecycle problem. For a broader identity lens, the NHI Lifecycle Management Guide is the right companion resource.

Key questions

Q: How should security teams reduce indirect prompt injection risk in AI systems?

A: Security teams should limit what AI systems can read, separate untrusted content from privileged actions, and apply least privilege to every connected agent. The strongest posture combines content filtering, allowlisted sources, short-lived sessions, and explicit approval for sensitive actions. If any one of those layers is missing, the attack path remains open.

Q: Why do AI agents make prompt injection more dangerous than chat-only tools?

A: AI agents are more dangerous because they can act, not just generate text. When a model can invoke tools, access records, or send messages, a hidden instruction can become a real enterprise action. The risk rises sharply if the agent inherits broad NHI permissions instead of narrowly scoped access.

Q: What is the difference between prompt filtering and identity governance for AI agents?

A: Prompt filtering tries to stop malicious instructions from influencing the model, while identity governance limits what the agent is allowed to do if it is influenced. Filtering reduces exposure at the content layer. Identity governance limits blast radius at the access layer. Mature programmes need both, because each control covers a different failure mode.

Q: When does indirect prompt injection become a business risk rather than a technical curiosity?

A: It becomes a business risk when the affected AI system can reach sensitive data, change records, or communicate externally. At that point, an injected instruction can affect operations, compliance, or customer trust. If the workflow also uses unmanaged credentials or persistent sessions, the risk becomes much harder to contain.

Technical breakdown

How indirect prompt injection works across content and context

Indirect prompt injection differs from classic malware because the payload is semantic, not executable code. The attacker hides instructions in a document, webpage, email footer, image metadata, or database field that an AI system later ingests. The model then treats the malicious text as relevant context and may follow it if guardrails, source trust controls, or instruction hierarchy are weak. The failure point is often not the model alone but the orchestration layer that decides what content gets into context and what tools the agent can reach.

Practical implication: limit what enters the model context and separate trusted instructions from untrusted content.

Why AI agent privilege amplifies prompt injection risk

Prompt injection becomes materially worse when the AI system can do more than answer questions. If an agent can read mail, query databases, send messages, or invoke APIs, then a successful injection can turn a content manipulation event into an identity abuse event. The agent's access is usually inherited from an NHI such as a service account, token, or delegated session. That means the real control plane is identity and authorization, not the model prompt alone. Without least privilege, a harmless-looking text input can become an execution path.

Practical implication: scope AI agents to the smallest feasible set of tools, data, and actions.

Prompt-layer controls versus NHI governance controls

Prompt-layer controls focus on detecting malicious instructions, sanitizing inputs, and constraining what the model is allowed to follow. NHI governance controls focus on who or what can act, what credentials are issued, how sessions are bounded, and whether access expires quickly. Both are needed, but they solve different failure modes. A prompt filter may block obvious attacks, yet it does nothing if an over-privileged agent is already trusted to exfiltrate data or perform lateral actions through legitimate APIs.

Practical implication: pair content controls with credential, session, and entitlement controls rather than treating them as substitutes.

Threat narrative

Attacker objective: The attacker wants to turn a trusted AI workflow into an execution path that leaks data or performs actions under legitimate enterprise authority.

Entry via malicious instructions hidden in external content that an AI system reads during normal operation.
Escalation when an over-privileged agent inherits access to tools, mailboxes, databases, or workflow APIs.
Impact through data exfiltration, business process manipulation, or downstream reconnaissance performed by the agent.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
Shai Hulud npm malware campaign — Shai Hulud campaign: npm malware exposed secrets on GitHub.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Indirect prompt injection creates prompt-layer governance debt: the security issue is not just whether a model can be fooled, but whether the enterprise has separated untrusted content from authority-bearing actions. When AI systems are allowed to read broadly and act broadly, the prompt becomes part of the attack surface. Practitioners should treat context ingestion as an identity-adjacent control point, not a usability feature.

Agent privilege is the amplifier that turns semantic abuse into enterprise risk: an injected instruction matters most when the underlying NHI can send email, query records, or call APIs. That is why least privilege and session scoping remain the decisive controls, even in AI workflows. The more autonomous the agent, the more tightly its authority must be bounded.

Shadow AI makes the attack surface harder to see and easier to exploit: unmanaged tools and unsanctioned agent workflows often sit outside normal review, yet they still consume enterprise content and credentials. The governance problem is not limited to approved systems. Security teams need discovery, policy, and monitoring that cover both sanctioned and informal AI usage.

Prompt injection should be treated as an NHI lifecycle problem as much as an AI safety problem: the core failure modes are provisioning, access scope, session duration, and offboarding of machine identities. A strong model guardrail strategy without identity governance leaves the enterprise exposed to indirect control paths. The practical conclusion is simple: control the identity, then control the prompt.

From our research:
72% of organisations have experienced or suspect they have experienced a breach of non-human identities, 46% confirmed, 26% suspected, according to The 2024 ESG Report: Managing Non-Human Identities.
Enterprises that have experienced a compromised NHI averaged 2.7 separate incidents in the past 12 months, which shows how identity weakness often repeats rather than stays isolated.
For lifecycle controls that reduce this exposure, see NHI Lifecycle Management Guide for provisioning, rotation, and offboarding practices.

What this signals

Indirect prompt injection will increasingly look like an identity governance issue, not a pure AI safety problem: as agents gain tool access, the decisive question becomes whether their authority is bounded tightly enough to survive malicious context. The NHI Lifecycle Management Guide is the right operational frame for provisioning, rotation, and offboarding of AI-facing identities. With 45% of employees already using AI tools without IT's knowledge, per LLMjacking: How Attackers Hijack AI Using Compromised NHIs, governance has to cover sanctioned and shadow usage together.

Identity blast radius: the practical measure of AI risk is no longer how smart the model is, but how far its credentials can reach if context is poisoned. Teams should map which agents can write to systems, send messages, or call external APIs, then reduce each one to a constrained blast radius. That programmatic discipline aligns with zero trust thinking and keeps prompt manipulation from becoming enterprise execution.

As agentic workflows spread, security leaders should expect more incidents where the visible abuse is semantic but the root cause is NHI over-privilege. The response is to bring AI connectors, sessions, and service accounts into the same review rhythm used for other privileged access. That shift is now a baseline control expectation, not an advanced maturity marker.

For practitioners

Constrain agent context ingestion Allow AI systems to consume only trusted, allowlisted sources for tasks that can trigger actions or expose sensitive data. Separate external content review from action-bearing workflows so untrusted material cannot directly influence privileged operations.
Reduce agent privilege to task scope Assign each AI agent the minimum set of tools, data sets, and API permissions needed for its job. Remove write permissions unless the workflow explicitly requires them, and use short-lived sessions for any high-risk operation.
Monitor shadow AI and unmanaged agents Inventory AI tools that employees use without approval, then map their access to mail, files, and business systems. Unmanaged agents often bypass review, which makes them a common path for indirect prompt injection abuse.
Build prompt-injection controls into identity reviews Include agent prompts, connectors, and content sources in periodic access reviews. The review should ask whether the agent can reach sensitive systems, whether its sessions expire, and whether its access remains justified after the workflow changes.

Key takeaways

Indirect prompt injection is an access problem as much as a model problem, because malicious content becomes dangerous when it can influence a privileged workflow.
The scale of the issue is already material, with most organisations reporting suspected or confirmed NHI compromise and many employees using AI tools outside IT visibility.
Security teams should combine content controls with strict NHI lifecycle governance so AI agents cannot turn untrusted input into high-impact action.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 address the attack and risk surface, while NIST AI RMF and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	NHI-01	Prompt injection directly maps to agent misuse and context poisoning risks.
NIST AI RMF		AI RMF GOVERN and MAP apply to agent authority, context trust, and accountability.
NIST Zero Trust (SP 800-207)		Zero trust is relevant because AI agents should not inherit broad implicit trust from context.

Assign ownership for AI agent behaviour and document where untrusted content can influence decisions.

Key terms

Indirect Prompt Injection: Indirect prompt injection is an attack where malicious instructions are hidden inside content that an AI system reads later. The model may treat that content as context rather than as hostile input, which can influence tool use, data access, or workflow actions if controls are weak.
Shadow AI: Shadow AI is the use of AI tools, agents, or automations that are not known to, approved by, or governed by security teams. It expands the attack surface because these systems may still access corporate data, use credentials, and trigger business processes without normal oversight.
Identity Blast Radius: Identity blast radius is the amount of damage an identity can cause if it is misused, compromised, or manipulated. In AI environments, it reflects how far an agent's tokens, sessions, and permissions can reach across data, tools, and external communications.

Deepen your knowledge

Indirect prompt injection and AI agent access control are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building governance for autonomous workflows or shadow AI, it is worth exploring.

This post draws on content published by CrowdStrike: Indirect Prompt Injection Attacks: A Lurking Risk to AI Systems. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-05-22.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org