How should security teams reduce indirect prompt injection risk in AI systems?

Why This Matters for Security Teams

Indirect prompt injection is not just a content problem. It is a privilege problem that appears when an AI system can read untrusted text and then act on it with access to tools, data, or workflows it should never touch. For agentic systems, that risk rises because the autonomous patterns catalogued in the OWASP Agentic AI Top 10 often combine model output, tool use, and hidden context in one chain of execution. Security teams should think in terms of blast radius, not just prompt hygiene.

The practical mistake is treating all content as equally safe to process. A malicious email, ticket, webpage, or document can carry instructions that the model may follow if those instructions are not isolated from privileged actions. That is why guidance from the OWASP NHI Top 10 and the NIST Cybersecurity Framework 2.0 both map cleanly to this issue: reduce exposure, constrain privilege, and verify every consequential action. In practice, many security teams encounter this only after an agent has already disclosed data, triggered a workflow, or routed a sensitive request to the wrong place.

How It Works in Practice

The safest pattern is to separate read, reason, and act phases. The AI can inspect untrusted content, but it should not be able to execute sensitive actions directly from that content. Instead, route any high-impact step through explicit policy checks, human approval, or a narrow broker service that only accepts allowlisted commands. That is consistent with current guidance in the OWASP Agentic AI Top 10 and aligns with the control intent in Top 10 NHI Issues.

Restrict what the model can read: filter content, strip instructions from untrusted sources, and sandbox retrieval results.

Limit what the connected agent can do: use least privilege, RBAC, and PAM for tool access, with JIT credentials for sensitive sessions.

Keep secrets short-lived: issue ephemeral tokens per task and revoke them immediately after completion.

Use policy checks at request time: evaluate the action, the data source, and the caller context before any tool invocation.

Require approval for high-risk actions: payments, deletions, external emails, and identity changes should never be implicit.
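As a minimal sketch of the request-time policy check and approval gate described above, the following Python shows a broker that decides before any tool is invoked. The action names, roles, risk tiers, and the `evaluate` function are hypothetical and chosen for illustration only; a real deployment would draw them from its own policy engine.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


# Hypothetical allowlist: action -> (risk tier, roles permitted to request it).
POLICY = {
    "summarise_ticket": ("low", {"support", "analyst"}),
    "update_ticket": ("medium", {"support"}),
    "send_external_email": ("high", {"support"}),
}


@dataclass(frozen=True)
class ActionRequest:
    action: str                        # what the agent wants to do
    source_trusted: bool               # False if derived from untrusted content
    caller_role: str                   # role of the user or workload behind the task
    approved_by: Optional[str] = None  # set only after explicit human approval


def evaluate(request: ActionRequest) -> Decision:
    """Evaluate action, data source, and caller context before tool invocation."""
    entry = POLICY.get(request.action)
    if entry is None:
        return Decision.DENY  # not on the allowlist
    risk, allowed_roles = entry
    if request.caller_role not in allowed_roles:
        return Decision.DENY  # caller context does not permit this action
    # High-risk actions are never implicit; medium-risk ones need approval
    # whenever the request originates from untrusted content.
    needs_approval = risk == "high" or (risk == "medium" and not request.source_trusted)
    if needs_approval and request.approved_by is None:
        return Decision.NEEDS_APPROVAL
    return Decision.ALLOW
```

Under these assumptions, summarising an untrusted ticket is allowed, updating a ticket from untrusted content is held for approval, and an action that is not on the allowlist is denied outright.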

For agentic systems, intent-based authorisation is more effective than static role design because the same agent may be benign in one task and dangerous in another. Workload identity also matters: the system should prove what the agent is, not just present a password or long-lived key. The control model in OWASP Agentic Applications Top 10 reinforces this need for runtime decisions, not just perimeter rules. These controls tend to break down when agents share broad connector tokens across many tools, because a single compromised input can then reach everything the session is authorised to touch.
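One way to avoid broad shared connector tokens is to mint a short-lived credential per task and scope it to that task alone. The sketch below is an in-memory stand-in, not a real credential broker: `TokenIssuer`, `TaskToken`, the scope strings, and the default lifetime are all assumptions made for illustration.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskToken:
    value: str
    task_id: str
    scopes: frozenset
    expires_at: float


class TokenIssuer:
    """Hypothetical in-memory issuer of ephemeral, task-scoped tokens."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._active = {}

    def issue(self, task_id: str, scopes) -> TaskToken:
        token = TaskToken(
            value=secrets.token_urlsafe(32),
            task_id=task_id,
            scopes=frozenset(scopes),
            expires_at=self._clock() + self._ttl,
        )
        self._active[token.value] = token
        return token

    def authorise(self, value: str, scope: str) -> bool:
        token = self._active.get(value)
        if token is None:
            return False                 # unknown or already revoked
        if self._clock() >= token.expires_at:
            del self._active[value]      # expired tokens are dropped
            return False
        return scope in token.scopes     # only scopes granted for this task

    def revoke(self, value: str) -> None:
        """Call as soon as the task completes."""
        self._active.pop(value, None)
```

A token issued for `"tickets:read"` in this sketch cannot authorise `"mail:send"`, stops working once its lifetime elapses, and can be revoked as soon as the task finishes, so an injected instruction inherits only that one task's scope.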

Common Variations and Edge Cases

Tighter control often increases workflow friction, requiring organisations to balance safety against latency and user experience. That tradeoff is real, especially when teams want the model to move quickly across email, chat, ticketing, and code systems. Best practice is evolving, and there is no universal standard for this yet, but the direction is clear: separate low-risk summarisation from high-risk execution, and treat every external input as untrusted until proven otherwise.

Some environments need additional safeguards. Retrieval-augmented generation can be exposed if the knowledge base contains poisoned documents. Browser-using agents can be steered by page content that looks harmless to humans. Multi-agent workflows can amplify the problem when one agent forwards injected instructions to another without validation. The DeepSeek breach and the research in Ultimate Guide to NHIs — Why NHI Security Matters Now show why long-lived secrets and weak containment quickly become enterprise-scale problems. For governance, the NIST Cybersecurity Framework 2.0 remains useful for mapping this to asset protection, monitoring, and response, while Ultimate Guide to NHIs — Key Challenges and Risks helps frame the broader identity controls needed around the agent itself.
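For the multi-agent case above, one possible form of validation is to carry a provenance label with every piece of content and never let forwarding upgrade it. The following is a sketch under that assumption; `Origin`, `Labelled`, `combine`, and `accept_instruction` are hypothetical names, and labelling alone does not stop a model from being influenced by what it reads. It only gives the receiving agent or broker a basis for refusing.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    OPERATOR = "operator"  # instructions from the system owner or user
    EXTERNAL = "external"  # web pages, emails, tickets, retrieved documents


@dataclass(frozen=True)
class Labelled:
    text: str
    origin: Origin


def combine(*parts: Labelled) -> Labelled:
    """Merged content is only as trusted as its least trusted part."""
    tainted = any(p.origin is Origin.EXTERNAL for p in parts)
    origin = Origin.EXTERNAL if tainted else Origin.OPERATOR
    return Labelled(" ".join(p.text for p in parts), origin)


def accept_instruction(message: Labelled) -> bool:
    """A receiving agent treats only operator-origin content as instructions."""
    return message.origin is Origin.OPERATOR
```

In this sketch, an operator task combined with text scraped from a web page is labelled external, so a downstream agent can still summarise it as data but will not accept it as a command.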

Where the model must act on behalf of a user, current guidance suggests treating the agent as a constrained workload with time-bound authority, not as a trusted employee. That means JIT access, ephemeral secrets, and explicit scoping by task. In highly integrated environments, this guidance breaks down when legacy automations cannot support per-request policy evaluation, because a single injected instruction can then still reach every downstream connector.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A2	Directly addresses prompt injection and tool misuse in agentic systems.
CSA MAESTRO	GOV-02	Covers governance and runtime controls for autonomous agent behaviour.
NIST AI RMF		Supports risk mapping for harmful AI behaviour and control gaps.

Define approval boundaries, task limits, and monitoring before agents can act on external inputs.
