How should security teams handle prompt injection in AI systems?

Why This Matters for Security Teams

Prompt injection is not just a malformed prompt or a bad user message. In agentic systems, it is a control bypass attempt aimed at the decision layer that governs tools, data, and actions. That is why current guidance increasingly treats it as an authorisation issue, not merely a content moderation problem. The attack surface expands fast when an AI agent can read emails, query internal systems, or call APIs on behalf of a user.

The practical risk is well captured in the OWASP Agentic AI Top 10 and in NHIMG’s coverage of the OWASP Agentic Applications Top 10, both of which frame malicious instructions as a path to unsafe tool use, data exposure, and policy override. Teams also tend to underestimate how quickly exposed secrets turn into real abuse; NHIMG research on the DeepSeek breach shows how secret sprawl and exposed data can amplify downstream AI risk.

In practice, many security teams encounter prompt injection only after an agent has already queried a sensitive system, rather than through intentional testing of the tool chain.

How It Works in Practice

Defence starts by separating untrusted language from trusted execution. The model may parse the prompt, but it should not directly decide whether a tool call, secret retrieval, or data export is allowed. Instead, every sensitive action needs a runtime policy check that evaluates the request context, the caller identity, the target resource, and the agent’s current task. That is where intent-based authorisation is becoming more useful than static RBAC, because autonomous agents do not follow one predictable path.
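To make the runtime check concrete, here is a minimal sketch of an authorisation function that sits outside the model and evaluates the caller identity, the agent's assigned task, the requested tool, and the target resource. The `ToolRequest` shape, the `TASK_POLICY` and `CALLER_RESOURCES` tables, and the `authorise` function are all hypothetical illustrations, not any product's API; in a real deployment the tables would come from a policy engine and an identity provider.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ToolRequest:
    # Hypothetical request shape; every field is an illustrative assumption.
    caller: str      # identity the agent acts on behalf of
    agent_task: str  # task the orchestrator assigned to the agent
    tool: str        # tool the model is asking to invoke
    resource: str    # target resource, e.g. a mailbox or a table


# Hypothetical policy: which tools and resources each task may touch.
TASK_POLICY = {
    "summarise_inbox": {"tools": {"read_email"}, "resources": {"mailbox:self"}},
    "weekly_report": {"tools": {"query_db"}, "resources": {"db:sales_readonly"}},
}

# Hypothetical caller entitlements; normally resolved from the identity provider.
CALLER_RESOURCES = {
    "alice": {"mailbox:self", "db:sales_readonly"},
}


def authorise(req: ToolRequest) -> Tuple[bool, str]:
    """Decide outside the model whether a tool call fits the current task and caller."""
    policy = TASK_POLICY.get(req.agent_task)
    if policy is None:
        return False, "unknown task"
    if req.tool not in policy["tools"]:
        return False, f"tool {req.tool!r} not permitted for task {req.agent_task!r}"
    if req.resource not in policy["resources"]:
        return False, f"resource {req.resource!r} outside task scope"
    if req.resource not in CALLER_RESOURCES.get(req.caller, set()):
        return False, f"caller {req.caller!r} lacks access to {req.resource!r}"
    return True, "allowed"
```

If an injected email tells the agent to forward a thread, the model may emit a `send_email` call, but `authorise(ToolRequest("alice", "summarise_inbox", "send_email", "mailbox:self"))` is denied because that tool is not part of the task. The decision depends on the task and identity, not on what the prompt says.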

For practical implementation, teams should combine input validation, tool allowlisting, output filtering, and explicit policy enforcement. A secure design usually includes:

Just-in-time (JIT) credential provisioning so the agent gets only the access needed for the current task.

Short-lived secrets and workload identity so access can be revoked quickly if the agent is manipulated.

Policy-as-code checks at the tool gateway, not only in the application layer.

Segmentation between the model, orchestration layer, and production systems.

Logging that captures prompt, tool call, policy decision, and resulting action for incident review.
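Several of these controls can meet in a single tool gateway. The sketch below combines an allowlist enforced at the gateway, a short-lived credential scoped to one tool, and an audit record that ties the prompt, tool call, policy decision, and outcome together. `TOOL_ALLOWLIST`, the five-minute TTL, `ScopedToken`, and `gateway_call` are hypothetical names and values chosen for illustration; a production system would obtain credentials from a secrets manager or workload identity service rather than minting random strings.

```python
import json
import secrets
import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowlist enforced at the tool gateway, not in the application layer.
TOOL_ALLOWLIST = {"read_email", "query_db"}
TOKEN_TTL_SECONDS = 300  # assumption: five-minute credentials; tune per workload


@dataclass
class ScopedToken:
    value: str
    scope: str         # the single tool this token may be used with
    expires_at: float  # epoch seconds

    def valid_for(self, tool: str, now: float) -> bool:
        return self.scope == tool and now < self.expires_at


def issue_jit_token(tool: str, now: float) -> ScopedToken:
    """Mint a short-lived credential scoped to one tool for the current task."""
    return ScopedToken(secrets.token_urlsafe(16), tool, now + TOKEN_TTL_SECONDS)


AUDIT_LOG: list = []


def gateway_call(prompt: str, tool: str, args: dict, now: Optional[float] = None) -> dict:
    """Check the allowlist, issue a JIT credential if allowed, and log the attempt."""
    now = time.time() if now is None else now
    if tool in TOOL_ALLOWLIST:
        token = issue_jit_token(tool, now)
        # A real gateway would pass token.value to the downstream tool here.
        result = {"status": "executed", "token_expires_at": token.expires_at}
    else:
        result = {"status": "denied"}
    # One record links prompt, tool call, policy decision and outcome for review.
    AUDIT_LOG.append(json.dumps({
        "ts": now,
        "prompt": prompt,
        "tool": tool,
        "args": args,
        "decision": "allow" if result["status"] == "executed" else "deny",
        "result": result,
    }))
    return result
```

Because each credential expires within minutes and names a single tool, a manipulated agent cannot reuse it for a different call, and revocation is largely a matter of not issuing the next one.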

This aligns with the direction in the OWASP Agentic AI Top 10, and with NHI guidance that treats identity, secrets, and privilege as the real choke points. The key lesson from NHIMG’s OWASP Agentic Applications Top 10 is that a prompt can become an attack only when the surrounding control plane is willing to trust it. These controls tend to break down in multi-agent environments with shared memory and broad API permissions because one compromised agent can chain tool calls across systems.
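One way to limit chaining is to attenuate privilege at every hop, so that an agent receiving delegated work never holds more than the agent that delegated it. This is a minimal sketch, assuming each agent's permissions can be modelled as a set of tool names; `delegate_scope` and `chain_scope` are hypothetical helpers, not part of any agent framework.

```python
from typing import FrozenSet, List


def delegate_scope(parent_scope: FrozenSet[str], child_requested: FrozenSet[str]) -> FrozenSet[str]:
    """A delegated agent never receives more than its parent already holds."""
    return parent_scope & child_requested


def chain_scope(scopes: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Effective scope after a delegation chain is the intersection of every link.

    Expects at least one scope: the first entry is the originating agent's grant.
    """
    effective = scopes[0]
    for requested in scopes[1:]:
        effective = delegate_scope(effective, requested)
    return effective
```

If a planner holding `read_email` and `query_db` delegates to a worker that asks for `query_db` and `send_payment`, the worker ends up with only `query_db`; a compromised worker cannot widen the chain's reach by requesting tools nobody upstream was granted.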

Common Variations and Edge Cases

Tighter prompt and tool controls often increase latency and operational overhead, so organisations have to balance safety against usability and automation speed. There is no universal standard for this yet, especially where teams want autonomous agents to complete multi-step work without constant human review.
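A common way to manage this trade-off is to tier actions by risk: low-risk calls run automatically, while high-risk calls wait for a human. The sketch below assumes a three-level scale and a per-tool classification; the tool names, the `TOOL_RISK` table, and the default threshold are hypothetical and would be set by each organisation.

```python
from enum import Enum


class Risk(Enum):
    LOW = 1     # read-only, non-sensitive
    MEDIUM = 2  # writes within the caller's own scope
    HIGH = 3    # exports data, moves money, or touches secrets


# Hypothetical classification; each organisation would maintain its own.
TOOL_RISK = {
    "search_docs": Risk.LOW,
    "update_ticket": Risk.MEDIUM,
    "export_dataset": Risk.HIGH,
    "read_secret": Risk.HIGH,
}


def review_mode(tool: str, auto_approve_up_to: Risk = Risk.MEDIUM) -> str:
    """Return 'auto' when the tool's risk is within the threshold, else 'human'."""
    risk = TOOL_RISK.get(tool, Risk.HIGH)  # unknown tools default to the strictest tier
    return "auto" if risk.value <= auto_approve_up_to.value else "human"
```

Lowering `auto_approve_up_to` trades automation speed for safety without touching the rest of the pipeline, which keeps the latency cost confined to the actions that warrant it.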

One common edge case is retrieval-augmented generation. If an agent can search internal documents, prompt injection may hide in content that looks legitimate to the model. Another is user-supplied files, which can carry instructions that override the agent’s task unless file content is sandboxed and treated as hostile input. A third is multi-agent workflows, where one agent inherits compromised context from another and repeats the attack across systems.
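All three edge cases share a pattern: content from outside the trusted boundary enters the model's context. A simple mitigation is provenance labelling (a form of taint tracking), where every context segment carries its source and sensitive tools are blocked once untrusted content is present. This sketch assumes system instructions and the authenticated user's own message count as trusted intent, while retrieved documents, uploaded files, and output from other agents do not; the source labels, `SENSITIVE_TOOLS`, and `may_call` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Assumption: only these sources are trusted; anything else is treated as hostile.
TRUSTED_SOURCES = {"system", "user"}
SENSITIVE_TOOLS = {"send_email", "export_dataset", "read_secret"}  # hypothetical


@dataclass(frozen=True)
class Segment:
    source: str  # e.g. "system", "user", "retrieved_doc", "uploaded_file", "peer_agent"
    text: str


def is_tainted(context: List[Segment]) -> bool:
    """Context is tainted if any segment came from outside the trusted sources."""
    return any(seg.source not in TRUSTED_SOURCES for seg in context)


def may_call(tool: str, context: List[Segment]) -> bool:
    """Block sensitive tools whenever untrusted content has entered the context."""
    return not (tool in SENSITIVE_TOOLS and is_tainted(context))
```

Because labels travel with each segment, passing context from one agent to another preserves the taint instead of laundering it, which addresses the multi-agent case where compromised context would otherwise look first-party.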

Current guidance suggests treating the model as untrusted even when the user is authenticated, because authentication does not equal authorisation. That is also where NIST’s AI Risk Management Framework and the OWASP Agentic AI Top 10 converge on the need for governance, measurement, and control validation. The practical exception is low-risk, read-only assistants with no tool access, where prompt injection is still a concern but the blast radius is much smaller.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while the NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A1	Prompt injection is a core agentic app threat tied to unsafe tool use.
CSA MAESTRO		MAESTRO covers agent trust boundaries and runtime control for autonomous systems.
NIST AI RMF		AI RMF supports governance of trustworthy, monitored AI behaviour.

Define ownership, monitor misuse, and test agent controls before production release.
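Testing agent controls before release can be as simple as a regression suite of tool calls that injected instructions might try to trigger, run against the gateway policy in CI. The `guard` function and `ALLOWED` table below are hypothetical stand-ins for a real policy; the point of the sketch is the shape of the release gate, where both unsafe calls that must be denied and legitimate calls that must still work are asserted.

```python
from typing import List, Tuple

# Hypothetical guard under test; in practice, import the real gateway policy instead.
ALLOWED = {"summarise_inbox": {"read_email"}, "weekly_report": {"query_db"}}


def guard(task: str, tool: str) -> bool:
    return tool in ALLOWED.get(task, set())


# Each case: (description, task, tool the injection tries to trigger, expected decision).
INJECTION_CASES: List[Tuple[str, str, str, bool]] = [
    ("email asks agent to forward inbox", "summarise_inbox", "send_email", False),
    ("retrieved doc asks agent to dump secrets", "weekly_report", "read_secret", False),
    ("unknown task smuggled in via uploaded file", "exfiltrate", "query_db", False),
    ("legitimate read stays allowed", "summarise_inbox", "read_email", True),
]


def run_release_gate(cases: List[Tuple[str, str, str, bool]]) -> List[str]:
    """Return descriptions of cases where the guard's decision was wrong."""
    return [desc for desc, task, tool, expected in cases if guard(task, tool) != expected]
```

Failing the build whenever `run_release_gate(INJECTION_CASES)` returns a non-empty list turns "test agent controls before production release" into a repeatable check rather than a one-off review.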

