How can organisations reduce the impact of prompt injections without blocking multimodal use?

Why This Matters for Security Teams

Prompt injection matters because it turns untrusted input into a possible control plane for agentic systems. That risk grows when multimodal workflows combine text, images, documents, and tool calls in a single chain. Security teams often assume the model is only “reading” content, but the real failure mode is that the model can be steered into taking actions with authority it should not have. Current guidance suggests treating this as an execution-risk problem, not just a content-safety problem, as reflected in the OWASP Agentic AI Top 10 and NHI governance research from NHI Mgmt Group.

The practical issue is authority leakage. If the same workflow that interprets an image can also update a ticket, send a message, or query sensitive records, then malicious instructions hidden in the input may influence downstream decisions. That is why limiting model authority is more effective than trying to “detect” every injection. In practice, many security teams encounter abuse only after an agent has already chained from untrusted input into an irreversible action.

How It Works in Practice

The safest pattern is to separate perception from privilege. Multimodal inputs should be parsed in a low-trust layer, then passed forward as structured, bounded outputs rather than as direct instructions. A model can summarise an image, extract fields from a document, or classify a screenshot, but those results should not automatically inherit permission to act. For action-bearing steps, use explicit policy checks, task-specific approval, and short-lived credentials. This aligns with emerging agent guidance in the OWASP Agentic AI Top 10 and the NHI lifecycle concerns discussed by NHI Mgmt Group.

In operational terms, the workflow should look like this:

1. Ingest untrusted text or multimodal content in a sandboxed step with no direct tool access.
2. Convert model output into typed data, not free-form instructions.
3. Evaluate each requested action at runtime against policy, context, and user intent.
4. Issue just-in-time credentials only for the specific task, scope, and duration needed.
5. Log the full chain from input to decision so investigators can reconstruct how the action was produced.
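The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `Action` values, the `support_agent` role, the `ALLOWED` policy table, and every function name are hypothetical, and the model call itself is represented only by the `raw` dictionary it would return.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional
import secrets


class Action(Enum):
    # Hypothetical closed set of actions the workflow is allowed to request.
    NONE = "none"
    UPDATE_TICKET = "update_ticket"
    SEND_MESSAGE = "send_message"


@dataclass(frozen=True)
class Extraction:
    """Typed result of the low-trust perception step."""
    summary: str
    requested_action: Action
    target_id: str


@dataclass(frozen=True)
class Credential:
    token: str
    scope: str
    expires_at: datetime


def parse_model_output(raw: dict) -> Extraction:
    """Coerce model output into typed data; unknown actions raise ValueError."""
    return Extraction(
        summary=str(raw.get("summary", ""))[:500],  # bounded, never executed
        requested_action=Action(raw.get("requested_action", "none")),
        target_id=str(raw.get("target_id", "")),
    )


# Hypothetical policy table: which role may perform which action.
ALLOWED = {("support_agent", Action.UPDATE_TICKET)}


def is_allowed(role: str, extraction: Extraction, user_intent: Action) -> bool:
    """Permit an action only if it matches the user's intent and the policy."""
    return (
        extraction.requested_action is user_intent
        and (role, extraction.requested_action) in ALLOWED
    )


def issue_credential(extraction: Extraction, ttl_seconds: int = 60) -> Credential:
    """Mint a short-lived credential scoped to one action on one target."""
    scope = f"{extraction.requested_action.value}:{extraction.target_id}"
    expires = datetime.now(timezone.utc) + timedelta(seconds=ttl_seconds)
    return Credential(secrets.token_urlsafe(16), scope, expires)


def handle(raw: dict, role: str, user_intent: Action,
           audit_log: list) -> Optional[Credential]:
    """Run parse -> policy check -> log -> credential for one request."""
    extraction = parse_model_output(raw)
    allowed = is_allowed(role, extraction, user_intent)
    audit_log.append({
        "input": raw,
        "action": extraction.requested_action.value,
        "allowed": allowed,
    })
    return issue_credential(extraction) if allowed else None
```

In this sketch, an instruction hidden in a document can at most change the `requested_action` field. If that field names an action outside the enum, parsing fails; if it names a valid action the user did not ask for, the policy check denies it and the denial is still logged.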

Where possible, keep sensitive retrieval and write operations behind separate services with their own identities and approval gates. This reduces the chance that a hidden prompt inside a PDF, image, email, or chat message can cause a privileged side effect. These controls tend to break down when teams let a general-purpose agent hold broad tool access across multiple systems, because the model can then combine benign-looking steps into a sensitive action path.
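One way to picture separate identities is a per-identity tool allowlist enforced outside the model. The sketch below is illustrative only: `ServiceIdentity`, `call_tool`, the tool names, and the in-memory `REGISTRY` are hypothetical stand-ins for real services and their access controls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceIdentity:
    """A narrowly scoped identity; each one may call only its own tools."""
    name: str
    allowed_tools: frozenset


class ToolDenied(Exception):
    """Raised when an identity requests a tool outside its allowlist."""


# Hypothetical tools standing in for separate read and write services.
REGISTRY = {
    "read_record": lambda record_id: {"id": record_id},
    "update_record": lambda record_id, value: f"updated {record_id}",
}


def call_tool(identity: ServiceIdentity, tool: str, **kwargs):
    """Enforce the allowlist before dispatching, regardless of model output."""
    if tool not in identity.allowed_tools:
        raise ToolDenied(f"{identity.name} may not call {tool}")
    return REGISTRY[tool](**kwargs)


# The identity that reads untrusted documents holds no write capability.
reader = ServiceIdentity("doc-reader", frozenset({"read_record"}))
writer = ServiceIdentity("record-writer", frozenset({"update_record"}))
```

With this split, a prompt hidden in a PDF that steers the reader toward `update_record` produces a `ToolDenied` error instead of a write, because the reader identity never held that permission.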

Common Variations and Edge Cases

Tighter isolation often increases latency and review overhead, so organisations must balance usability against containment. That tradeoff becomes visible in workflows where multimodal interpretation is central to the business process, such as document processing, customer support, or security triage. Best practice is still evolving, and there is no universal standard for how much trust to grant a model after it has processed untrusted content. The safer default is to treat the output as advisory until a separate policy engine or human reviewer approves it.

Some environments can use lower-friction controls for low-risk actions, such as read-only retrieval or draft generation, while reserving human approval for record changes, payments, outbound communications, and access to sensitive data. Organisations should also be careful not to rely on prompt filters alone, because injection can arrive through images, OCR text, attachments, or tool outputs. The broader NHI problem is often underestimated as well: NHI Mgmt Group reports that 97% of NHIs carry excessive privileges, which makes authority reduction a practical necessity, not an optional hardening step. In edge cases like multi-agent pipelines, the safest assumption is that each agent can be manipulated unless its identity, scope, and permissions are independently constrained.
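The risk-tiered approach described above can be expressed as a small lookup that fails closed. This is a sketch under assumptions: the action labels and the `LOW_RISK` set are hypothetical, and a real deployment would draw them from its own policy engine.

```python
from enum import Enum


class Tier(Enum):
    AUTO = "auto"                      # may run without review
    HUMAN_APPROVAL = "human_approval"  # needs an explicit reviewer decision


# Hypothetical labels for the low-risk actions named in the text.
LOW_RISK = {"read_only_retrieval", "draft_generation"}


def required_tier(action: str) -> Tier:
    """Anything not explicitly listed as low risk requires human approval."""
    return Tier.AUTO if action in LOW_RISK else Tier.HUMAN_APPROVAL


def may_execute(action: str, human_approved: bool = False) -> bool:
    """Allow low-risk actions directly; gate everything else on approval."""
    return required_tier(action) is Tier.AUTO or human_approved
```

Defaulting unknown actions to the approval tier means a new or injected action name cannot bypass review simply because nobody classified it.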

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A1	Prompt injection is a core agentic-app risk tied to unsafe tool use.
CSA MAESTRO	A1	MAESTRO addresses agent trust boundaries and unsafe autonomous execution.
NIST AI RMF		AI RMF supports governing misuse and harmful outcomes in AI systems.

Isolate untrusted inputs from actions and gate every tool call with policy checks.
