TL;DR: Trail of Bits showed that malicious instructions hidden in images can survive resizing and trigger unintended tool calls in systems such as Gemini CLI and Google Assistant, including a proof of concept that exfiltrated Google Calendar data to an external address. The control problem is no longer text-only prompt injection, because multimodal inputs can weaponise trusted workflows without visible malware or obvious system alerts.
At a glance
What this is: A ZioSec analysis of image-based prompt injection, showing how hidden instructions in a seemingly harmless image can cause an LLM to execute unintended actions and leak data.
Why it matters: Security teams are now governing AI-connected workflows, not just models, and poisoned multimodal inputs can turn calendars, mail, and ticketing tools into untrusted execution paths.
By the numbers:
- 96% of technology professionals identify AI agents as a growing security threat, and 66% believe this risk is immediate.
- 80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems, inappropriately sharing sensitive data, and revealing access credentials.
👉 Read ZioSec's analysis of image-based prompt injection in AI workflows
Context
Multimodal prompt injection is a control failure in AI workflows, not merely a model bug. When an LLM can ingest images and then act on them by calling tools, resizing and parsing become part of the trust boundary, and hidden instructions can ride inside otherwise ordinary content.
For identity and access teams, the key issue is not whether the image is malicious in the traditional malware sense. The issue is whether the connected AI system can separate untrusted input from privileged action when the model sits inside calendars, mail, CRM, or collaboration workflows.
That makes this a governance problem for AI agents and AI-assisted systems that can reach real business tools. The starting assumption that user content is passive no longer holds once the system can interpret pixels as instructions and execute them downstream.
Key questions
Q: How should security teams handle image-based prompt injection in AI workflows?
A: Treat images as untrusted inputs, not passive media. Security teams should inspect transformed content, restrict sensitive tool access, and require policy checks before an AI can forward data or trigger actions. If the model can reach mail, calendars, or ticketing systems, input handling and execution controls must be designed together.
Q: Why do multimodal AI systems create more risk than text-only chatbots?
A: Multimodal systems expand the attack surface because hidden instructions can arrive through images, audio, or video and survive preprocessing. That matters when the AI can act on connected tools. The risk is not just bad output, but unauthorised side effects in privileged business workflows.
Q: What breaks when AI tools trust user-uploaded images too much?
A: The trust boundary breaks down. A harmless-looking image can carry instructions that the model treats as part of the prompt, causing unintended tool calls or data exposure. Once that happens, the system is no longer separating user content from execution decisions, which is the control failure attackers exploit.
Q: How do organisations reduce the impact of poisoned multimodal prompts?
A: Use layered controls: sanitise inputs, constrain tool permissions, and log every model-to-tool action for review. The goal is to stop hidden instructions from becoming privilege-bearing actions. If the workflow needs high-trust actions, add explicit confirmation before execution, not after the fact.
Technical breakdown
How image resizing becomes a prompt injection channel
The attack relies on the way many platforms preprocess images before inference. An attacker crafts a high-resolution image so that downscaling, for example with bicubic interpolation, produces patterns that are invisible at the original resolution, turning a harmless-looking picture into readable text for the model. That text is then treated as part of the prompt context, and the model cannot reliably distinguish user intent from embedded instructions. The result is prompt piggybacking: the malicious payload travels alongside legitimate content and survives normal file-handling steps.
Practical implication: treat image preprocessing as an input-validation step and inspect transformed outputs before they reach privileged AI workflows.
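To make the resizing mechanism concrete, here is a minimal sketch in pure Python, assuming grayscale images held as 2-D lists of 0 to 255 values whose dimensions are multiples of the scale factor. It swaps in nearest-neighbour sampling for the bicubic interpolation described above because the effect is easier to see: the attacker darkens only the pixels the sampler keeps, so the full-resolution image is almost entirely white while the downscaled copy shows the payload. It also includes one possible inspection heuristic, which is an assumption and not ZioSec's or Trail of Bits' method: compare the pipeline's downscale with an anti-aliased box-filter downscale and flag large divergence. All function names and the threshold of 32 are hypothetical.

```python
# Images are grayscale 2-D lists of ints (0 = black, 255 = white).
# Assumes both image dimensions are multiples of `factor`.
# All names and the threshold are hypothetical illustrations.


def make_anamorphic(payload, factor, background=255):
    """Hide a binary payload (1 = dark) only in the pixels a nearest-neighbour
    downscaler will keep; every other pixel stays at the background value."""
    height, width = len(payload) * factor, len(payload[0]) * factor
    img = [[background] * width for _ in range(height)]
    for y, row in enumerate(payload):
        for x, bit in enumerate(row):
            if bit:
                img[y * factor][x * factor] = 0
    return img


def downscale_nearest(img, factor):
    """Keep the top-left pixel of each factor x factor block (no anti-aliasing)."""
    return [row[::factor] for row in img[::factor]]


def downscale_box(img, factor):
    """Average each factor x factor block: an anti-aliased reference view."""
    out = []
    for by in range(len(img) // factor):
        row = []
        for bx in range(len(img[0]) // factor):
            block = [img[by * factor + dy][bx * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def divergence(a, b):
    """Mean absolute per-pixel difference between two equally sized images."""
    total = sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def looks_anamorphic(img, factor, threshold=32.0):
    """Flag images whose pipeline downscale differs sharply from the averaged view."""
    return divergence(downscale_nearest(img, factor), downscale_box(img, factor)) > threshold
```

For example, `make_anamorphic([[1, 0, 1], [0, 1, 0]], 4)` yields an 8-row by 12-column image with only 3 of 96 pixels dark, yet `downscale_nearest` on it returns `[[0, 255, 0], [255, 0, 255]]`, and its divergence from the box view is 119.5. Natural images with fine detail can also diverge, so a real check would need tuning against the exact resampler the pipeline uses.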
Why tool-connected LLMs amplify the risk
The risk rises sharply when the model is connected to tools such as calendar, email, or ticketing APIs. In that architecture, the model is not just interpreting content; it is deciding whether to act on it. If a poisoned image can influence the model's interpretation of the task, the tool call becomes the impact point. This is why multimodal injection is more than a content-safety issue. It becomes an authorisation problem once the model can forward data, schedule events, or trigger actions in systems that hold real operational privileges.
Practical implication: separate content interpretation from action execution and require explicit policy checks before any sensitive tool call.
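A minimal sketch of that separation, assuming hypothetical tool names and a single internal domain: the model only proposes a `ToolCall`, and a deterministic `check_policy` function decides whether `dispatch` runs it. Blocking external recipients, and denying sensitive calls whenever untrusted input was in context, are illustrative rules rather than a complete policy.

```python
from dataclasses import dataclass, field

# Hypothetical names: adjust to the tools and domains your deployment exposes.
SENSITIVE_TOOLS = {"send_email", "create_event", "forward_message"}
INTERNAL_DOMAINS = {"example.com"}


@dataclass
class ToolCall:
    """A tool call the model proposes; nothing runs until policy approves it."""
    tool: str
    args: dict = field(default_factory=dict)
    untrusted_context: bool = False  # True if untrusted input (e.g. an image) was in context


@dataclass
class Decision:
    allowed: bool
    reason: str


def check_policy(call: ToolCall) -> Decision:
    """Deterministic checks that sit between model output and tool execution."""
    if call.tool not in SENSITIVE_TOOLS:
        return Decision(True, "non-sensitive tool")
    recipients = call.args.get("recipients", [])
    external = [r for r in recipients
                if r.rsplit("@", 1)[-1].lower() not in INTERNAL_DOMAINS]
    if external:
        return Decision(False, f"external recipients blocked: {external}")
    if call.untrusted_context:
        return Decision(False, "sensitive call shaped by untrusted input")
    return Decision(True, "internal recipients, trusted context")


def dispatch(call: ToolCall, registry: dict):
    """Execute the call only if policy allows it; return the decision and any result."""
    decision = check_policy(call)
    if not decision.allowed:
        return decision, None
    return decision, registry[call.tool](**call.args)
```

The design choice is that the policy is ordinary code the model cannot rewrite: whatever the image persuades the model to propose, the check runs on the structured call, not on the model's reasoning.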
Why traditional security controls miss the abuse path
Conventional endpoint, firewall, and IDS controls are poorly positioned here because the exploit does not look like classic malware. There is no executable dropped on disk and no obvious process injection on the host. The abuse happens inside the conversation and the model runtime, where normal security telemetry often has little visibility. That leaves a gap between what the infrastructure sees and what the AI system decides to do. In practice, the most important signal is not file reputation but whether the AI is allowed to take side effects from untrusted multimodal input.
Practical implication: add model-aware logging, action approval gates, and multimodal sanitisation controls at the AI application layer.
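For the approval-gate part of that implication, the sketch below parks side-effecting calls behind a ticket until a human approves or rejects them, while read-only calls run immediately. The class name, tool names, and ticket mechanism are hypothetical; a real gate would also persist pending calls and record who approved what.

```python
import uuid


class ApprovalGate:
    """Hold side-effecting tool calls until a human approves them (hypothetical design)."""

    def __init__(self, tools, side_effect_tools):
        self.tools = tools                        # tool name -> callable
        self.side_effect_tools = set(side_effect_tools)
        self.pending = {}                         # ticket id -> (tool name, kwargs)

    def submit(self, tool, **kwargs):
        """Run read-only tools immediately; park side-effecting ones behind a ticket."""
        if tool not in self.side_effect_tools:
            return "executed", self.tools[tool](**kwargs)
        ticket = uuid.uuid4().hex
        self.pending[ticket] = (tool, kwargs)
        return "pending", ticket

    def approve(self, ticket):
        """A human reviewer releases the parked call; it runs exactly once."""
        tool, kwargs = self.pending.pop(ticket)
        return "executed", self.tools[tool](**kwargs)

    def reject(self, ticket):
        """Discard the parked call without running it."""
        self.pending.pop(ticket)
        return "rejected", None
```

Because approval happens before the call executes, a poisoned image can at most create a pending ticket, not an email already sent.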
Threat narrative
Attacker objective: The attacker wants the AI system to execute hidden instructions as if they were legitimate user intent, then use connected tools to leak data or perform unauthorised actions.
- Entry occurs when an attacker supplies a poisoned image that appears benign to a human reviewer but contains instructions that emerge during resizing or preprocessing.
- Escalation occurs when the LLM incorporates the hidden instructions into its working context and uses its connected tools to forward data or take an unintended action.
- Impact occurs when privileged workflows such as calendars, email, or collaboration systems are used to exfiltrate information or trigger unauthorised side effects.
Breaches seen in the wild
- Moltbook AI agent keys breach: the Moltbook breach exposed 1.5M AI agent keys.
- AI LLM hijack breach: attackers used stolen AWS access keys to hijack Anthropic models on Amazon Bedrock.
Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.
NHI Mgmt Group analysis
Pixel poison is an input-trust failure, not a model hallucination problem. The attack works because systems treat transformed image content as trustworthy prompt material after resizing. That assumption was built for static documents and human review, not for machine-read multimodal inputs. The implication is that AI governance must treat preprocessing as part of the attack surface, not a harmless utility layer.
Tool-connected AI turns hidden instructions into privilege abuse. Once a model can call calendar, email, or workflow APIs, the real risk shifts from prompt corruption to action execution. Multimodal injection matters because it crosses the boundary from content handling into identity and access. Practitioners should view every connected tool as a potential privilege channel that can be reached through untrusted input.
Least privilege is not enough when the model can be steered at runtime. A model that receives untrusted images and then chooses actions is operating inside a decision loop that classic IAM never governed. That means access review, approval gates, and policy enforcement need to account for model-mediated execution paths, not just the human user who uploaded the file.
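One way to express model-mediated execution paths in code is to compute the scopes an agent may use per request rather than per identity. The sketch assumes hypothetical scope names and a simple rule: intersect the user's and the agent's grants, then withdraw side-effect scopes when untrusted input is in context. It illustrates the idea and is not a control prescribed by any framework.

```python
def effective_scopes(user_scopes, agent_scopes, write_scopes, untrusted_input):
    """Scopes an agent may exercise on a user's behalf for one request.

    - The agent never exceeds what the requesting user holds (intersection).
    - The agent never exceeds its own grant, even if the user holds more.
    - If untrusted input (e.g. an uploaded image) is in context, write and
      other side-effect scopes are withdrawn for this request.
    """
    allowed = set(user_scopes) & set(agent_scopes)
    if untrusted_input:
        allowed -= set(write_scopes)
    return frozenset(allowed)


# Hypothetical scope names for illustration.
USER = {"calendar.read", "calendar.write", "mail.send"}
AGENT = {"calendar.read", "mail.send"}
WRITES = {"calendar.write", "mail.send"}
```

With these values, a request carrying an uploaded image resolves to `calendar.read` only, while the same request without untrusted input also permits `mail.send`.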
Multimodal prompt injection expands the AI governance gap beyond text-only controls. Security programmes that only test chat prompts will miss the abuse path in images, audio, and other non-text inputs. The OWASP Agentic AI Top 10 and the NIST AI Risk Management Framework both point toward stronger input, output, and action governance. Practitioners should align AI control design with the full multimodal attack surface.
The named concept here is prompt piggybacking. Malicious instructions ride inside legitimate-looking content and inherit the trust of the surrounding workflow. That makes the hidden payload harder to detect and easier to operationalise at scale. The practical conclusion is that content origin, transformation, and execution must be governed as one chain.
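One way to govern origin, transformation, and execution as one chain is to carry provenance with the content itself. In this sketch (hypothetical names throughout), every piece of content records where it came from and which transformations produced it, derived content inherits its parent's trust label, and a composed prompt is trusted only if every part is. A runtime could then refuse side-effecting tool calls whenever the prompt is untrusted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Content:
    data: str
    trusted: bool
    lineage: tuple  # ordered record of origin and transformation steps


def ingest(data, origin, trusted):
    """Wrap raw input with its origin and trust label."""
    return Content(data, trusted, (origin,))


def transform(parent, step, fn):
    """Apply a transformation (resize, OCR, ...); the output inherits the parent's trust."""
    return Content(fn(parent.data), parent.trusted, parent.lineage + (step,))


def compose(*parts):
    """Build a prompt; it is trusted only if every part is trusted."""
    return Content("\n".join(p.data for p in parts),
                   all(p.trusted for p in parts),
                   tuple(step for p in parts for step in p.lineage))


def may_run_side_effects(prompt):
    """Gate for side-effecting tools: only fully trusted prompts qualify."""
    return prompt.trusted
```

The point of the design is that text surfaced by preprocessing, such as OCR output from a resized image, can never be laundered into trusted content just by being transformed.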
From our research:
- 98% of companies plan to deploy even more AI agents within the next 12 months, despite documented rogue behaviour in 80% of current deployments, according to the AI Agents: The New Attack Surface report.
- Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
- OWASP Agentic Applications Top 10 is the next resource to review when you are mapping agentic and multimodal attack paths to practical controls.
What this signals
Prompt piggybacking is becoming a governance pattern, not an edge case. As AI systems move from text-only interfaces into image, audio, and workflow integrations, the security question shifts from whether the model understands the content to whether it should be allowed to act on it. The organisations that will cope best are the ones that design for untrusted multimodal input from the start, especially where agentic application paths can reach real business systems.
The practical signal for teams is that AI audit trails need to capture preprocessing, prompt composition, and tool execution as a single chain. Without that, investigations will miss how a benign-looking file became a side effect in mail or calendar systems. That gap is as visible in the incidents covered by the 52 NHI Breaches report as it is in model security.
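A hash-chained log is one way to keep preprocessing, prompt composition, and tool execution in a single, tamper-evident sequence. The sketch below is illustrative only: the stage names and detail fields are hypothetical, and a production system would also need durable storage and an external anchor for the latest hash, since a hash chain alone cannot detect records dropped from the end.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(body):
    """SHA-256 over a canonical JSON encoding of the record body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditChain:
    """Append-only log where each record commits to the previous one."""

    def __init__(self):
        self.records = []

    def append(self, stage, detail):
        prev = self.records[-1]["hash"] if self.records else GENESIS
        body = {"stage": stage, "detail": detail, "prev": prev}
        record = dict(body, hash=_digest(body))
        self.records.append(record)
        return record["hash"]

    def verify(self):
        """Return True if every record is intact and links to its predecessor."""
        prev = GENESIS
        for rec in self.records:
            body = {"stage": rec["stage"], "detail": rec["detail"], "prev": rec["prev"]}
            if rec["prev"] != prev or _digest(body) != rec["hash"]:
                return False
            prev = rec["hash"]
        return True
```

An investigator can then walk one chain from the uploaded file's hash, through the resize step and the composed prompt, to the tool call it produced.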
Hidden-in-content attacks also widen the identity conversation. Once an AI can use trusted enterprise credentials to act on hidden instructions, the question becomes who authorised the action path, not just who uploaded the file. That is why governance teams should align multimodal controls with NIST AI Risk Management Framework guidance on measuring, managing, and documenting AI risk before the attack surface scales further.
For practitioners
- Classify multimodal inputs as untrusted payloads: Apply the same scrutiny to images, audio, and video that you already use for file uploads and external documents. Inspect the transformed, downscaled, or OCR-processed version before any AI system can act on it.
- Separate interpretation from execution: Keep the model's reading of content distinct from its ability to send email, update calendars, or create tickets. Route sensitive actions through explicit policy checks and human approval where side effects matter.
- Instrument model-aware audit logging: Log the original input, any preprocessing steps, the prompt context, and the resulting tool call so you can trace how a poisoned image became an action. Traditional endpoint telemetry will not show the full chain.
- Harden multimodal guardrails at the application layer: Limit image dimensions, validate transformations, and block hidden-text patterns before the model receives the payload. Pair that with allow-listed tools and policy enforcement around calendar, mail, and CRM integrations.
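As one concrete guardrail for limiting image dimensions, the sketch below reads a PNG's width and height from its IHDR header using only the standard library and rejects anything the pipeline would have to resize before inference, so the model sees the same pixels a human reviewer saw. `MAX_SIDE` is a hypothetical limit; set it to the largest input your model accepts without resizing. Only PNG is handled here; other formats are rejected.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_SIDE = 1024  # hypothetical: largest side the model accepts without resizing


def png_dimensions(data: bytes):
    """Read (width, height) from the IHDR chunk that must follow the PNG signature."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG with a leading IHDR chunk")
    width, height = struct.unpack(">II", data[16:24])  # big-endian uint32 pair
    return width, height


def preflight(data: bytes, max_side: int = MAX_SIDE):
    """Reject images the pipeline would otherwise downscale before inference."""
    try:
        width, height = png_dimensions(data)
    except ValueError as exc:
        return False, str(exc)
    if width > max_side or height > max_side:
        return False, f"{width}x{height} exceeds {max_side}px; would be resized before inference"
    return True, f"{width}x{height} accepted without resizing"
```

Rejecting oversized uploads removes the resize step, and with it the transformation an anamorphic payload depends on; it does not address instructions that are visible at full resolution.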
Key takeaways
- Image-based prompt injection turns ordinary preprocessing into a security boundary that attackers can exploit.
- The main risk is not the image itself, but the connected tool action the model can be steered into taking.
- Organisations need layered multimodal controls, explicit execution policy, and audit logs that capture the full input-to-action chain.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
The OWASP Agentic AI Top 10 addresses the attack and risk surface, while NIST AI RMF and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | | Covers multimodal prompt injection and tool abuse in agentic systems. |
| NIST AI RMF | | Addresses governance, mapping, and monitoring for AI risk in connected workflows. |
| NIST CSF 2.0 | PR.AA-01 | AI agents that can trigger business actions hold identities and credentials that must be managed like any other. |
Bind AI tool permissions to least-privilege policy and review them as part of access governance.
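For the review step, a simple starting point is to compare each agent's granted tool scopes with the scopes it actually exercised over the review window, as recorded in tool-call logs. The agent names, scope strings, and log shape below are hypothetical; unused grants are candidates for removal, not automatic revocations.

```python
from collections import defaultdict


def unused_grants(grants, usage_log):
    """Return {agent: sorted unused scopes} for agents holding scopes they never used.

    grants:    {agent_id: set of granted scopes}
    usage_log: iterable of (agent_id, scope) pairs taken from tool-call logs
    """
    used = defaultdict(set)
    for agent, scope in usage_log:
        used[agent].add(scope)
    report = {}
    for agent, scopes in grants.items():
        idle = set(scopes) - used[agent]
        if idle:
            report[agent] = sorted(idle)
    return report
```

An agent that holds `mail.send` but never used it during the window is exactly the kind of latent privilege channel a poisoned image could reach.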
Key terms
- Multimodal Prompt Injection: A prompt injection attack that arrives through non-text inputs such as images, audio, or video. The malicious instruction is hidden in content the system treats as ordinary user input, then surfaced by preprocessing or model interpretation and acted on by downstream tools or workflows.
- Prompt Piggybacking: A technique where malicious instructions travel inside legitimate content and inherit the trust of the surrounding workflow. The model reads the payload as part of the user request, which makes the abuse harder to spot and more likely to reach a privileged action path.
- Tool-Connected LLM: An LLM that can do more than generate text because it is wired to external systems such as email, calendars, ticketing, or CRMs. The security risk changes materially once the model can trigger side effects, because prompt manipulation can become access abuse.
- Preprocessing Trust Boundary: The point where raw user content is transformed before a model sees it, such as resizing, OCR, or format conversion. In multimodal AI, this boundary matters because the transformation itself can reveal or create attacker-controlled instructions that later influence execution.
Deepen your knowledge
NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or governance in your organisation, it is worth exploring.
This post draws on content published by ZioSec: Anamorpher: How LLMs Are Compromised With An Image. Read the original.
Published by the NHIMG editorial team on 2025-09-03.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org