Image-based prompt injection: are your AI controls keeping up?


(@nhi-mgmt-group)

TL;DR: Trail of Bits showed that malicious instructions hidden in images can survive resizing and trigger unintended tool calls in systems such as Gemini CLI and Google Assistant, including a proof of concept that exfiltrated Google Calendar data to an external address. The control problem is no longer text-only prompt injection, because multimodal inputs can weaponise trusted workflows without visible malware or obvious system alerts.

NHIMG editorial — based on content published by ZioSec: Anamorpher: How LLMs Are Compromised With An Image

Questions worth separating out

Q: How should security teams handle image-based prompt injection in AI workflows?

A: Treat images as untrusted inputs, not passive media.

Q: Why do multimodal AI systems create more risk than text-only chatbots?

A: Multimodal systems expand the attack surface because hidden instructions can arrive through images, audio, or video and survive preprocessing.

Q: What breaks when AI tools trust user-uploaded images too much?

A: The trust boundary breaks down.

Practitioner guidance

  • Classify multimodal inputs as untrusted payloads: Apply the same scrutiny to images, audio, and video that you already use for file uploads and external documents.
  • Separate interpretation from execution: Keep the model’s reading of content distinct from its ability to send email, update calendars, or create tickets.
  • Instrument model-aware audit logging: Log the original input, any preprocessing steps, the prompt context, and the resulting tool call so you can trace how a poisoned image became an action.
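
The audit-logging point in the last bullet could look something like the sketch below. It is a minimal illustration only: `AuditRecord`, `build_audit_record`, `to_log_line`, and every field name are hypothetical, and the preprocessing labels are placeholders for whatever your pipeline really does. The one deliberate choice is hashing both the uploaded bytes and the bytes the model actually received, so a reviewer can tell which preprocessing step stood between them.

```python
import hashlib
import json
from dataclasses import dataclass, asdict, field
from typing import Any


@dataclass
class AuditRecord:
    """One traceable link from an untrusted input to a tool call (hypothetical schema)."""
    input_sha256: str          # hash of the original bytes as uploaded
    model_input_sha256: str    # hash of the bytes the model actually received
    preprocessing: list[str]   # ordered preprocessing steps, e.g. ["resize:512x512"]
    prompt_context: str        # text prompt that accompanied the input
    tool_name: str             # tool the model asked to call
    tool_args: dict[str, Any] = field(default_factory=dict)


def build_audit_record(original: bytes, model_input: bytes, preprocessing: list[str],
                       prompt_context: str, tool_name: str,
                       tool_args: dict[str, Any]) -> AuditRecord:
    """Capture everything needed to trace an input through to an action."""
    return AuditRecord(
        input_sha256=hashlib.sha256(original).hexdigest(),
        model_input_sha256=hashlib.sha256(model_input).hexdigest(),
        preprocessing=list(preprocessing),
        prompt_context=prompt_context,
        tool_name=tool_name,
        tool_args=dict(tool_args),
    )


def to_log_line(record: AuditRecord) -> str:
    """Serialise as a single JSON line; sorted keys keep records easy to diff."""
    return json.dumps(asdict(record), sort_keys=True)


# Example with placeholder bytes and a made-up tool name.
record = build_audit_record(
    original=b"original-image-bytes",
    model_input=b"resized-image-bytes",
    preprocessing=["resize:512x512"],
    prompt_context="Summarise this image",
    tool_name="send_email",
    tool_args={"to": "someone@example.com"},
)
print(to_log_line(record))
```

In practice you would write each line to an append-only store rather than printing it.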

What's in the full article

ZioSec's full blog post covers the operational detail this post intentionally leaves for the source:

  • The exact Anamorpher image-generation approach used to surface hidden instructions during resizing.
  • The proof-of-concept workflow that caused Google Calendar data to be sent to an external email address.
  • The specific AI surfaces tested, including Gemini CLI, Vertex AI Studio, Google Assistant, and Gemini web.
  • The defensive ideas Trail of Bits discussed for previewing downscaled images and restricting sensitive actions.

👉 Read ZioSec's analysis of image-based prompt injection in AI workflows →

(@mr-nhi)

Pixel poison is an input-trust failure, not a model hallucination problem. The attack works because systems treat transformed image content as trustworthy prompt material after resizing. That assumption was built for static documents and human review, not for machine-read multimodal inputs. The implication is that AI governance must treat preprocessing as part of the attack surface, not a harmless utility layer.
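
To make the preprocessing point concrete, here is a toy sketch of why the image a human reviews and the image the model receives can differ. It is not the Anamorpher technique and does not reflect any real resizer: `downscale_nearest` and `embed` are hypothetical helpers, the "image" is a small grid of grayscale integers, and the downscaler simply keeps the top-left pixel of each block. Real pipelines use different sampling offsets and interpolation kernels.

```python
def downscale_nearest(pixels, factor):
    """Toy nearest-neighbour downscale: keep the top-left pixel of each block."""
    return [row[::factor] for row in pixels[::factor]]


def embed(cover, payload, factor):
    """Overwrite only the pixels that the toy downscaler will sample."""
    out = [row[:] for row in cover]  # copy so the cover image is left untouched
    for i, payload_row in enumerate(payload):
        for j, value in enumerate(payload_row):
            out[i * factor][j * factor] = value
    return out


cover = [[200] * 8 for _ in range(8)]   # uniform light-grey 8x8 image
payload = [[0, 255], [255, 0]]          # pattern meant for the downscaled view
stego = embed(cover, payload, factor=4)

# Count how many full-resolution pixels were altered.
changed = sum(a != b for row_a, row_b in zip(cover, stego)
              for a, b in zip(row_a, row_b))

# The downscaled view consists entirely of the embedded pattern.
assert downscale_nearest(stego, 4) == payload
print(f"{changed} of 64 pixels changed at full resolution")
```

Under these toy assumptions, only a small fraction of the full-resolution pixels differ from the cover, yet every pixel the model sees comes from the payload. That is the reason to review or log the post-resize image rather than the upload.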

A few things that frame the scale:

  • 98% of companies plan to deploy even more AI agents within the next 12 months, despite documented rogue behaviour in 80% of current deployments, according to the "AI Agents: The New Attack Surface" report.
  • Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.

A question worth separating out:

Q: How do organisations reduce the impact of poisoned multimodal prompts?

A: Use layered controls: sanitise inputs, constrain tool permissions, and log every model-to-tool action for review. The goal is to stop hidden instructions from becoming privilege-bearing actions. If the workflow needs high-trust actions, add explicit confirmation before execution, not after the fact.
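
One way to picture the permission and confirmation layers is a small gate between the model's proposed tool call and its execution. This is a sketch under stated assumptions, not a prescribed design: `ToolCall`, `authorise`, and the tool names in both sets are hypothetical, and `confirm` stands in for whatever human-approval step your workflow uses.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical policy; tool names are illustrative only.
LOW_RISK_TOOLS = {"search_docs"}
HIGH_TRUST_TOOLS = {"send_email", "update_calendar", "create_ticket"}


@dataclass
class ToolCall:
    name: str
    args: dict[str, Any]


def authorise(call: ToolCall, confirm: Callable[[ToolCall], bool]) -> bool:
    """Decide whether a model-proposed tool call may run.

    Low-risk tools run without a prompt, high-trust tools require explicit
    confirmation before execution, and anything not listed is denied.
    """
    if call.name in LOW_RISK_TOOLS:
        return True
    if call.name in HIGH_TRUST_TOOLS:
        return confirm(call)
    return False  # allow-list: unknown tools never run


# Example: a confirmation callback that always declines.
def deny_all(call: ToolCall) -> bool:
    return False


print(authorise(ToolCall("search_docs", {"q": "holiday policy"}), deny_all))
print(authorise(ToolCall("send_email", {"to": "someone@example.com"}), deny_all))
```

Because the gate checks the tool name rather than the prompt, it holds no matter how the instruction reached the model, whether through text or through a resized image.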

👉 Read our full editorial: Multimodal prompt injection turns images into AI tool abuse



   