TL;DR: Visual prompt injections embed malicious instructions inside images so multimodal models can be induced to ignore their original task, change outputs, or suppress nearby content, according to Lakera’s analysis. The governance gap is no longer just prompt hygiene, but control over what an AI system can do after it has already seen the image.
NHIMG editorial — based on content published by Lakera: The Beginner's Guide to Visual Prompt Injections
Questions worth separating out
Q: How should security teams test for visual prompt injection in multimodal AI systems?
A: Test the full input path, not just the model output.
Q: Why do multimodal AI systems create new governance risks for identity teams?
A: Because the system can be reached legitimately and still be manipulated after access is granted.
Q: What breaks when image inputs are allowed to influence tool use in AI workflows?
A: The application loses a reliable separation between perception and action.
Practitioner guidance
- Separate untrusted image content from protected instructions Place multimodal prompt construction behind a trust boundary so image-derived text cannot be treated as system or developer instructions.
- Constrain tool use for multimodal models Limit which actions a model can trigger after reading an image, especially when those actions involve search, retrieval, messaging, or content suppression.
- Add adversarial image testing to red-team exercises Test whether off-white text, hidden instructions, overlays, and caption-like artefacts can steer outputs or suppress recognition.
What's in the full article
Lakera's full article covers the operational detail this post intentionally leaves for the source:
- Worked examples of invisibility-cloak style injections and how the model responded
- Image-based prompt injection variants that change identity recognition and content suppression
- Defence ideas for multimodal systems, including the visual prompt injection detector the vendor is building
- Related reading links on prompt injection, content moderation, and AI red teaming
👉 Read Lakera's analysis of visual prompt injection in multimodal AI →
Visual prompt injections: are your multimodal controls keeping up?
Explore further
Visual prompt injection is an execution-layer problem, not just a prompt problem. The article shows that malicious instructions hidden in images can change model output without changing the application code or the user-facing prompt. That means the trust boundary is inside the inference path, where untrusted content and system intent are already colliding. Practitioners should treat multimodal input as a governed execution surface, not a harmless content type.
A few things that frame the scale:
- Only 44% of developers are reported to follow security best practices for secrets management, exposing a significant developer behaviour gap, according to The State of Secrets in AppSec.
- The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities.
A question worth separating out:
Q: How can organisations reduce the impact of prompt injections without blocking multimodal use?
A: Limit the model's authority. Keep image interpretation separate from privileged actions, log the chain from input to decision, and require human approval for any step that changes records, sends messages, or accesses sensitive data. The objective is to preserve multimodal capability while preventing untrusted content from becoming execution.
👉 Read our full editorial: Visual prompt injections expose a new control gap in multimodal AI