TL;DR: CVE-2026-22778 is a critical vLLM flaw that lets unauthenticated attackers reach remote code execution through a crafted video URL, using an information leak to weaken ASLR and a JPEG2000 heap overflow to gain control, according to Orca Security. The lesson is that AI serving layers need identity and reachability controls, not just patch cadence.
NHIMG editorial — based on content published by Orca Security: CVE-2026-22778 analysis for vLLM
By the numbers:
- CVE-2026-22778 is a critical vulnerability with a CVSS score of 9.8.
Questions worth separating out
Q: What breaks when a public AI serving API can be reached without strong access controls?
A: The service boundary becomes the attack surface.
Q: Why do unauthenticated multimodal endpoints increase exploitation risk?
A: Because they expand the reachable code paths an attacker can trigger remotely.
Q: How can security teams tell whether an AI serving service is actually exposed?
A: Check whether the service accepts requests from untrusted networks, whether multimodal routes are enabled, and whether the deployment depends on application-level keys alone.
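The first of those checks can be partly automated. The sketch below is a minimal illustration, not a detection tool. It assumes the deployment runs vLLM's OpenAI-compatible server and exposes the `/v1/models` route, and that you are authorised to test the host. The function names `classify_probe` and `probe` and the label strings are hypothetical, chosen for this example. The idea is to send a request with no credentials from an untrusted network position and see whether the service answers.

```python
import urllib.error
import urllib.request


def classify_probe(status):
    """Map the HTTP status of an unauthenticated request to a rough label.

    `status` is an int, or None when no HTTP response was received.
    The labels are illustrative, not a standard taxonomy.
    """
    if status is None:
        return "unreachable"      # no response from this network position
    if status in (401, 403):
        return "auth-enforced"    # something in front of the API rejected us
    if 200 <= status < 300:
        return "open"             # answered without any credentials
    return "inconclusive"         # redirects, 404s, 5xx: investigate by hand


def probe(base_url, timeout=5.0):
    """Send one GET with no credentials; return the status code or None.

    Assumes the OpenAI-compatible `/v1/models` route is present.
    """
    url = base_url.rstrip("/") + "/v1/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses arrive as exceptions but still carry a status.
        return exc.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None
```

Calling `classify_probe(probe("http://vllm.internal.example:8000"))` from outside the trusted network (the hostname is a placeholder) gives a first signal. An "open" result from an untrusted position means the API answers without credentials there. It says nothing about whether multimodal routes are enabled, which still has to be checked against the deployment configuration.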
Practitioner guidance
- Patch exposed vLLM deployments immediately: Upgrade to vLLM 0.14.1 or later on every internet-facing and internal instance that can process video or multimodal requests.
- Remove unused multimodal paths: Disable video model endpoints anywhere they are not operationally required.
- Constrain API exposure before authentication is assumed: Place vLLM behind authentication proxies, private network controls, or VPN access, and do not rely on application-level keys alone.
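For the patching item, a quick inventory check can flag hosts still below the fixed release. This is a minimal sketch, assuming vLLM was installed as a Python package named `vllm` in the environment being inspected and that 0.14.1 is the first fixed version, as stated above. The helper names are hypothetical, and the version parsing is deliberately simple rather than a full PEP 440 implementation.

```python
import re
from importlib import metadata

# First fixed release according to the guidance above.
PATCHED = (0, 14, 1)


def parse_release(version):
    """Return (release_tuple, is_prerelease) for a version string.

    Simplified parsing: only the leading numeric release segment and an
    immediately following pre-release marker (a, b, rc, dev, ...) are read.
    """
    match = re.match(r"\s*v?(\d+(?:\.\d+)*)(.*)", version)
    if not match:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(p) for p in match.group(1).split("."))
    parts += (0,) * (3 - len(parts))  # pad "0.14" to (0, 14, 0)
    is_pre = bool(re.match(r"[-_.]?(a|b|rc|alpha|beta|dev|pre)", match.group(2)))
    return parts, is_pre


def is_patched(version, minimum=PATCHED):
    """True when `version` is at or above the fixed release."""
    release, is_pre = parse_release(version)
    if release == minimum and is_pre:
        # Conservative choice: a pre-release of the fixed version
        # is not assumed to contain the fix.
        return False
    return release >= minimum


def installed_vllm_version():
    """Return the installed vllm version string, or None if not installed."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None
```

Running `installed_vllm_version()` inside each serving environment and passing the result to `is_patched` gives a yes or no per host. Container images and source builds need their own check, since the Python package metadata may not reflect what is actually running.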
What's in the full article
Orca Security's full analysis covers the operational detail this post intentionally leaves for the source:
- Exact exploit primitives in the PIL error handling path and the JPEG2000 decoder chain
- Version-specific upgrade instructions for pip installs, Docker images, and source builds
- Network and host detection indicators for spotting probe-plus-payload exploitation attempts
👉 Read Orca Security's analysis of CVE-2026-22778 in vLLM →
vLLM remote code execution risk: are AI serving controls ready?
Explore further
AI serving layers are becoming the new control plane for non-human identity (NHI) risk. vLLM sits between user requests, model execution, and host resources, so a compromise at that layer can expose data, compute, and downstream workloads at once. This is not just application vulnerability management. It is NHI governance for the systems that authenticate, broker, and execute AI work, and practitioners should treat the serving plane as part of the identity perimeter.
A few things that frame the scale:
- 80% of identity breaches involved compromised non-human identities such as service accounts and API keys, according to the Ultimate Guide to NHIs.
- Only 5.7% of organisations have full visibility into their service accounts, which means most teams cannot reliably see the identities that AI-serving workloads depend on.
A question worth separating out:
Q: Should teams disable video processing if they do not actively use it?
A: Yes. If a workload does not require video support, removing that code path reduces the number of exploitable entry points and leaves attackers fewer places to look for a reachable flaw. In AI serving, unused decode functionality is still attack surface.
👉 Read our full editorial: vLLM CVE-2026-22778 exposes AI serving stacks to remote code execution