vLLM CVE-2026-22778 exposes AI serving stacks to remote code execution

By NHI Mgmt Group Editorial TeamPublished 2026-02-03Domain: Breaches & IncidentsSource: Orca Security

TL;DR: CVE-2026-22778 is a critical vLLM flaw that lets unauthenticated attackers reach remote code execution through a crafted video URL, using an information leak to weaken ASLR and a JPEG2000 heap overflow to gain control, according to Orca Security. The lesson is that AI serving layers need identity and reachability controls, not just patch cadence.

At a glance

What this is: A critical vLLM vulnerability lets unauthenticated attackers trigger remote code execution through a crafted video URL, with the key issue being a chained leak-plus-overflow exploit.

Why it matters: It matters because AI serving stacks often sit on high-value infrastructure, and a single exposed inference service can become a foothold for broader NHI, workload, and platform compromise.

By the numbers:

CVE-2026-22778 is a critical vulnerability with a CVSS score of 9.8.

👉 Read Orca Security's analysis of CVE-2026-22778 in vLLM

Context

vLLM is the serving layer that turns model requests into production inference, so a flaw in that layer becomes an identity and reachability problem as much as a software bug. CVE-2026-22778 matters because unauthenticated network access to an AI serving API can expose the service account, data plane, and adjacent workloads to abuse.

The core governance gap is simple: teams often secure the model but under-secure the service that brokers access to it. Once an attacker can reach a multimodal endpoint, the service becomes part of the NHI attack surface, especially where API validation, network exposure, and privilege boundaries are loose.

Key questions

Q: What breaks when a public AI serving API can be reached without strong access controls?

A: The service boundary becomes the attack surface. If an exposed inference API can reach decoding logic before strong authentication and network segmentation take effect, attackers can chain application flaws into host compromise. In AI serving environments, that means the platform itself becomes a non-human identity risk, not just the model behind it.

Q: Why do unauthenticated multimodal endpoints increase exploitation risk?

A: Because they expand the reachable code paths an attacker can trigger remotely. When video or image processing is available over a public API, a flaw in the decode stack can be exercised without user interaction, and any information leak that precedes memory corruption makes reliable exploitation much easier.

Q: How can security teams tell whether an AI serving service is actually exposed?

A: Check whether the service accepts requests from untrusted networks, whether multimodal routes are enabled, and whether the deployment depends on application-level keys alone. If the API can be reached directly and decode logic is live, the exposure is operational, not theoretical.

Q: Should teams disable video processing if they do not actively use it?

A: Yes. If a workload does not require video support, removing that code path reduces the number of exploitable entry points and shortens the time attackers have to find a reachable flaw. In AI serving, unused decode functionality is still attack surface.

Technical breakdown

Information leak that weakens ASLR

The first stage is an error-handling flaw. When vLLM passes an invalid image to PIL, the exception can leak a heap address back to the caller. That address disclosure makes Address Space Layout Randomization far less effective because the attacker no longer has to guess memory layout blindly. In practice, the leak turns exploitation from uncertain to repeatable. This is a classic example of why memory disclosure bugs matter even when they look like harmless error messages: they often remove the last barrier that makes a second-stage memory corruption bug hard to weaponise.

Practical implication: suppress low-level memory details in API error paths and treat information disclosure as an exploit-enabler, not a cosmetic defect.

JPEG2000 heap overflow in the video decode path

The second stage is memory corruption in the video-processing stack. vLLM uses OpenCV and FFmpeg to decode video, and the JPEG2000 path can be abused through a crafted channel-definition box that remaps data into the wrong buffer. Because the luma plane is larger than the chroma plane, writing Y-channel data into U-channel storage creates a heap overflow. The exploit becomes reliable when the attacker can shape the overflow and target a function pointer or adjacent heap object. The important point is that the vulnerable path is only reachable when video or multimodal support is exposed.

Practical implication: disable unreachable multimodal paths and restrict exposure of decode-capable endpoints to reduce the reachable attack surface.

Remote code execution through the API request path

The attack flow is remote and unauthenticated. An attacker submits a request to a chat or invocation endpoint, first using a probe that leaks memory and then sending malicious video content that triggers the overflow. If the overwrite lands correctly, execution can jump to system-level commands on the host. That makes the serving layer a direct bridge from API access to host compromise. For identity teams, the lesson is that API reachability is itself a privilege boundary. If that boundary is weak, a model-serving workload can become an enterprise foothold.

Practical implication: place inference APIs behind authentication, network segmentation, and workload isolation before they are exposed to untrusted clients.

Threat narrative

Attacker objective: The attacker aims to turn a public AI serving API into host-level code execution and then use that foothold to move into adjacent infrastructure or steal sensitive model and prompt data.

Entry occurs when an unauthenticated attacker reaches a vLLM chat or invocation endpoint that accepts a crafted video URL.
Credential-like control is not the issue here, because the attacker abuses reachable decode logic and uses the leak to make memory targeting reliable.
Escalation follows when the JPEG2000 overflow overwrites a control pointer and redirects execution to attacker-chosen commands.
Impact is host-level remote code execution on the AI serving node, with potential follow-on access to connected data and workloads.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach — attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

AI serving layers are becoming the new control plane for non-human identity risk. vLLM sits between user requests, model execution, and host resources, which means compromise at that layer can expose data, compute, and downstream workloads at once. This is not just application vulnerability management. It is NHI governance for the systems that authenticate, broker, and execute AI work, and practitioners should treat the serving plane as part of the identity perimeter.

Reachable inference endpoints are privilege boundaries, not convenience features. The article shows that unauthenticated access to a video-capable API can become a direct route to remote code execution. That changes the security question from

From our research:
80% of identity breaches involved compromised non-human identities such as service accounts and API keys, according to the Ultimate Guide to NHIs.
Only 5.7% of organisations have full visibility into their service accounts, which means most teams cannot reliably see the identities that AI-serving workloads depend on.
For broader context on breach patterns, review The 52 NHI breaches Report for how credential exposure and service-account abuse typically unfold.

What this signals

Identity blast radius: AI serving platforms concentrate trust, compute, and data access into a single runtime layer, so compromise there can spread faster than conventional application bugs. Teams should map inference services as privileged workloads and review which service accounts, storage locations, and internal APIs they can reach.

The practical shift is toward exposure-aware governance. If a model-serving node is reachable from untrusted networks, the issue is not just patching one CVE. It is whether the environment has enough visibility, segmentation, and workload isolation to keep an exposed AI service from becoming a lateral-movement foothold.

For practitioners

Patch exposed vLLM deployments immediately Upgrade to vLLM 0.14.1 or later on every internet-facing and internal instance that can process video or multimodal requests. Confirm the fixed version in containers, source builds, and package installs, then verify that rollback images cannot reintroduce the vulnerable branch.
Remove unreachable multimodal paths Disable video model endpoints anywhere they are not operationally required. If the decode path is not needed, it should not be reachable at all, because reachability is what makes the exploit chain practical.
Constrain API exposure before authentication is assumed Place vLLM behind authentication proxies, private network controls, or VPN access, and do not rely on application-level keys alone. The attack described in the advisory can reach the vulnerable path before authentication meaningfully reduces risk.
Monitor for the two-stage exploit pattern Alert on sequential requests that first probe with invalid image input and then deliver external video URLs to the same source. Pair that with host telemetry for PIL exceptions containing memory addresses and unexpected child processes from the vLLM worker.

Key takeaways

CVE-2026-22778 turns a public AI serving API into a remote code execution path when video processing is reachable and unprotected.
The exploit chain combines an information leak with a heap overflow, which makes exploitation much more reliable than a single memory bug would be.
The strongest limiting control is reducing reachability, especially by removing unused multimodal paths and tightening network exposure around inference workloads.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI-01	This flaw exposes reachable non-human identity surfaces in an AI serving workload.
NIST CSF 2.0	PR.AC-4	The article centers on whether access paths are sufficiently restricted for a privileged service.
NIST Zero Trust (SP 800-207)	SC-7	Zero trust segmentation is directly relevant to exposed model-serving endpoints.

Restrict inference APIs behind segmentation and authentication, then verify only intended callers can reach them.

Key terms

AI Serving Layer: The AI serving layer is the runtime service that accepts prompts, routes requests, and returns model output in production. It matters because this layer often holds network reachability, authentication logic, and access to downstream data or compute, making it a privileged non-human identity boundary rather than a simple application wrapper.
Information Leak: An information leak is a disclosure flaw that reveals data the caller should not see, such as memory addresses, tokens, or internal state. In exploitation chains, leaks often do the quiet work of making a second-stage bug practical by removing randomness, exposing layout, or confirming that a target is reachable.
Heap Overflow: A heap overflow occurs when a program writes beyond the bounds of a dynamically allocated memory buffer. In practice, attackers use it to corrupt adjacent objects, overwrite function pointers, or redirect execution, which is why heap corruption in decode paths is treated as a high-impact memory safety issue.

Deepen your knowledge

AI serving exposure and non-human identity boundaries are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are securing inference workloads or multimodal endpoints, it is a practical next step.

This post draws on content published by Orca Security: CVE-2026-22778 analysis for vLLM. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-02-03.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org