SGLang pickle RCE shows how AI serving stacks fail

By NHI Mgmt Group Editorial Team | Published 2026-03-12 | Domain: Breaches & Incidents | Source: Orca Security

TL;DR: Orca Security found three unsafe deserialization flaws in SGLang. Two are unauthenticated remote code execution paths that become reachable when exposed multimodal or disaggregation features accept network input; the third is a crash-dump replay issue triggered by malicious .pkl files. The broader lesson is that AI serving frameworks still treat untrusted bytes as trusted control flow, which makes runtime trust boundaries the real security control.

At a glance

What this is: Orca Security identified three SGLang deserialization flaws, including two network-reachable unauthenticated RCE paths and one local code execution issue in a crash-dump replay utility.

Why it matters: AI serving frameworks often sit inside shared cloud and Kubernetes environments, so a single deserialization bug can turn model-serving access into workload compromise, secret exposure, and cluster pivot risk.

By the numbers:

The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities.

Context

Unsafe deserialization is a trust boundary failure. In AI serving stacks, that boundary sits between network input, internal broker traffic, and the code that reconstructs objects for execution. SGLang’s issue is not simply that it uses pickle. The deeper problem is that it accepts untrusted bytes in paths that can execute inside production inference infrastructure, which turns a data format choice into a control-plane risk for AI workloads.

For IAM and platform teams, this is an NHI problem as much as an application bug. Service processes, brokers, and GPU inference jobs are all non-human identities with privileges that can reach model assets, credentials, and adjacent cluster services. When deserialization becomes execution, the security question shifts from whether the app is authenticated to whether the workload identity can be trusted to interpret external input safely.

Key questions

Q: What breaks when AI serving frameworks deserialize untrusted network data?

A: The trust boundary collapses before the application can validate the request. If the framework uses pickle or another executable object format, the payload can invoke attacker-chosen functions during parsing, which turns a message handler into a code execution path. In practice, this can expose model data, credentials, and adjacent workloads. The right control is to remove executable deserialization from exposed interfaces and use schema-based parsing instead.
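As a rough illustration of schema-based parsing, the sketch below decodes a message as JSON and checks its shape before anything else touches it. The field names (`request_id`, `prompt`, `max_tokens`), the size cap, and the `parse_request` helper are hypothetical and are not SGLang's wire format; the point is only that a data-only decoder cannot invoke callables while parsing.

```python
import json
from dataclasses import dataclass

MAX_MESSAGE_BYTES = 1_000_000  # assumed cap, chosen for illustration


@dataclass(frozen=True)
class GenerateRequest:
    # Hypothetical message fields, not taken from SGLang.
    request_id: str
    prompt: str
    max_tokens: int


def parse_request(raw: bytes) -> GenerateRequest:
    """Decode bytes as JSON and validate the shape; raise ValueError otherwise."""
    if len(raw) > MAX_MESSAGE_BYTES:
        raise ValueError("message too large")
    try:
        doc = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError("payload is not UTF-8 JSON") from exc
    # Reject anything that is not exactly the expected set of keys.
    if not isinstance(doc, dict) or set(doc) != {"request_id", "prompt", "max_tokens"}:
        raise ValueError("unexpected message shape")
    if not isinstance(doc["request_id"], str) or not isinstance(doc["prompt"], str):
        raise ValueError("request_id and prompt must be strings")
    max_tokens = doc["max_tokens"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_tokens, bool) or not isinstance(max_tokens, int):
        raise ValueError("max_tokens must be an integer")
    if not 1 <= max_tokens <= 4096:
        raise ValueError("max_tokens out of range")
    return GenerateRequest(doc["request_id"], doc["prompt"], max_tokens)
```

A pickle blob handed to `parse_request` fails at the decode or shape check and is rejected as malformed data, rather than being interpreted as instructions.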

Q: Why do AI serving brokers create hidden NHI risk in Kubernetes and cloud environments?

A: Because they are privileged non-human identities that often listen on internal ports and handle sensitive model traffic. If the broker accepts untrusted bytes and deserializes them immediately, the workload identity becomes the attacker’s execution target. That can lead to lateral movement, secret exposure, or cluster pivoting. Teams should treat broker ports as identity boundaries, not just transport endpoints.

Q: How do security teams know whether an inference stack is exposed to deserialization abuse?

A: Check for any path that accepts external or semi-trusted .pkl, pickle, or equivalent serialized objects, especially in network brokers, replay scripts, job queues, and admin utilities. Then verify whether the code validates format, authenticates source, and restricts reachability before parsing. If deserialization happens before policy enforcement, the stack is exposed. Monitoring should also look for shells, unexpected child processes, and unusual outbound connections.
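A first-pass inventory of deserialization points can be approximated with a static scan. The sketch below uses Python's standard `ast` module to flag direct `pickle.load` and `pickle.loads` calls; the `find_pickle_loads` helper is hypothetical, and it deliberately misses aliased imports (`from pickle import loads`, `import pickle as p`) and wrappers in third-party libraries, so treat it as a starting point rather than proof of absence.

```python
import ast

# Direct attribute calls this sketch looks for.
RISKY_CALLS = {("pickle", "load"), ("pickle", "loads")}


def find_pickle_loads(source: str, filename: str = "<string>") -> list[tuple[int, str]]:
    """Return sorted (line_number, call_name) pairs for direct pickle.load(s) calls."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and (func.value.id, func.attr) in RISKY_CALLS
        ):
            findings.append((node.lineno, f"{func.value.id}.{func.attr}"))
    return sorted(findings)
```

Each finding still needs a manual answer to the questions above: where do the bytes come from, and does any authentication or reachability control run before the call?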

Q: Which frameworks are most relevant when governing unsafe deserialization in AI workloads?

A: OWASP NHI guidance applies because the vulnerable component is a non-human identity with privileged execution. NIST Cybersecurity Framework 2.0 is relevant for asset visibility, protection, detection, and response, while zero trust principles apply to internal service reachability. If the stack uses agentic components, OWASP Agentic AI guidance can help model tool and execution boundaries, but the core issue here is workload trust.

Technical breakdown

Python pickle and why untrusted input becomes code execution

Python pickle is not a data-only format. It stores instructions for reconstructing objects, including which callable to invoke during deserialization. That makes pickle.loads() on untrusted input equivalent to executing attacker-controlled object construction logic. In SGLang, the broker and replay utility accept raw payloads and immediately deserialize them without schema validation, authentication, or allowlisting. The vulnerability class is longstanding in Python AI and ML tooling because convenience often overrides security boundaries. Once a payload reaches pickle, the application has already lost control of what code paths will be invoked.

Practical implication: Treat any network- or file-supplied pickle as an execution boundary and remove it from exposed AI serving paths.
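The mechanism can be seen in a few lines of standard-library Python. In this deliberately harmless sketch, a hypothetical `Payload` class uses `__reduce__` to tell pickle which callable to invoke when the bytes are loaded; `os.getpid` stands in for whatever an attacker would actually choose, such as `os.system`.

```python
import os
import pickle


class Payload:
    """Illustrative object whose pickle form names a callable to run on load."""

    def __reduce__(self):
        # Instructs pickle: "to rebuild this object, call os.getpid()".
        # An attacker would return something like (os.system, ("<command>",)).
        return (os.getpid, ())


blob = pickle.dumps(Payload())

# The receiver never defines or imports Payload; the bytes alone name the
# callable, and pickle.loads() invokes it inside the loading process.
result = pickle.loads(blob)
```

After the load, `result` is not a `Payload` instance at all. It is the return value of the chosen callable, executed in the process that called `pickle.loads()`.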

ZMQ broker exposure in multimodal and disaggregation paths

The two critical SGLang CVEs follow the same pattern: a ZMQ broker binds to all interfaces and forwards received bytes straight into pickle.loads(). The attack surface only exists when multimodal generation or disaggregation features are enabled, but when they are enabled the broker becomes network-reachable with no authentication gate. That means the deserialization step is not protected by a higher-level request layer. The result is a single-message remote code execution path from any reachable client that can speak ZMQ. This is a classic exposed service account problem in application form, where the service assumes the network is a trust signal.

Practical implication: Restrict broker reachability to localhost or tightly controlled internal clients and disable unused multimodal or disaggregation modules.
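A simple configuration check can catch wildcard binds before deployment. The sketch below assumes endpoints are written in the usual ZMQ `tcp://host:port` form; the `is_loopback_only` helper is hypothetical and only classifies the address string, it does not inspect a running socket or any firewall rules.

```python
import ipaddress


def is_loopback_only(endpoint: str) -> bool:
    """Return True if a tcp://host:port endpoint names a loopback address."""
    prefix = "tcp://"
    if not endpoint.startswith(prefix):
        raise ValueError("this sketch only handles tcp:// endpoints")
    host, sep, _port = endpoint[len(prefix):].rpartition(":")
    if not sep or not host:
        raise ValueError("expected tcp://host:port")
    host = host.strip("[]")  # allow bracketed IPv6 such as [::1]
    if host == "localhost":
        return True
    if host == "*":
        return False  # wildcard: listens on every interface
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Interface names or DNS names cannot be judged statically,
        # so this sketch conservatively treats them as exposed.
        return False
```

For example, `is_loopback_only("tcp://*:5555")` and `is_loopback_only("tcp://0.0.0.0:5555")` both return False, while `tcp://127.0.0.1:5555` passes. A loopback bind narrows reachability but does not make the deserialization itself safe.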

Crash-dump replay utilities as a second execution path

The third CVE shows that the same unsafe pattern extends beyond live network traffic into debugging workflows. A replay utility loads .pkl crash dumps and then processes them as if they were trusted recovery data. That creates a social-engineering and file-integrity problem: if an attacker can plant a malicious dump file or persuade an operator to replay one, execution happens during incident handling or troubleshooting. This is why admin utilities in AI and ML toolchains need the same security scrutiny as public APIs. Debug paths often become privileged shortcuts that bypass normal controls.

Practical implication: Quarantine crash-dump files, verify provenance before replay, and block operators from loading untrusted .pkl artifacts.
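One way to express a provenance gate is to refuse any dump whose digest does not match a value recorded when the dump was produced. The sketch below assumes such a SHA-256 digest is available through a trusted channel; the `verify_dump` helper and that recording step are hypothetical and are not part of SGLang's replay utility.

```python
import hashlib
import hmac
from pathlib import Path


def verify_dump(path: Path, expected_sha256: str) -> bytes:
    """Return the file bytes only if their SHA-256 matches the recorded digest."""
    data = path.read_bytes()
    actual = hashlib.sha256(data).hexdigest()
    expected = expected_sha256.strip().lower()
    # Constant-time comparison of the two hex digests.
    if not hmac.compare_digest(actual.encode(), expected.encode()):
        raise ValueError(f"digest mismatch for {path.name}; refusing to load")
    return data
```

A matching digest only shows that the file is the one that was recorded. It says nothing about whether the producer was trustworthy, so the check complements quarantine and operator controls rather than replacing them.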

Threat narrative

Attacker objective: The attacker wants arbitrary code execution inside the SGLang process so they can steal data, manipulate workloads, or move deeper into the environment.

Entry occurs when an attacker reaches the exposed ZMQ broker on the network or plants a malicious .pkl file in a replay workflow.
Credential access is not required because the vulnerable paths accept unauthenticated payloads and immediately deserialize attacker-controlled bytes.
Escalation happens as pickle invokes attacker-chosen callables inside the SGLang process, granting arbitrary code execution in the workload context.
Impact follows when the compromised inference process can expose model assets, credentials, GPU workloads, or pivot into the surrounding cluster.

NHI Mgmt Group analysis

Unsafe deserialization in AI serving is an NHI governance failure, not just a code defect. The broker, encoder receiver, and replay utility are all non-human identities making trust decisions on behalf of production systems. Once those identities accept untrusted input as executable state, the governance problem becomes one of workload trust, not only application hardening. The practitioner conclusion is that AI serving paths must be governed as privileged NHI execution surfaces.

Trust in network input was designed for bounded internal communication, and it breaks once the service is remotely reachable. SGLang’s ZMQ paths assume that anything arriving on the broker socket is already safe enough to deserialize. The payload, however, carries execution instructions, not just data. The implication is that network reachability cannot be treated as a substitute for object-level trust validation.

Ephemeral inference traffic creates identity blast radius when deserialization happens before policy enforcement. The dangerous moment is not login, because there is no meaningful login here. The dangerous moment is the first byte stream handed to pickle.loads() inside a privileged workload identity. That is a named concept worth carrying forward: deserialization blast radius. It describes how a single parsing step can convert routine inference traffic into full process compromise, which means practitioners must treat object parsing as part of identity governance.

Crash-dump replay is a lifecycle control problem for NHI artifacts, not a convenience script problem. The replay utility shows that offboarding, provenance, and handling rules matter for files as much as for active secrets. A dump file can outlive the trust relationship that created it and still execute in an operator context. The practitioner conclusion is that NHI lifecycle governance must extend to generated artifacts and debug pipelines, not stop at running services.

From our research:
The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
Organisations maintain an average of 6 distinct secrets manager instances, creating fragmentation that undermines centralised control, according to The State of Secrets in AppSec.
For the broader attack-pattern view, read The 52 NHI breaches Report for real-world examples of exposed non-human identities becoming breach paths.

What this signals

SGLang is a reminder that AI infrastructure often fails at the boundary where object parsing meets privileged execution. Teams should watch for exposed internal brokers, replay tools, and any service that can transform untrusted bytes into process-level actions before a policy layer has a chance to intervene.

Deserialization blast radius: when parsing logic can execute as the service identity, the unit of risk is no longer the message, but the workload context attached to it. That should push platform teams to inventory every serializer in the inference stack and map it to a trust boundary, just as they would for secrets and token handling.

The operational signal is straightforward: if a runtime can accept a file or socket message and immediately create child processes, write files, or reach out to the network, it is already behaving like an execution surface. Pair that view with NIST Cybersecurity Framework 2.0's identify and protect functions and the exposed-path question becomes easier to answer.

For practitioners

Remove pickle from externally reachable AI paths: Replace pickle-based network deserialization with data-only formats such as JSON, msgpack, or Protocol Buffers, validated against a schema, wherever input can cross a trust boundary.
Constrain broker reachability to trusted interfaces: Bind internal brokers to localhost or tightly segmented private networks, then enforce firewall rules so only known internal clients can reach the port.
Audit every replay and debugging utility for file trust: Treat crash-dump replay scripts and offline loaders as privileged execution paths, and require provenance checks before any .pkl artifact is opened.
Inventory AI serving features that silently widen exposure: Map which deployments enable multimodal generation, disaggregation, or other optional modules that create new network listeners or deserialization points.
Monitor for process behaviour that follows deserialization abuse: Alert on unexpected child shells, unusual file writes, and outbound connections from inference processes because those are common post-exploitation signals.
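For Python-based inference processes, one possible in-process signal comes from the interpreter's audit hooks (available since Python 3.8). The sketch below records the `pickle.find_class`, `subprocess.Popen`, and `os.system` audit events; the `events` list and `audit_hook` function are illustrative names, and a real deployment would forward these to its logging or detection pipeline. Note that audit hooks cannot be removed once installed and can be bypassed by native code, so this is telemetry, not prevention.

```python
import pickle
import sys

# Audit events that tend to accompany deserialization abuse.
WATCHED = {"pickle.find_class", "subprocess.Popen", "os.system"}
events = []


def audit_hook(event, args):
    """Record watched audit events; keep the hook cheap and never raise."""
    if event in WATCHED:
        events.append((event, tuple(str(a) for a in args[:2])))


sys.addaudithook(audit_hook)

# Plain data uses no global lookups, so nothing is recorded here.
pickle.loads(pickle.dumps({"prompt": "hi", "tokens": [1, 2, 3]}))

# A pickle that references a callable triggers pickle.find_class on load.
pickle.loads(pickle.dumps(len))
```

In this run the only recorded entry is the `builtins.len` lookup. In production, a `pickle.find_class` event that names modules such as `os` or `subprocess` from inside a broker process would be a strong reason to investigate.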

Key takeaways

SGLang’s three CVEs show that unsafe deserialization in AI serving software can convert routine inference traffic and replay workflows into code execution.
The risk is practical, not theoretical: one exposed broker port or one trusted-looking .pkl file gives an attacker the same outcome, code execution inside the workload.
The control that matters most is removing executable deserialization from exposed paths and treating brokers, loaders, and replay tools as privileged NHI surfaces.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI-01	Exposed brokers and replay scripts are privileged NHI execution surfaces.
NIST CSF 2.0	PR.AA-05, PR.IR-01 (PR.AC-3 in CSF 1.1 numbering)	Network access control is central to preventing broker exposure.
NIST Zero Trust (SP 800-207)		Internal service reachability cannot be trusted as a security signal here.

Inventory AI brokers and loaders as NHIs, then restrict execution paths to trusted inputs only.

Key terms

Unsafe Deserialization: Unsafe deserialization happens when software reconstructs objects from untrusted data in a way that can trigger code execution or state corruption. In practice, the risk is not the file or message itself, but the fact that the parser is allowed to invoke behaviour while rebuilding the object.
Workload Identity: Workload identity is the non-human identity assigned to a service, container, broker, or runtime so it can communicate and access resources. In AI infrastructure, the identity often has enough privilege to make parsing mistakes dangerous because compromised execution can reach models, secrets, and adjacent systems.
Deserialization Blast Radius: Deserialization blast radius is the amount of damage a single parsing step can create when untrusted data is converted into executable behaviour inside a privileged service. For AI workloads, it measures how far an attacker can move once a broker, loader, or replay tool accepts malicious serialized input.

Deepen your knowledge

Unsafe deserialization in AI serving stacks is a core topic in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are mapping workload trust boundaries across inference services and debug utilities, it is worth exploring.

This post draws on content published by Orca Security: SGLang unsafe deserialization vulnerabilities in AI serving frameworks. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-03-12.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org