TL;DR: Orca Security found three unsafe deserialization flaws in SGLang. Two are unauthenticated remote code execution paths that trigger when exposed multimodal or disaggregation features accept network input; the third is a crash-dump replay issue triggered by malicious .pkl files. The broader lesson is that AI serving frameworks still treat untrusted bytes as trusted control flow, which makes runtime trust boundaries the real security control.
NHIMG editorial — based on content published by Orca Security: SGLang unsafe deserialization vulnerabilities in AI serving frameworks
Questions worth separating out
Q: What breaks when AI serving frameworks deserialize untrusted network data?
A: The trust boundary collapses before the application can validate the request, because formats such as pickle can run attacker-chosen code during deserialization itself.
Q: Why do AI serving brokers create hidden NHI risk in Kubernetes and cloud environments?
A: Because they are privileged non-human identities that often listen on internal ports and handle sensitive model traffic.
Q: How do security teams know whether an inference stack is exposed to deserialization abuse?
A: Check for any path that accepts external or semi-trusted .pkl files, pickle payloads, or equivalent serialized objects, especially in network brokers, replay scripts, job queues, and admin utilities.
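One way to start that check is a static scan of the codebase for pickle-style load calls. The sketch below is a minimal illustration, not Orca Security's detection method: the function name `find_pickle_loads` and the module and method lists are assumptions chosen for this example. It also flags `recv_pyobj`, the pyzmq socket method that unpickles whatever arrives on the wire.

```python
import ast

# Assumed lists for illustration: modules and calls that deserialize via pickle.
RISKY_MODULES = {"pickle", "cPickle", "dill", "cloudpickle"}
RISKY_FUNCS = {"load", "loads", "Unpickler"}
RISKY_METHODS = {"recv_pyobj"}  # pyzmq sockets unpickle received bytes here


def find_pickle_loads(source: str, filename: str = "<string>") -> list[tuple[int, str]]:
    """Return sorted (line, call) pairs for pickle-style load calls in Python source."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if not isinstance(func, ast.Attribute):
            continue
        # Matches attribute calls on a known module name, e.g. pickle.loads(...).
        if (
            isinstance(func.value, ast.Name)
            and func.value.id in RISKY_MODULES
            and func.attr in RISKY_FUNCS
        ):
            hits.append((node.lineno, f"{func.value.id}.{func.attr}"))
        # Matches any receiver calling a known unpickling method.
        elif func.attr in RISKY_METHODS:
            hits.append((node.lineno, func.attr))
    return sorted(hits)
```

A scan like this only catches direct attribute calls. It misses aliased imports such as `import pickle as pkl` or `from pickle import loads`, so treat a clean result as a starting point and still review each hit for whether its input can cross a trust boundary.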
Practitioner guidance
- Remove pickle from externally reachable AI paths: Replace pickle-based network deserialization with schema-validated formats such as JSON, msgpack, or Protocol Buffers wherever input can cross a trust boundary.
- Constrain broker reachability to trusted interfaces: Bind internal brokers to localhost or tightly segmented private networks, then enforce firewall rules so only known internal clients can reach the port.
- Audit every replay and debugging utility for file trust: Treat crash-dump replay scripts and offline loaders as privileged execution paths, and require provenance checks before any .pkl artifact is opened.
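Two of these recommendations can be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions, not the SGLang patch: the message fields in `REQUEST_SCHEMA` and the function names `decode_request` and `is_trusted_artifact` are hypothetical. It shows schema-validated JSON decoding in place of pickle, and a SHA-256 allowlist as one possible provenance check before a replay artifact is opened.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical message schema for illustration: field name -> required type.
REQUEST_SCHEMA = {"request_id": str, "prompt": str, "max_tokens": int}


def decode_request(raw: bytes) -> dict:
    """Parse untrusted bytes as JSON and reject anything outside the schema."""
    # JSON parsing yields only plain data types; it never constructs arbitrary objects.
    data = json.loads(raw.decode("utf-8"))
    if not isinstance(data, dict) or set(data) != set(REQUEST_SCHEMA):
        raise ValueError("unexpected message shape")
    for field, expected in REQUEST_SCHEMA.items():
        value = data[field]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"bad type for {field}")
    return data


def is_trusted_artifact(path: Path, allowed_sha256: set[str]) -> bool:
    """Return True only if the file's SHA-256 digest is on the allowlist."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return digest in allowed_sha256
```

A replay script would call `is_trusted_artifact` first and refuse to load the file when it returns False. The allowlist of digests has to come from a source the attacker cannot write to, such as a signed manifest; a hash stored next to the artifact proves nothing.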
What's in the full article
Orca Security's full research covers the operational detail this post intentionally leaves for the source:
- Line-by-line attack flow for CVE-2026-3059 and CVE-2026-3060, including the exact broker and receiver code paths
- Proposed patch analysis showing the localhost-binding and msgpack migration approach in more depth
- Detection guidance with process- and network-level indicators that help distinguish exploitation from normal inference traffic
- Disclosure timeline, affected versions, and the unmerged patch status for teams tracking remediation exposure
👉 Read Orca Security's analysis of SGLang pickle-based RCE paths and AI workload exposure →
SGLang deserialization flaws: what IAM and AI teams need to know
Explore further
Unsafe deserialization in AI serving is an NHI governance failure, not just a code defect. The broker, encoder receiver, and replay utility are all non-human identities making trust decisions on behalf of production systems. Once those identities accept untrusted input as executable state, the governance problem becomes one of workload trust, not only application hardening. The practitioner conclusion is that AI serving paths must be governed as privileged NHI execution surfaces.
A few things that frame the scale:
- The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
- Organisations maintain an average of 6 distinct secrets manager instances, creating fragmentation that undermines centralised control, according to The State of Secrets in AppSec.
A question worth separating out:
Q: Which frameworks are most relevant when governing unsafe deserialization in AI workloads?
A: OWASP NHI guidance applies because the vulnerable component is a non-human identity with privileged execution. NIST Cybersecurity Framework 2.0 is relevant for asset visibility, protection, detection, and response, while zero trust principles apply to internal service reachability. If the stack uses agentic components, OWASP Agentic AI guidance can help model tool and execution boundaries, but the core issue here is workload trust.
👉 Read our full editorial: SGLang pickle RCE shows how AI serving stacks fail