Proto6 schema injection exposes code execution in protobuf.js

By NHI Mgmt Group Editorial Team | Published 2026-06-05 | Domain: Breaches & Incidents | Source: Cyera

TL;DR: Six vulnerabilities in protobuf.js and protobufjs-cli let attacker-controlled schema data trigger remote code execution, process-wide denial of service, or build-time code injection through unsafe type lookups, prototype pollution, and generated JavaScript, according to Cyera. Schema-to-code boundaries now need the same trust controls as executable code, not metadata assumptions.

At a glance

What this is: Cyera’s Proto6 research shows that protobuf.js can turn attacker-controlled schema data into code execution or denial of service when schema values cross unsafe lookup and code-generation boundaries.

Why it matters: For IAM and NHI teams, this is a reminder that machine and agent workflows often inherit risk from build-time and runtime libraries that treat metadata as trusted input.

By the numbers:

protobuf.js has over 48 million weekly npm downloads.

Context

Proto6 is about what happens when schema metadata is allowed to behave like executable input. In protobuf.js, a type name, namespace, or option path can move from a structured schema into runtime JavaScript generation, which breaks the assumption that schema values are inert. That matters anywhere non-human identities, service workflows, or build pipelines consume externally influenced schema data.

The identity governance angle is less about protobuf itself and more about trust boundaries around machine-executed code paths. When libraries compile data into functions, the surrounding programme must treat those inputs as controlled assets, not just configuration. That is the same governance problem that appears in NHI pipelines, autonomous tooling, and developer-controlled CI systems.

Key questions

Q: What breaks when protobuf schema data is allowed to drive code generation?

A: The break point is the trust boundary between metadata and executable code. When schema names, type lookups, or option paths flow into generated JavaScript without strict sanitisation, an attacker can turn a data file into runtime behaviour, causing code execution, denial of service, or build-time compromise.

Q: Why do schema-driven libraries create risk for non-human identity workflows?

A: Because they often run inside privileged service accounts, bots, CI jobs, or message consumers. If a schema library converts untrusted input into code or crashes repeatedly on attacker-shaped payloads, the impact lands inside the same execution context that identity and access teams assume is controlled.

Q: How do security teams know when generated code is crossing a trust boundary?

A: Look for library behaviour that turns names, types, or descriptors into source text, then compiles that text at runtime or writes it into imported files. That is the signal that a schema pipeline is no longer just parsing data. It is creating executable artefacts that need code-level review.
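
As a rough first pass, that signal can be searched for mechanically. The sketch below is a hypothetical helper: `findDynamicCodeSinks` and its pattern list are illustrative names, not part of protobuf.js or any existing scanner. It flags lines where JavaScript source constructs code at runtime. It will miss indirect calls and can report false positives, so it only narrows down where code-level review should start.

```typescript
// Hypothetical helper: flags lines in JavaScript source text that build code at runtime.
// The pattern list is an illustrative assumption, not an exhaustive detector.
const DYNAMIC_CODE_PATTERNS: RegExp[] = [
  /\bFunction\s*\(/, // Function("...") and new Function("...")
  /\beval\s*\(/,     // eval("...")
];

interface SinkFinding {
  line: number; // 1-based line number in the scanned source
  text: string; // trimmed content of the flagged line
}

function findDynamicCodeSinks(source: string): SinkFinding[] {
  const findings: SinkFinding[] = [];
  source.split("\n").forEach((text, index) => {
    if (DYNAMIC_CODE_PATTERNS.some((pattern) => pattern.test(text))) {
      findings.push({ line: index + 1, text: text.trim() });
    }
  });
  return findings;
}

// Toy input: the second line compiles a string that was assembled from a field name.
const sampleSource = [
  "const body = 'return m.' + fieldName;",
  "const decode = Function('m', body);",
  "console.log(decode({ id: 1 }));",
].join("\n");

console.log(findDynamicCodeSinks(sampleSource));
```

A hit is a prompt to trace where the compiled string comes from. If any part of it derives from schema names, types, or descriptors, the path belongs in the code-generation inventory.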

Q: Should organisations treat protobuf libraries as security-sensitive components?

A: Yes, when they compile external or semi-trusted schemas, consume untrusted binary messages, or run in build pipelines. In those cases, protobuf libraries are part of the attack surface, and teams should govern them with the same care they apply to other code-generation or deserialisation boundaries.

Technical breakdown

Prototype pollution turns type lookup into code selection

protobuf.js resolves primitive types through ordinary object property lookups. If an attacker has already polluted Object.prototype, a lookup that should fail can return a crafted value instead of undefined. The library then treats the attacker-controlled string as a valid primitive and interpolates it into generated encoder or decoder code. Because the generated source is later passed to Function(), the lookup bypass becomes a code execution sink. This is not a generic JavaScript weakness in the abstract. It is a specific design pattern where inherited properties and runtime code generation combine to convert metadata into executable behaviour.

Practical implication: isolate type resolution with null-prototype objects and block inherited lookup paths before code generation.
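
The difference between an ordinary lookup and a null-prototype lookup can be shown with a toy table. This is a minimal sketch under stated assumptions: `unsafeTypes`, `safeTypes`, and `resolveReader` are hypothetical names and do not reflect protobuf.js's actual internals. The demo pollutes `Object.prototype` only briefly and removes the property afterwards.

```typescript
// Toy stand-in for a primitive-type table; NOT protobuf.js's actual data structure.
const unsafeTypes: Record<string, string> = { int32: "readInt32", string: "readString" };

// Safer variant: an object with no prototype, so inherited keys cannot answer a lookup.
const safeTypes: Record<string, string> = Object.assign(Object.create(null), unsafeTypes);

// Hypothetical resolver: returns a reader name only for the table's own properties.
function resolveReader(table: Record<string, string>, typeName: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(table, typeName) ? table[typeName] : undefined;
}

// Simulate a polluted prototype for the duration of the demo, then clean up.
const sharedPrototype = Object.prototype as unknown as Record<string, unknown>;
sharedPrototype["evilType"] = "attackerControlled";
try {
  console.log(unsafeTypes["evilType"]);                // inherited value leaks through a plain lookup
  console.log(safeTypes["evilType"]);                  // undefined: no prototype chain to consult
  console.log(resolveReader(unsafeTypes, "evilType")); // undefined: own-property check rejects it
} finally {
  delete sharedPrototype["evilType"];
}
```

Either defence closes the inherited path on its own. Using both means a resolver stays safe even if a later change swaps the table for an ordinary object literal.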

Static schema generation can become build-time code injection

The pbjs CLI takes schema names and emits JavaScript files. In the vulnerable path, names were only checked for reserved words, not for syntax-breaking or statement-inserting characters. That means a crafted namespace or service name could be written directly into emitted source and executed when the generated file is imported. This is a supply-chain problem, not just a parser bug, because the dangerous moment is build or test execution after the file has already been generated. Any pipeline that accepts contributed schemas or generated artefacts inherits this risk.

Practical implication: sanitise every schema-derived identifier before emission and treat generated source as untrusted until reviewed.
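
One way to apply this is a strict allowlist that runs before any name is interpolated into emitted source. The sketch below assumes a deliberately conservative policy (ASCII letters, digits, and underscores only). `assertSafeSchemaName` and `emitNamespaceStub` are hypothetical helpers written for illustration, not functions from pbjs or protobufjs-cli.

```typescript
// Conservative allowlist: ASCII letters, digits, underscore; must not start with a digit.
// This is an assumed policy for the sketch, stricter than what JavaScript itself permits.
const SAFE_SEGMENT = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Hypothetical guard: validates a dotted schema name such as "acme.billing.Invoice".
function assertSafeSchemaName(schemaName: string): string {
  for (const segment of schemaName.split(".")) {
    if (!SAFE_SEGMENT.test(segment)) {
      throw new Error("Refusing to emit unsafe schema name: " + JSON.stringify(schemaName));
    }
  }
  return schemaName;
}

// Hypothetical emitter: only interpolates names that passed the allowlist,
// and writes them as a quoted property key rather than a bare identifier.
function emitNamespaceStub(schemaName: string): string {
  const safeName = assertSafeSchemaName(schemaName);
  return "root[" + JSON.stringify(safeName) + "] = {};";
}

console.log(emitNamespaceStub("acme.billing.Invoice"));

try {
  // A name that tries to close the property access and append a statement.
  emitNamespaceStub('x"]; doSomethingElse(); //');
} catch (err) {
  console.log((err as Error).message);
}
```

Rejecting the whole schema on the first bad name is a design choice: it fails the build loudly instead of emitting a silently altered identifier. Because the name is emitted as a quoted key, reserved words are harmless in that position, so the allowlist only has to stop syntax-breaking characters.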

Recursive message decoding creates process exhaustion paths

Protocol Buffers support nested and recursive structures, so a decoder can walk deeply nested messages until the call stack or runtime limits are exhausted. In Cyera's proof of concept, a relatively small binary payload with extreme nesting caused repeated crashes in message-processing workflows, including bot and cloud-function patterns. The core issue is that decoding depth was not constrained tightly enough for attacker-controlled input. When a library assumes recursion is bounded because the data is benign, a malicious sender can turn that assumption into a repeatable denial-of-service condition.

Practical implication: cap decode depth and fail closed on untrusted recursive payloads before they reach application logic.
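
A depth cap can be enforced on the raw bytes before a decoder is invoked at all. The following is a minimal sketch, not a protobuf.js API: `assertNestingWithin`, `readVarint`, and `buildNested` are hypothetical helpers, and the guard assumes the caller knows which field number carries the nested sub-message. It follows only that field, rejects the deprecated group wire types, and uses an explicit work list so the guard itself cannot overflow the stack.

```typescript
// Reads a base-128 varint starting at `pos`; returns the value and the next offset.
function readVarint(buf: Uint8Array, pos: number, end: number): { value: number; next: number } {
  let value = 0;
  for (let i = 0; i < 10 && pos + i < end; i++) {
    const byte = buf[pos + i] as number;
    value += (byte & 0x7f) * Math.pow(2, 7 * i);
    if ((byte & 0x80) === 0) return { value, next: pos + i + 1 };
  }
  throw new Error("Truncated or oversized varint");
}

// Hypothetical pre-decode guard. `nestedField` is the field number the schema declares
// as a (possibly recursive) sub-message; the top-level message counts as depth 1.
function assertNestingWithin(buf: Uint8Array, nestedField: number, maxDepth: number): void {
  const pending: Array<{ start: number; end: number; depth: number }> = [
    { start: 0, end: buf.length, depth: 1 },
  ];
  while (pending.length > 0) {
    const range = pending.pop()!;
    let pos = range.start;
    while (pos < range.end) {
      const tag = readVarint(buf, pos, range.end);
      const fieldNumber = Math.floor(tag.value / 8);
      const wireType = tag.value % 8;
      pos = tag.next;
      if (wireType === 0) {
        pos = readVarint(buf, pos, range.end).next; // varint value
      } else if (wireType === 1) {
        pos += 8; // fixed 64-bit value
      } else if (wireType === 5) {
        pos += 4; // fixed 32-bit value
      } else if (wireType === 2) {
        const len = readVarint(buf, pos, range.end); // length-delimited payload
        if (fieldNumber === nestedField) {
          if (range.depth + 1 > maxDepth) throw new Error("Nesting exceeds limit of " + maxDepth);
          pending.push({ start: len.next, end: len.next + len.value, depth: range.depth + 1 });
        }
        pos = len.next + len.value;
      } else {
        throw new Error("Unsupported wire type " + wireType);
      }
      if (pos > range.end) throw new Error("Field overruns its parent message");
    }
  }
}

// Demo only: wraps an empty message in `levels` layers of field 1 (tag 0x0a).
// Uses single-byte lengths, so keep `levels` small.
function buildNested(levels: number): Uint8Array {
  let bytes: number[] = [];
  for (let i = 0; i < levels; i++) {
    bytes = [0x0a, bytes.length, ...bytes];
  }
  return new Uint8Array(bytes);
}

assertNestingWithin(buildNested(2), 1, 3); // three levels deep: accepted
try {
  assertNestingWithin(buildNested(50), 1, 3);
} catch (err) {
  console.log((err as Error).message); // rejected before any decoder runs
}
```

A production guard would need to be schema-aware, since real messages can nest through several different fields, and would normally be paired with a payload size limit checked before this walk begins.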

Threat narrative

Attacker objective: The attacker wants to turn schema handling into code execution, service disruption, or build-system compromise inside trusted JavaScript environments.

Entry begins when an attacker supplies crafted protobuf schema data or a malicious binary payload to a Node.js process that trusts external schema-derived input.
Credential or control access is not the point here; instead, the attacker abuses unsafe type resolution, prototype pollution, or schema name handling to steer generated code paths.
Impact arrives as arbitrary JavaScript execution, repeated process crashes, or build-pipeline code execution when imported output runs inside trusted systems.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Schema-to-code trust is the broken assumption here. protobuf.js was operating as if schema metadata stayed metadata, but the library turns that metadata into executable JavaScript. Once attacker influence reaches type names or namespace names, the boundary is no longer descriptive; it is executable. Practitioners should treat any schema compilation path as a code-generation control point, not a formatting step.

Prototype lookup is not a safe authorisation model for generated code. The vulnerability chain works because ordinary object inheritance can answer a lookup that should have failed. That is a governance failure mode, not just a coding error, because the library assumed trusted resolution semantics inside an untrusted input path. The implication is that object shape and inheritance are part of the attack surface whenever identity-adjacent systems compile data into runtime logic.

Runtime code generation creates identity blast radius when non-human workflows depend on it. Build systems, bots, and message processors often sit behind service accounts or workload identities, so a codegen flaw can inherit privileged execution context immediately. That makes the control failure larger than the library itself. The practitioner problem is not simply patching a package, but recognising where generated code can inherit access that was never meant for the original input.

Recursive decode limits are a governance control, not an implementation detail. The DoS findings show that schema recursion becomes an availability liability when input can be attacker-shaped. This is the same class of failure that appears when service workflows trust protocol structure more than payload reality. Teams should view maximum depth and message size as enforceable policy, because the absence of those limits leaves runtime stability to attacker choice.

Code generation boundaries need their own named concept: schema execution debt. This is the accumulated risk created when libraries convert externally influenced schema data into runnable source without full sanitisation, bounded recursion, and null-prototype lookup discipline. The debt is not theoretical. It surfaces as RCE, repeated crashes, and build-time compromise. Practitioners should classify any schema-to-code pathway as a high-risk trust boundary and govern it accordingly.

From our research:
protobuf.js has over 48 million weekly npm downloads, according to LLMjacking: How Attackers Hijack AI Using Compromised NHIs.
Cyera’s findings sit in a broader environment where 44% of developers are reported to follow security best practices for secrets management, according to The State of Secrets in AppSec.
For teams reviewing code-generation boundaries, 52 NHI Breaches Analysis remains the relevant forward path for understanding how non-human execution paths fail in practice.

What this signals

Schema execution debt: as more machine workflows compile data into code, teams need a governance lens that treats generated artefacts as high-risk execution surfaces, not passive outputs. This is especially relevant where service identities can trigger imports, retries, or build steps without human review.

The practical signal for IAM and platform teams is that runtime trust and build trust are converging. If a workflow can ingest externally influenced schema data and then execute generated code under a privileged identity, the identity programme has to account for both input control and execution control, not just access assignments.

For practitioners

Inventory schema-to-code paths: Map every place protobuf schemas, JSON descriptors, or generated files enter runtime or CI. Flag flows where external contributors, tenants, plugins, or upstream services can influence names, types, or recursion depth.
Block inherited object lookups: Use null-prototype objects for lookup tables and reject prototype-linked properties in any schema resolution logic. This removes inherited answers from the resolution path and reduces prototype pollution leverage.
Sanitise schema-derived identifiers before emission: Treat every emitted namespace, service, enum, and field name as code input. Enforce strict character allowlists and escape rules before writing generated JavaScript files.
Enforce depth and size limits on untrusted messages: Set explicit recursion depth and payload size bounds for all decoders that handle externally supplied protobuf data. Fail closed when nested structures exceed the approved limit, before application logic sees the message.
Separate build trust from source trust: Do not import generated protobuf output from unreviewed pull requests or shared schema registries without controls that inspect the generated artefact as executable code.

Key takeaways

Proto6 shows that schema metadata can become executable code when libraries trust type names, descriptors, and generated output too much.
The evidence spans remote code execution, denial of service, and build-time injection, which means the issue is broader than a single bug class.
Teams should govern schema compilation, identifier sanitisation, recursion limits, and generated artefacts as security controls, not developer conveniences.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI-03	Schema-derived code paths can expose or reuse non-human credentials and trust boundaries.
NIST CSF 2.0	PR.PS-06	Secure software lifecycle controls apply to generated code and deserialisation boundaries.
NIST Zero Trust (SP 800-207)	Section 2.1 (Tenets of Zero Trust)	Least-privilege access should limit what a compromised build or runtime process can execute.

Review NHI code-generation and deserialisation paths under NHI-03 and block untrusted input from executable sinks.

Key terms

Schema-to-code boundary: The point where structured data is converted into executable source or runtime logic. In practice, this boundary is fragile because untrusted names, types, or descriptors can affect what code is emitted or run, so it needs code-level validation, escaping, and review.
Prototype pollution: A JavaScript attack pattern where an attacker adds or changes properties on a shared prototype object. That change can influence later lookups across unrelated objects, which becomes dangerous when security-sensitive code assumes property absence means safety.
Generated artefact: A file or function produced automatically from source data, such as a compiled schema or emitted JavaScript module. Generated artefacts should be treated as security-sensitive outputs because the source input can shape what executes later in trusted environments.
Recursive decode limit: A maximum nesting depth enforced while parsing structured messages. Without that ceiling, attacker-shaped recursive payloads can exhaust stack space or runtime resources, turning a valid format into a repeatable denial-of-service path.

Deepen your knowledge

Schema-to-code trust boundaries and generated artefact governance are covered in our NHI Foundation Level course, the industry's only accredited NHI security programme. If your team operates bots, CI jobs, or runtime code generation, this is a practical place to build shared controls.

This post draws on content published by Cyera: Proto6, the schema was not supposed to run. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-06-05.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org