By NHI Mgmt Group Editorial Team | Published 2026-05-18 | Domain: Breaches & Incidents | Source: HiddenLayer

TL;DR: According to HiddenLayer research, ChromaDB’s Python FastAPI server can execute attacker-controlled model code before authentication, turning a collection-creation request into pre-auth remote code execution and full process compromise. The flaw affects versions 1.0.0 through 1.5.8, and 73% of exposed instances were in that range. The deeper issue is not a single bug but a trust model that lets client-supplied model configuration run ahead of access control.


At a glance

What this is: HiddenLayer found that ChromaDB’s Python FastAPI server can load attacker-controlled embedding code before auth checks, enabling pre-auth RCE.

Why it matters: For IAM, NHI, and platform teams, this shows how access control loses meaning when runtime code execution happens before identity checks and trust boundaries.

👉 Read HiddenLayer's analysis of ChromaDB pre-auth RCE and model-loading trust


Context

ChromaDB is an open-source vector database used to support semantic retrieval in AI applications, but this research focuses on a much narrower problem: the Python FastAPI server can execute client-influenced model code before it checks whether the caller is authenticated. That means a request that looks like a rejected API call can still become a code execution path.

For identity and access teams, the issue is not limited to vector databases. It is a broader trust boundary failure in systems that fetch and execute remote code, because the security model assumes authorization happens before dangerous runtime behaviour. In practice, any platform that loads externally controlled model artefacts needs to treat that loading step as part of the attack surface, not a neutral setup action.

The article’s starting point is typical of modern AI infrastructure: highly flexible, API-driven, and designed to accept user-selected models and configuration. What is atypical here is the order of operations, where the server acts on untrusted input before identity verification can stop it.


Key questions

Q: What breaks when an AI service loads model code before authentication?

A: Authentication stops being the first security decision, so an unauthenticated request can still trigger code execution and create a full process compromise. The failure is not only access control, but sequencing: if runtime side effects happen first, identity checks cannot prevent them. Teams should treat that as a pre-auth execution window and redesign the request path accordingly.

Q: Why do AI application servers need stricter trust controls than ordinary APIs?

A: Because they often fetch and execute external artefacts, not just process data. That means a seemingly ordinary configuration field can become an execution directive if the runtime loads remote code or plugin logic. Security teams should classify those artefacts as executable inputs and place allowlists, validation, and isolation around them.

Q: How can security teams reduce the blast radius of a compromised AI runtime?

A: Scope the service account, filesystem, and network so the process can only reach the data and secrets it genuinely needs. If an attacker gets code execution, the damage should be limited to one workload rather than environment-wide credentials or mounted secrets. That is especially important for internet-exposed AI services.

Q: Who is accountable when an authenticated route still allows pre-auth compromise?

A: The accountable teams are the application owners, platform engineers, and security architects who define the request lifecycle and trust boundaries. A route label alone does not create protection if the implementation performs risky work before identity enforcement. Governance should review execution order, not just whether an endpoint is marked authenticated.


Technical breakdown

Why pre-auth model loading becomes a code execution path

The core flaw is that ChromaDB accepts embedding function settings from the request body and uses them to load a model before the authentication gate fires. One of those settings, trust_remote_code, tells HuggingFace to execute repository-shipped Python code when a model is loaded. If the attacker controls the model reference, they control what code runs in the server process. This is not an ordinary authorization bug. It is a runtime trust failure in which model selection becomes code execution.

Practical implication: treat model loading as a privileged execution step and validate or block remote-code model references before any request processing.
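
A minimal sketch of that validation step, assuming a hypothetical request shape in which the embedding function settings arrive as a dictionary with `model_name` and `trust_remote_code` keys. The function name, exception class, and allowlist contents are illustrative and are not ChromaDB's actual API:

```python
# Hypothetical allowlist of model references an operator has reviewed.
ALLOWED_MODELS = frozenset({"sentence-transformers/all-MiniLM-L6-v2"})


class UnsafeModelConfig(ValueError):
    """Raised when a client-supplied embedding config must not be loaded."""


def validate_embedding_config(config: dict) -> dict:
    """Return a sanitised copy of the config, or raise before anything loads.

    The key names ("model_name", "trust_remote_code") are assumptions made
    for illustration; adapt them to the real request schema.
    """
    # Any truthy value is rejected, so malformed flags fail closed.
    if config.get("trust_remote_code"):
        raise UnsafeModelConfig("remote code execution flags are not accepted")

    model_name = config.get("model_name")
    if model_name not in ALLOWED_MODELS:
        raise UnsafeModelConfig(f"model {model_name!r} is not on the allowlist")

    # Pass on only the fields that were checked; drop everything else.
    return {"model_name": model_name, "trust_remote_code": False}
```

The design choice here is to rebuild the configuration from checked fields rather than to strip known-bad keys, so an unexpected field cannot reach the loader by default.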

Why the auth check arrives too late

The ordering defect matters as much as the model-loading behaviour. The authenticated endpoint instantiates the embedding function during request handling, then checks authentication after the model has already been fetched and executed. That means the server can return a failed request while the attacker already has a shell. When an application executes side effects before authorization, the auth layer no longer protects the resource it was meant to govern.

Practical implication: move authentication and request filtering ahead of any configuration parsing that can trigger network fetches or code execution.
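
The ordering problem can be illustrated with a small, framework-free sketch. Everything in it is a hypothetical stand-in: the static token, the `loaded_models` list that records loader side effects, and both handler functions. It is not the ChromaDB request path, only a model of the two orderings:

```python
import hmac
from typing import Optional

# Hypothetical stand-ins: a static token and a log of loader side effects.
EXPECTED_TOKEN = "example-token"
loaded_models = []


def is_authenticated(token: Optional[str]) -> bool:
    """Constant-time comparison against the expected token."""
    return token is not None and hmac.compare_digest(token, EXPECTED_TOKEN)


def load_embedding_function(config: dict) -> str:
    """Placeholder for the step that would fetch and run model code."""
    loaded_models.append(config["model_name"])
    return config["model_name"]


def create_collection_vulnerable(token: Optional[str], config: dict) -> int:
    """Loads the model first, so the side effect happens even on a 401."""
    load_embedding_function(config)
    if not is_authenticated(token):
        return 401
    return 200


def create_collection_fixed(token: Optional[str], config: dict) -> int:
    """Rejects the caller before any model loading can occur."""
    if not is_authenticated(token):
        return 401
    load_embedding_function(config)
    return 200
```

Called with no token, `create_collection_vulnerable` returns 401 yet still records the model in `loaded_models`, while `create_collection_fixed` returns 401 and leaves the list empty.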

What the exposed process can reach after compromise

Once the attacker gets code execution in the ChromaDB process, the blast radius follows the privileges of that process. HiddenLayer notes that the attacker can reach environment variables, API keys, mounted secrets, and on-disk data. That is the usual pattern for application-process compromise: the exploit is delivered through one feature, but the impact expands into every secret and dataset attached to the runtime. In AI platforms, that often includes adjacent model credentials and orchestration tokens.

Practical implication: scope the ChromaDB service account and mounted secrets as if the application itself can be turned into an execution foothold.
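
One way to narrow what a compromised process can read is to start it with an explicit environment instead of the parent's full one. The sketch below assumes the service is launched from a Python wrapper; the variable names in the allowlist (including `CHROMA_DATA_DIR`) and the `launch_service` helper are hypothetical:

```python
import os
import subprocess

# Hypothetical allowlist: only what the service needs in order to start.
ALLOWED_ENV_VARS = ("PATH", "LANG", "CHROMA_DATA_DIR")


def minimal_env(source: dict, allowed=ALLOWED_ENV_VARS) -> dict:
    """Copy only allowlisted variables; everything else is dropped."""
    return {name: source[name] for name in allowed if name in source}


def launch_service(command: list) -> subprocess.Popen:
    """Start the service with the scrubbed environment, not the inherited one."""
    return subprocess.Popen(command, env=minimal_env(dict(os.environ)))
```

This only addresses environment variables. Mounted secrets, filesystem access, and network reach need equivalent scoping at the container or orchestration layer.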


Threat narrative

Attacker objective: The attacker aims to obtain remote code execution in the ChromaDB server process and use that foothold to access secrets and data reachable by the application.

  1. Entry occurs when an unauthenticated attacker sends a collection-creation request to the network-reachable ChromaDB API with a malicious HuggingFace model reference.
  2. Credential access is not the initial objective here because the payload executes before authentication, giving the attacker code execution directly in the server process.
  3. Impact follows as the compromised process exposes environment variables, API keys, mounted secrets, and stored data to the attacker.


NHI Mgmt Group analysis

Pre-auth execution is a trust boundary failure, not just an auth bug. ChromaDB does not merely check identity too late. It executes attacker-influenced model code before identity can matter, which collapses the assumption that authentication is the first meaningful security gate. That is a broader platform design problem because any runtime that fetches and executes remote artefacts can turn request handling into a code execution path. Practitioners should treat this as a trust-boundary defect, not an isolated patch ticket.

Client-controlled model configuration should be treated as executable policy, not metadata. The article shows that a model reference and a boolean flag can determine whether the server runs attacker-supplied code. That changes the governance question from 'who may call the API' to 'what execution authority is embedded in the API itself'. The implication is that AI infrastructure teams need to classify configuration fields by execution effect, not by whether they look like ordinary parameters.

Runtime trust in public model registries creates an identity problem for machine execution paths. When a server loads code from a registry on behalf of a caller, the registry becomes part of the authorization chain. That means the security model is no longer limited to human or service identity at the API boundary; it also depends on the trustworthiness of the model supply path. Practitioners should recognise that registry trust is now identity governance for code.

Identity blast radius matters more than endpoint labels when the application process is the target. ChromaDB’s API docs mark the route as authenticated, but the exploit still succeeds because the process acts on the request before the gate closes. The concept at work here is the pre-auth execution window: the period in which untrusted input can trigger privileged runtime behaviour before identity enforcement. Practitioners should assume that any such window can turn one request into full process compromise.

Fixing the auth order would close one path, but the governance lesson is larger. Systems that accept externally supplied model code inherit the trust assumptions of the upstream registry and of the application runtime at once. That makes AI platform governance a two-layer problem: identity controls at the API are necessary, but they are not sufficient when the runtime itself can be instructed to execute remote code. Practitioners should align platform trust reviews with code-loading behaviour, not just with endpoint authentication.

From the research:

  • Of the internet-exposed ChromaDB instances HiddenLayer discovered via Shodan, 73% are running version 1.0.0 or later, the version range in which the vulnerable feature exists.
  • 4.6% of all public GitHub repositories contain at least one hardcoded secret, according to The State of Secrets Sprawl 2025.
  • The right next step is to pair exposure reduction with runtime governance, not to assume authentication labels alone provide safety.
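
For teams triaging their own deployments, a version check against the range this article cites (1.0.0 through 1.5.8) can be sketched as follows. The helper names are hypothetical, the parser handles only plain MAJOR.MINOR.PATCH strings, and a match indicates only that a deployment falls in the cited range, not that it is exploitable:

```python
# Range taken from the figures cited in this article.
AFFECTED_MIN = (1, 0, 0)
AFFECTED_MAX = (1, 5, 8)


def parse_version(text: str) -> tuple:
    """Parse a plain 'MAJOR.MINOR.PATCH' string into a tuple of ints."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
    return tuple(int(part) for part in parts)


def is_in_affected_range(version: str) -> bool:
    """True when the version falls inside the cited range, bounds included."""
    return AFFECTED_MIN <= parse_version(version) <= AFFECTED_MAX
```

Comparing integer tuples rather than strings avoids misordering versions such as 1.10.0 against 1.5.8.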

What this signals

Pre-auth execution window: This class of issue means AI platform teams should review every request path where untrusted input can trigger fetch, import, or deserialisation before identity enforcement. When that sequence exists, the problem is architectural, not just operational, and it belongs in platform threat modelling, secrets reviews, and application security sign-off.

With 4.6% of all public GitHub repositories containing at least one hardcoded secret, per The State of Secrets Sprawl 2025, the practical lesson is that runtime compromise and secret exposure are already tightly linked in real environments. Teams should assume exposed code paths and exposed secrets will eventually meet unless they are separated by design.

This should also push practitioners back to foundational identity controls for machine workloads, especially the boundaries between workload identity, secret access, and application runtime permissions. The more an AI service can import, fetch, or execute on behalf of the caller, the more its trust model needs to resemble high-risk infrastructure rather than a routine web API.


For practitioners

  • Block remote-code model loading by default: Disable trust_remote_code-style behaviour in production paths unless there is a separately reviewed allowlist for the exact model artefact and code path. Treat any client-supplied model reference as untrusted until it has passed pre-runtime validation.
  • Move authentication ahead of configuration loading: Reorder request handling so identity checks and request rejection happen before any model fetch, deserialisation, or code execution. A route label that says authenticated is not a control if side effects occur first.
  • Constrain the service process blast radius: Run the service with the minimum possible filesystem, network, and secret access so a process compromise cannot expose broad environment variables, mounted credentials, or adjacent datasets.
  • Scan model artefacts before runtime loads them: Inspect downloaded model files for suspicious module contents, unexpected execution hooks, and registry-hosted code paths before the application can import them into memory.
  • Prefer isolated deployment modes for exposed AI services: Where available, use a deployment path that removes the risky code-loading behaviour from the internet-facing request path and place network controls in front of any remaining Python service.
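
As a rough illustration of the artefact-scanning item above, the sketch below flags two indicators in an already-downloaded model directory: bundled Python source files, and an `auto_map` key in `config.json`. It assumes that `auto_map` is how the model repository declares custom code classes, which should be verified against the loader in use. The function name is hypothetical, and this is a presence check, not a malware scanner; it does not inspect serialised weight files, which can also carry executable content:

```python
import json
from pathlib import Path


def find_code_loading_indicators(model_dir: str) -> list:
    """Return human-readable findings for a downloaded model directory."""
    root = Path(model_dir)
    findings = []

    # Any Python source shipped alongside the weights deserves manual review.
    for path in sorted(root.rglob("*.py")):
        findings.append(f"python source: {path.relative_to(root).as_posix()}")

    # Assumption: custom classes are declared under "auto_map" in config.json.
    config_path = root / "config.json"
    if config_path.is_file():
        try:
            config = json.loads(config_path.read_text(encoding="utf-8"))
        except ValueError:
            # Covers both invalid JSON and bytes that are not valid UTF-8.
            findings.append("config.json could not be parsed")
        else:
            if isinstance(config, dict) and "auto_map" in config:
                findings.append("config.json declares auto_map (custom code classes)")

    return findings
```

An empty result means only that neither indicator was present. A non-empty result is a reason to hold the artefact for review before anything imports it.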

Key takeaways

  • ChromaDB’s flaw shows that authentication is ineffective if the application performs dangerous work before the auth check.
  • The exploit can turn one unauthenticated request into full process compromise, exposing environment variables, API keys, mounted secrets, and stored data.
  • The durable fix is not just patching the endpoint, but reordering execution, constraining runtime trust, and reducing the service blast radius.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST Zero Trust (SP 800-207) and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | A1 | Remote code loaded through model configuration maps to agentic code-execution risk.
OWASP Non-Human Identity Top 10 | NHI-05 | The exploit expands into secrets and workload credentials held by the service process.
NIST Zero Trust (SP 800-207) | PR.AC-3 | Authentication must precede privileged runtime actions in the request path.
NIST CSF 2.0 | PR.AC-4 | Access enforcement failed because the system acted before validating caller identity.

Review application execution order and ensure access control gates risky actions before they occur.


Key terms

  • Pre-auth execution window: A pre-auth execution window is the period in which an application can perform privileged or code-executing work before identity is verified. In this case, the weakness is not lack of authentication but the fact that dangerous runtime actions begin too early, allowing compromise even when the request is later rejected.
  • Remote code loading: Remote code loading is the practice of fetching and executing code from an external source at runtime. In AI systems, it often appears as model or plugin retrieval, but the security implication is the same as any dynamic import: the runtime inherits the trustworthiness and integrity of the source it loads.
  • Process blast radius: Process blast radius is the set of credentials, data, and system resources an attacker can reach after compromising one application process. For AI services, it usually includes environment variables, mounted secrets, network reach, and stored datasets, so least privilege and isolation matter as much as patching the flaw.

What's in the full article

HiddenLayer's full research covers the operational detail this post intentionally leaves for the source:

  • The exact request flow that lets an unauthenticated caller reach model loading before the auth check.
  • The vulnerable endpoint pattern across the Python FastAPI server and the version range affected.
  • The mitigation trade-offs between the Rust-based deployment path and the Python server.
  • The disclosure timeline and code-level remediation guidance discussed by the researcher.

👉 HiddenLayer's full post covers the exploit flow, affected versions, and mitigation options in detail.
