Exposed Hugging Face tokens show how LLM supply chains fail

By NHI Mgmt Group Editorial Team | Published 2026-03-16 | Domain: Agentic AI & NHIs | Source: Lasso Security

TL;DR: Exposed API tokens can turn a code-repository leak into model, dataset, and supply-chain compromise. Lasso Security found 1,681 valid tokens exposed through Hugging Face and GitHub, including 655 users' tokens with write permissions, and mapped access across 723 organisation accounts. Hard-coded tokens turn LLM platforms into identity-risk amplifiers, not just development tools.

At a glance

What this is: This research shows that exposed Hugging Face API tokens can unlock repository, model, and dataset access at scale, turning developer credential leakage into an LLM supply-chain problem.

Why it matters: IAM, NHI, and AI governance teams need to treat model registries and code repositories as identity-controlled surfaces because leaked tokens can drive write access, model tampering, and private data exposure.

By the numbers:

Lasso Security found 1,681 valid tokens exposed through Hugging Face and GitHub.
The research mapped access across 723 organisation accounts.
655 users’ tokens were found to have write permissions.

👉 Read Lasso Security's research on exposed Hugging Face API tokens

Context

Hugging Face tokens are non-human identities in practice because they carry repository and model permissions that outlive the human who created them. When those tokens are hard-coded or leaked into public code, the access path can move from developer convenience to supply-chain compromise very quickly, especially when tokens can read private models or write back into shared registries.

The primary governance failure is not just secret exposure. It is the assumption that model platforms sit outside the IAM and NHI controls used elsewhere in the enterprise. Once API tokens can access models, datasets, and org repositories, the security boundary is no longer the application layer alone; it is identity, entitlement, and lifecycle control across the AI build chain.

Key questions

Q: What breaks when Hugging Face API tokens are exposed in public code?

A: Exposed Hugging Face API tokens turn repository access into a live identity compromise because they can reveal ownership, permissions, and in some cases write access. That means attackers may alter models, access private assets, or steal AI resources through a token that was never meant to be public. Treat the leak as an NHI incident, not a simple code cleanup task.

Q: Why do exposed model registry tokens create supply-chain risk?

A: Because they can change shared artifacts that downstream teams trust. If a token can write to a model or dataset, a single leaked credential can reach many applications through the normal consumption chain. The risk is not only theft of the token itself, but tampering that persists after the original leak is found.
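One way to limit tampering that outlives the leak is for consumers to pin a digest of each artifact they have reviewed and to verify it before loading. The following is a minimal sketch under stated assumptions: the `pinned` mapping, the file name, and the byte strings are hypothetical stand-ins, not part of the Lasso Security research or any Hugging Face API.

```python
import hashlib
import hmac
from typing import Dict


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the artifact bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(name: str, data: bytes, pinned: Dict[str, str]) -> bool:
    """Accept an artifact only if its digest matches the pinned value."""
    expected = pinned.get(name)
    if expected is None:
        # Assumption: artifacts without a pin are not trusted by default.
        return False
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(sha256_hex(data), expected)


# Hypothetical example: the pin is recorded when the artifact is reviewed.
reviewed = b"model-weights-v1"
pinned = {"model.safetensors": sha256_hex(reviewed)}

ok = verify_artifact("model.safetensors", reviewed, pinned)
tampered = verify_artifact("model.safetensors", b"model-weights-v1-modified", pinned)
print(ok, tampered)
```

A check like this does not stop a write-scoped token from changing the registry, but it lets downstream consumers notice that the artifact they trusted is no longer the one they reviewed.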

Q: How do security teams know if NHI tokens in AI workflows are actually under control?

A: Look for three signals: every token has a named owner, write scopes are rare and justified, and exposure triggers automated revocation. If you cannot tie a token to a lifecycle owner or prove it is short-lived, the control is not working. The registry may look tidy while the real access path remains uncontrolled.
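The three signals can be expressed as a simple audit over a token register. This is only a sketch: the `TokenRecord` fields, the 90-day `MAX_LIFETIME_DAYS` threshold, and the example tokens are assumptions chosen for illustration, not values taken from the research.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

MAX_LIFETIME_DAYS = 90  # Assumed policy threshold; set to your own standard.


@dataclass
class TokenRecord:
    name: str
    owner: Optional[str]          # named lifecycle owner, if any
    scope: str                    # "read" or "write"
    justification: Optional[str]  # why write scope is needed
    expires: Optional[date]       # None means no expiry is recorded
    auto_revoke: bool             # exposure triggers automated revocation


def audit(record: TokenRecord, today: date) -> List[str]:
    """Return the governance findings for one token; empty means it passes."""
    findings = []
    if not record.owner:
        findings.append("no named owner")
    if record.scope == "write" and not record.justification:
        findings.append("write scope without justification")
    if not record.auto_revoke:
        findings.append("exposure does not trigger automated revocation")
    if record.expires is None or (record.expires - today).days > MAX_LIFETIME_DAYS:
        findings.append("not provably short-lived")
    return findings


today = date(2026, 3, 16)
ci_token = TokenRecord("ci-upload", None, "write", None, None, False)
reader = TokenRecord("notebook-read", "alice", "read", None, date(2026, 4, 15), True)

ci_findings = audit(ci_token, today)
reader_findings = audit(reader, today)
print(ci_findings)
print(reader_findings)
```

Here the hypothetical `ci-upload` token fails every check, while the read-only token with an owner and a 30-day expiry passes.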

Q: Should organisations treat model registries differently from other code platforms?

A: Yes, because model registries can carry both identity privileges and supply-chain impact at the same time. A leaked token may not just expose a repository; it can change the artifact that hundreds of downstream users trust. That means model registries need IAM, NHI, and software supply-chain controls together, not in separate silos.

Technical breakdown

Hugging Face API tokens as non-human identities

Hugging Face API tokens behave like non-human identities because they represent delegated access to repositories, models, and datasets. In this research, the token validity check exposed the owning user, organisational memberships, and permissions, which is exactly why leaked tokens become governance objects rather than simple secrets. The important point is that the token is the control plane for the resource, not just a login credential. If the token has write scope, it can alter downstream artifacts that other teams trust as inputs to development and deployment.

Practical implication: Classify model registry tokens as governed NHI credentials and review their scope, ownership, and revocation process.
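To illustrate why a validity check turns a token into a governance object, the sketch below summarises an identity payload of the kind the research describes: owning user, organisation memberships, and permissions. The field names (`name`, `orgs`, `roleInOrg`, `auth.accessToken.role`) are assumptions about the response shape and should be verified against the live API; such a payload could come from a call such as `huggingface_hub.whoami(token=...)`, which is deliberately not invoked here so the example runs offline.

```python
from typing import Any, Dict


def summarise_identity(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a whoami-style payload to the fields a token register needs."""
    token = payload.get("auth", {}).get("accessToken", {})
    role = token.get("role", "unknown")
    orgs = {
        org.get("name"): org.get("roleInOrg", "unknown")
        for org in payload.get("orgs", [])
    }
    return {
        "owner": payload.get("name"),
        "token_role": role,
        "orgs": orgs,
        # Assumption: anything not confirmed as read-only is higher risk.
        "high_risk": role != "read",
    }


# Hypothetical payload; every value is illustrative.
sample = {
    "name": "example-user",
    "orgs": [{"name": "example-org", "roleInOrg": "admin"}],
    "auth": {"accessToken": {"displayName": "ci", "role": "write"}},
}

summary = summarise_identity(sample)
print(summary)
```

The same summary works as an intake record: owner, scope, and organisational reach are exactly the attributes the classification step needs.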

Why exposed tokens become supply-chain risk

A leaked token becomes a supply-chain issue when it can modify shared models or datasets that many downstream consumers import automatically. In the article, valid tokens enabled access to private models, repository creation, and write operations on datasets with significant download volume. That combination matters because model integrity is part of the software supply chain, especially when a trusted artifact is redistributed through applications, notebooks, or internal pipelines. The technical risk is not only theft but tampering that can persist after the original exposure is discovered.

Practical implication: Map every model and dataset write path to the identities that can alter it, then restrict those identities to the narrowest possible scope.
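The mapping can start as two small lookups: which identities can write to which artifacts, and which consumers trust each artifact. The sketch below assumes hypothetical token names, repository names, and consumers; in practice the data would come from your own registry and pipeline inventory.

```python
from typing import Dict, Set

# Hypothetical inventory: identity -> artifacts it can write to.
write_access: Dict[str, Set[str]] = {
    "ci-token": {"org/model-a", "org/dataset-b"},
    "alice-token": {"org/model-a"},
}

# Hypothetical inventory: artifact -> downstream consumers that trust it.
consumers: Dict[str, Set[str]] = {
    "org/model-a": {"chat-app", "eval-pipeline"},
    "org/dataset-b": {"training-job"},
}


def blast_radius(identity: str) -> Set[str]:
    """Consumers reachable through any artifact the identity can alter."""
    reached: Set[str] = set()
    for artifact in write_access.get(identity, set()):
        reached |= consumers.get(artifact, set())
    return reached


def writers_of(artifact: str) -> Set[str]:
    """Identities that hold write access to the given artifact."""
    return {who for who, artifacts in write_access.items() if artifact in artifacts}


ci_radius = blast_radius("ci-token")
model_writers = writers_of("org/model-a")
print(sorted(ci_radius))
print(sorted(model_writers))
```

Reading the map in both directions answers the two questions that matter after a leak: what a given token could have changed, and who else could have changed the same artifact.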

Organisation-level token exposure and privilege drift

Organisation API tokens are especially risky because they often aggregate access across multiple repositories and teams. The research showed that exposed tokens could reveal the user, the organisations they belonged to, and the permissions attached to those memberships, which makes privilege drift visible at the identity layer. Once that token is reused or copied into automation, access can outlive the original developer context and become difficult to contain. This is why token exposure must be handled as an identity lifecycle problem, not just as code hygiene.

Practical implication: Tie token issuance to ownership, enforce short lifetimes, and require automated revocation when a token is exposed or no longer needed.

Threat narrative

Attacker objective: Use leaked non-human identity tokens to alter, steal, or redistribute trusted AI assets at scale.

Entry occurred when API tokens exposed publicly on Hugging Face and GitHub were discovered in repositories and search results, giving attackers a direct identity foothold into model infrastructure.
Credential access followed through valid token use, allowing read access to private models and organisational metadata, and in some cases write permissions on models and datasets.
Impact would be model tampering, dataset poisoning, and private resource theft across a large number of accounts and downstream consumers who trust those artifacts.

LiteLLM PyPI package breach: a PyPI supply chain attack in which credentials were stolen from users.
Shai Hulud npm malware campaign: npm malware that exposed secrets on GitHub.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Hard-coded AI platform tokens are now identity assets, not just secrets: The article shows that a leaked Hugging Face token can expose ownership, membership, and permission data, which means the token is carrying a live delegation relationship. That makes it an NHI governance object with lifecycle, scope, and revocation requirements. Practitioners should treat model registry tokens as controlled identities that need assignment and offboarding discipline.

Identity blast radius is the right lens for model and dataset exposure: The real problem is not a single leaked credential, but the downstream reach of that credential into multiple repositories, datasets, and consumer pipelines. Once a write-scoped token can touch shared artifacts, the blast radius is determined by who trusts the artifact, not just who created it. Practitioners should map blast radius before they expand model access.

Supply-chain compromise now starts at the identity layer: This research connects token leakage to model manipulation, dataset poisoning, and resource theft, which means the first exploitable layer is often identity rather than code. That aligns with OWASP NHI thinking because over-permissioned or exposed machine credentials become the simplest route into trusted AI infrastructure. Practitioners should fold model registries into the same governance model they use for other NHI-backed supply chains.

Model theft should be reframed as AI resource theft: The article’s own finding that thousands of private models and datasets could be accessed supports a broader concept than model theft alone. Models, datasets, and the tokens that govern them form one access domain, so the failure mode is theft or manipulation of AI resources. Practitioners should govern the resource set, not one artifact type in isolation.

Secrets management gaps in AI pipelines expose a governance assumption that no longer holds: The assumption that developer tokens remain private long enough to be reviewed was designed for slower, manually governed release cycles. That assumption fails when tokens are copied into public code, searched at scale, and validated against live APIs in minutes. The implication is that AI build pipelines need identity controls that operate at machine speed, not just periodic review.

From our research:
The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
Organisations maintain an average of 6 distinct secrets manager instances, creating fragmentation that undermines centralised control, according to The State of Secrets in AppSec.
For teams dealing with model registry exposure, the next question is how to govern exposed NHI credentials before they become downstream supply-chain events, as explored in Top 10 NHI Issues.

What this signals

Ephemeral exposure is now the default failure mode for AI credentials: the gap is not whether a token can be found, but whether it can be neutralised before it is reused. With 43% of security professionals already concerned that AI systems can learn and reproduce sensitive patterns from codebases, the governance task now spans both secrets hygiene and AI-specific data leakage control.

Security teams should assume that model registries sit inside the same blast radius as source control, CI/CD, and package ecosystems. The practical shift is to connect token scanning, artifact integrity, and access ownership into one control path, applying the lessons of the 52 NHI Breaches Analysis report, where identity failures created lasting downstream exposure.

A useful named concept here is AI resource theft: the failure mode where attackers use leaked tokens to access, copy, or alter models and datasets as a single resource set. That framing helps practitioners stop treating model theft as a narrow asset issue and start governing the identity that controls the whole training and distribution surface.

For practitioners

Inventory Hugging Face and GitHub tokens as governed NHI credentials: Create a register of model registry tokens, classify them by scope and owner, and assign a revocation path for each token type. Treat write-scoped tokens as higher-risk identities and review them on the same cadence as other privileged NHI accounts.
Restrict write permissions on shared models and datasets: Separate read-only consumption from repository and dataset modification rights, then limit write access to named maintainers. Where possible, use dedicated service identities for automation instead of reusing developer tokens across multiple workflows.
Automate exposure detection and revocation: Scan public repositories and internal code reviews for token patterns, then revoke exposed credentials immediately and notify owners. Pair that with short token lifetimes so exposed credentials cannot remain valid long enough to be reused.
Bring model registries into supply-chain control mapping: Map each model, dataset, and automation path to the identity that can alter it, then document the downstream consumers that trust those artifacts. Use this map to decide where additional approval, attestation, or segregation is required before promotion.
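As a starting point for the exposure-detection step, the sketch below scans text for strings that look like Hugging Face tokens and reports them in redacted form. The pattern is an assumption (an `hf_` or `api_org_` prefix followed by at least 30 alphanumeric characters) and should be tuned against real token formats; the sample token is fabricated, and revocation itself is left to whatever process your platform supports.

```python
import re
from typing import List, Tuple

# Assumed token shapes; adjust the prefixes and minimum length as needed.
TOKEN_PATTERN = re.compile(r"\b(?:hf_|api_org_)[A-Za-z0-9]{30,}")


def redact(token: str) -> str:
    """Keep just enough of the token to identify it in a report."""
    return token[:6] + "..." + token[-2:]


def find_candidate_tokens(text: str) -> List[Tuple[int, str]]:
    """Return (line number, redacted token) for every candidate match."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in TOKEN_PATTERN.finditer(line):
            hits.append((lineno, redact(match.group())))
    return hits


# Fabricated token assembled at runtime so no real secret appears in source.
fake = "hf_" + "A" * 34
sample = "import os\nTOKEN = '" + fake + "'\nprint('hello')\n"

hits = find_candidate_tokens(sample)
print(hits)
```

A match is only a candidate: it still needs a validity check and an owner lookup before revocation, which is where the token register described above comes in.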

Key takeaways

Leaked Hugging Face tokens are non-human identity failures because they can expose permissions, enable write access, and extend into trusted AI supply chains.
The scale matters: 1,681 valid tokens and 723 organisation accounts show that exposed credentials can turn into broad model and dataset risk, not isolated misuse.
Practitioners should govern model registry tokens with lifecycle, scope, and revocation controls before exposed identities become a supply-chain compromise.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI2 (Secret Leakage), NHI7 (Long-Lived Secrets)	Token exposure and rotation gaps are central to this research.
NIST CSF 2.0	PR.AA-05	Repository and dataset access needs least-privilege control.
NIST Zero Trust (SP 800-207)	Section 2.1 (Tenets of Zero Trust)	Zero trust helps constrain trust in leaked identity artifacts.

Map model and dataset permissions to least privilege and review write access regularly.

Key terms

Hugging Face API token: A Hugging Face API token is a credential that grants programmatic access to models, datasets, and repository functions. In practice it behaves like a non-human identity because it carries scope, ownership, and lifecycle requirements. When exposed, it can be used to read, write, or manipulate trusted AI assets.
Model registry: A model registry is the system that stores, versions, and distributes machine learning models and related artifacts. It becomes an identity-governed surface when tokens or service accounts can modify its contents, because compromise there can affect everything downstream that consumes those models.
Supply-chain compromise: Supply-chain compromise occurs when an attacker alters a trusted component so that downstream users inherit the damage. In AI environments, the component may be a model, dataset, or token-bearing identity, which means the security problem spans both artifact integrity and credential control.
AI resource theft: AI resource theft is the unauthorised copying, access, or alteration of models, datasets, and the identities that control them. It is a broader failure mode than model theft alone because it treats the access domain as one governed resource set rather than isolated files or repositories.

Deepen your knowledge

AI credential exposure and model registry governance are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If your team is trying to secure LLM pipelines and token-bearing identities, it is worth exploring.

This post draws on content published by Lasso Security: 1500+ HuggingFace API Tokens were exposed, leaving millions of Meta-Llama, Bloom, and Pythia users vulnerable. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-03-16.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org