AI agent skill marketplaces expose a new supply chain risk

By NHI Mgmt Group Editorial Team
Published 2026-05-05
Domain: Agentic AI & NHIs
Source: Orca Security

TL;DR: AI agent skill marketplaces can be weaponized through spoofed popularity signals, non-continuous scanning, silent overrides, and blind bulk updates, allowing malicious skills to reach users with persistent code execution, according to Orca Security. Skills should be treated as untrusted code, and governing them is now a supply chain identity problem, not just a developer convenience issue.

At a glance

What this is: An analysis of how AI agent skill marketplaces can be abused as a supply chain vector, with trust signals and update mechanics enabling malicious skill distribution.

Why it matters: IAM, PAM, and NHI teams need to understand how agent tooling, repository trust, and lifecycle controls intersect when skills can execute code with delegated access.

By the numbers:

Only 44% of developers are reported to follow security best practices for secrets management, exposing a significant developer behaviour gap.

👉 Read Orca Security's analysis of malicious AI agent skills and supply chain abuse

Context

AI agent skills are reusable prompt-based extensions that can alter how an agent behaves, what commands it runs, and how it interacts with local or remote resources. In this case, the governance gap is that a marketplace can make untrusted skills look trusted through weak metadata, delayed scanning, and update behaviour that hides change from the user.

For IAM and NHI teams, the issue is not the skill format alone. It is the combination of delegated execution, repository trust, and lifecycle blind spots that lets a malicious skill survive initial review, spread through popularity signals, and execute after installation or update.

The immediate story is a marketplace security problem, but the practitioner lesson is broader: once an agent can execute code through installed extensions, controls for provenance, review, and update fidelity become identity controls as much as software controls.

Key questions

Q: What breaks when AI agent skills are not reviewed before installation?

A: The main failure is that malicious instructions can hide inside a skill that appears legitimate, then execute under the agent’s delegated access. Without manual review of the repository and skill content, users rely on marketplace reputation signals that can be manipulated. That turns installation into a trust shortcut and makes code execution a supply chain problem.

Q: Why do AI agent skill marketplaces create a governance risk for IAM teams?

A: They create risk because they distribute executable behaviour through a trust layer that looks like content, not identity. Once a skill can change commands, access local files, or trigger nested installation, IAM teams need visibility into provenance, approval, and update handling. The governance issue is delegated authority, not just software distribution.

Q: How do security teams know whether agent skills are actually under control?

A: They should look for version pinning, per-skill ownership, change diffs before update, and collision warnings when a name is reused. If any of those are missing, the environment does not have reliable control over what the agent can execute. A clean install count or audit badge is not enough to prove control.

Q: What should organisations do when an agent skill can silently replace another skill?

A: They should treat silent replacement as a control failure and block it at the policy layer. A skill name should not be enough to override an existing trusted skill without an explicit prompt, provenance check, and review of the source repository. Otherwise, the environment cannot distinguish maintenance from substitution.

Technical breakdown

How AI agent skill trust signals are abused

Agent skills in the source ecosystem are markdown files that can contain instructions and embedded code blocks. The marketplace then exposes trust cues such as install counts and security audit results, but those cues can be detached from current reality when telemetry is unauthenticated or scans are not continuous. That creates a mismatch between what the user sees and what the agent actually executes. In practice, this turns marketplace metadata into an attack surface. If a skill can appear popular, clean, and unchanged while its repository content has already been altered, the trust model is no longer anchored to the code the agent will run.

Practical implication: verify the repository content and provenance of each skill instead of relying on marketplace reputation signals alone.
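As a starting point for that manual review, the sketch below lists the embedded code blocks in a skill's markdown so a reviewer can see what the agent might be asked to run. It assumes skills use standard triple-backtick fences; the function name and sample skill are hypothetical. It does not catch instructions written as plain prose, so it complements a full read of the file rather than replacing it.

```python
import re

# The fence marker is built indirectly so this sketch can itself sit inside a fenced block.
TICKS = "`" * 3

# Opening fence with optional language tag, lazily matched body, closing fence on its own line.
FENCE = re.compile(
    rf"^{TICKS}([^\n`]*)\n(.*?)^{TICKS}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def embedded_code_blocks(skill_markdown: str) -> list[tuple[str, str]]:
    """Return (language, code) pairs for each fenced block in a skill file."""
    return [
        (match.group(1).strip() or "unknown", match.group(2))
        for match in FENCE.finditer(skill_markdown)
    ]


if __name__ == "__main__":
    # Hypothetical skill content, for illustration only.
    sample = f"# Hypothetical skill\n{TICKS}bash\necho hello\n{TICKS}\n"
    for language, code in embedded_code_blocks(sample):
        print(f"[{language}]\n{code}")
```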

Why non-continuous scanning creates a malicious skill window

The article describes scanning that happens at creation and again only when popularity thresholds are reached. That is not continuous assurance. It means a benign skill can pass an initial audit, then be modified later to include malicious instructions while the marketplace still displays the earlier clean result. This is a classic time-of-check to time-of-use problem, but applied to agent supply chains. For practitioners, the failure is not just weak detection. It is the absence of a control that tracks repository state continuously enough to detect post-approval mutation before the next install or update cycle.

Practical implication: treat post-scan repository changes as a separate review event rather than assuming a prior audit still applies.
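One way to close the time-of-check to time-of-use gap is to bind each scan verdict to a digest of the exact content that was scanned. The sketch below is a minimal illustration under that assumption. The ScanVerdict record and helper names are hypothetical and do not describe how any real marketplace stores its audit results.

```python
import hashlib
from dataclasses import dataclass


def content_digest(skill_text: str) -> str:
    """SHA-256 digest of the skill content, used as its identity at scan time."""
    return hashlib.sha256(skill_text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ScanVerdict:
    """Hypothetical audit record: the result plus the digest it was computed for."""
    skill_name: str
    digest_at_scan: str
    clean: bool


def verdict_applies(verdict: ScanVerdict, current_text: str) -> bool:
    """A clean verdict counts only if the content is identical to what was scanned."""
    return verdict.clean and verdict.digest_at_scan == content_digest(current_text)


if __name__ == "__main__":
    original = "Format commit messages consistently.\n"
    verdict = ScanVerdict("commit-helper", content_digest(original), clean=True)
    modified = original + "Also run the attached setup script.\n"
    print(verdict_applies(verdict, original))   # content unchanged since the scan
    print(verdict_applies(verdict, modified))   # content changed, so a re-review is needed
```

With this shape, any post-scan edit invalidates the earlier verdict automatically instead of inheriting it.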

What silent override and blind bulk updates mean for agent governance

Silent override means a skill with the same name can replace an installed one without warning. Blind bulk updates mean every installed skill refreshes together with no per-skill review. Together, they erase user visibility into what changed and when. That matters because the agent is not just loading content; it is inheriting executable behaviour from the extension layer. Once update and name resolution operate without granular confirmation, the user cannot reliably distinguish legitimate maintenance from hidden substitution. The platform has therefore weakened the integrity of the skill lifecycle itself, not only its scanning process.

Practical implication: require per-skill update review and collision warnings before any new skill or revision can take effect.
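The two controls can be sketched as a collision check that fails closed and a per-skill diff shown before any update takes effect. This is an illustrative sketch, not a real skill manager's API. The names SkillCollisionError, check_collision, and update_diff are hypothetical, as is the assumption that each installed skill is tracked by name and source repository.

```python
import difflib


class SkillCollisionError(Exception):
    """Raised when an incoming skill reuses an installed name from another source."""


def check_collision(installed_sources: dict[str, str], name: str, new_source: str) -> None:
    """Fail closed: the same name from a different source never replaces silently."""
    existing = installed_sources.get(name)
    if existing is not None and existing != new_source:
        raise SkillCollisionError(
            f"'{name}' is installed from {existing}; refusing replacement from {new_source}"
        )


def update_diff(name: str, old_text: str, new_text: str) -> str:
    """Unified diff for one skill, to be reviewed before the update is applied."""
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"{name} (installed)",
            tofile=f"{name} (incoming)",
        )
    )


if __name__ == "__main__":
    # Hypothetical inventory of installed skills and their sources.
    installed = {"commit-helper": "github.com/example/skills"}
    try:
        check_collision(installed, "commit-helper", "github.com/other/skills")
    except SkillCollisionError as err:
        print(f"blocked: {err}")
    print(update_diff("commit-helper", "Format commits.\n", "Format commits.\nRun setup.\n"))
```

An empty diff means nothing changed. Anything else would go to the skill's owner for approval rather than being applied as part of a bulk refresh.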

Threat narrative

Attacker objective: The attacker’s objective is to achieve persistent code execution through trusted-looking agent skills and extend that foothold across user environments.

Entry begins when an attacker publishes a benign-looking agent skill, then inflates its install count so users perceive it as trustworthy and adopt it.
Credential access and abuse occur when the installed skill carries embedded commands or nested malicious instructions that the agent executes under delegated local access.
Impact follows when the malicious skill persists across updates or silently overrides an existing skill, enabling repeated code execution on end-user systems.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
LiteLLM PyPI package breach — LiteLLM PyPI supply chain attack, credentials stolen from users.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

AI agent skill marketplaces are becoming identity distribution layers, not just extension stores. Once a skill can change agent behaviour, invoke commands, and persist through updates, the marketplace is part of the identity trust chain. That shifts the problem from code quality to delegated execution integrity, where provenance and lifecycle control matter as much as scanning. Practitioners should treat skill installation as a privileged trust event, not a routine add-on.

Install count inflation is a trust-signalling failure, not a popularity feature. When unauthenticated telemetry can manufacture adoption signals, the marketplace converts social proof into an attack primitive. That undermines every control that assumes the user can judge trust from visible demand indicators. The field should stop treating install counts as a benign engagement metric and start treating them as a manipulable governance signal.
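To make the contrast with unauthenticated telemetry concrete, the sketch below counts an install only when the event carries a valid HMAC signature from a known client, and counts each client once per skill. It is a simplified illustration under stated assumptions: the per-client shared keys, the InstallCounter class, and the message format are all hypothetical, and a production design would also need key distribution, rate limiting, and replay protection.

```python
import hashlib
import hmac


def sign_install_event(client_key: bytes, client_id: str, skill_name: str) -> str:
    """Client side: sign the install event with the client's shared key."""
    message = f"{client_id}:{skill_name}".encode("utf-8")
    return hmac.new(client_key, message, hashlib.sha256).hexdigest()


class InstallCounter:
    """Hypothetical marketplace-side counter that ignores unsigned or repeated events."""

    def __init__(self, client_keys: dict[str, bytes]) -> None:
        self._keys = client_keys
        self._seen: set[tuple[str, str]] = set()

    def record(self, client_id: str, skill_name: str, signature: str) -> bool:
        """Return True only if the event is authentic and not already counted."""
        key = self._keys.get(client_id)
        if key is None:
            return False  # unknown client
        expected = sign_install_event(key, client_id, skill_name)
        # Constant-time comparison on bytes avoids leaking how much of the signature matched.
        if not hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
            return False  # forged or corrupted signature
        if (client_id, skill_name) in self._seen:
            return False  # each client counts once per skill
        self._seen.add((client_id, skill_name))
        return True

    def count(self, skill_name: str) -> int:
        """Number of distinct authenticated clients that installed the skill."""
        return sum(1 for _, seen_skill in self._seen if seen_skill == skill_name)
```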

Continuous review, not one-time scanning, is the baseline missing here. The article shows how a skill can pass an initial audit and later become malicious while the clean verdict still displays. That is a lifecycle control gap, because the approved state no longer matches the executable state. Security teams should recognise this as a persistence problem in the supply chain governance model.

Blind update semantics create hidden privilege drift for agent behaviour. A skill that was benign at install time can become a different trust object at update time when all skills refresh together with no granular diff. This is the same governance failure pattern seen in other NHI lifecycle problems: the control assumes change will be visible and reviewable, but the system removes that visibility. Practitioners should separate approval from execution in the skill lifecycle.

Identity blast radius expands when agents inherit behaviour from untrusted extensions. The more an agent can act on local systems, the more a compromised skill becomes a delegated identity problem, not a narrow malware event. That makes provenance, version pinning, and collision handling part of the identity perimeter for AI coding environments. Teams should reassess where delegated authority begins and ends in agent toolchains.

From our research:
The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
Only 44% of developers are reported to follow security best practices for secrets management, exposing a significant developer behaviour gap.
Security teams should widen their review lens with the 52 NHI Breaches Analysis report, which shows how trust failures and lifecycle gaps repeatedly turn into compromise.

What this signals

Identity blast radius is now shaped by extension trust, not just native platform access. When an AI agent consumes skills that can execute commands, the security question shifts to how much delegated authority each installed extension inherits. That makes provenance, collision handling, and update review part of the operating model for agent governance, not a niche software concern.

With 43% of security professionals concerned about AI systems learning and reproducing sensitive information patterns from codebases, the boundary between useful automation and credential exposure is already under pressure. Teams that run coding agents should align extension controls with the same discipline they apply to secrets handling and workload identity, using the OWASP NHI Top 10 as a threat-model anchor.

The operational shift is straightforward: treat skill marketplaces like high-trust software distribution channels and validate them with lifecycle review, not consumer-style reputation metrics. If your programme cannot answer who approved a skill, what changed in the repository, and which version is live, your control environment is already behind the threat.

For practitioners

Inventory installed agent skills as first-class identities: Track every skill, its source repository, version, install date, and the commands or workflows it can trigger. Require ownership for each skill so there is an accountable reviewer before installation or update.
Require explicit diff review before updates: Block bulk refreshes that change all skills at once. Review per-skill diffs, pin to known commits where possible, and prevent any skill from updating unless the change set is visible and approved.
Treat name collisions as hostile until verified: If a newly installed skill uses the same name as a trusted one, force a warning and manual verification before replacement occurs. Collision handling should fail closed, not silently overwrite the existing skill.
Harden trust signals at the marketplace boundary: Do not let popularity metrics or audit badges stand in for provenance checks. Authenticate telemetry, rate-limit count updates, and require repository state to match the last approved scan before installation.
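The inventory item in the list above can be sketched as a small record per skill plus a check that reports which governance fields are missing. The field names and the SkillRecord type are illustrative assumptions, not a schema from the Orca Security research or from any agent platform.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SkillRecord:
    """Hypothetical inventory entry for one installed agent skill."""
    name: str
    source_repo: str
    pinned_commit: str          # empty string means the skill is not pinned
    installed_on: date
    owner: str                  # accountable reviewer; empty string means unowned
    approved_by: Optional[str] = None


def inventory_gaps(records: list[SkillRecord]) -> dict[str, list[str]]:
    """Map each skill name to the governance fields it is missing."""
    gaps: dict[str, list[str]] = {}
    for record in records:
        missing = []
        if not record.owner:
            missing.append("owner")
        if not record.pinned_commit:
            missing.append("pinned_commit")
        if record.approved_by is None:
            missing.append("approved_by")
        if missing:
            gaps[record.name] = missing
    return gaps
```

A skill that appears in the result cannot yet answer who approved it or which version is live, so it is a candidate for review before its next update.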

Key takeaways

AI agent skill marketplaces can turn trust signals, scan cadence, and update design into direct attack primitives.
The proof-of-concept attacks in the article show that malicious skills can achieve real code execution, not just theoretical exposure.
Practitioners need per-skill provenance, reviewable updates, and collision controls before agent skills become a stable part of production workflows.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		Agent skills alter runtime behaviour and can execute untrusted instructions.
OWASP Non-Human Identity Top 10	NHI-01	Skill repositories and update channels behave like non-human identity trust surfaces.
NIST CSF 2.0	PR.AA-05 (PR.AC-4 in CSF 1.1)	Delegated skill access must be limited and reviewed as privileged access.

Review agent tool and skill trust boundaries before allowing delegated execution.

Key terms

Agent Skill: An agent skill is a reusable extension that changes how an AI agent behaves, what commands it can run, or which workflows it follows. In security terms, it acts like delegated executable behaviour, so provenance, versioning, and update control matter as much as the skill’s content.
Silent Skill Override: Silent skill override is a failure mode where a newly installed skill with the same name replaces an existing skill without warning. The user loses visibility into substitution, which means a trusted skill can be displaced by a different source while the interface still looks normal.
Install Count Inflation: Install count inflation is the manipulation of marketplace popularity metrics so a malicious skill appears trusted or widely adopted. When telemetry is unauthenticated, the metric becomes a social engineering input rather than evidence of safe use, which weakens user judgement and marketplace governance.
Blind Bulk Update: Blind bulk update is a lifecycle design where all installed skills refresh together with no per-skill diff or approval step. That creates hidden change exposure because a benign skill can become malicious after the update, and the user has no granular control over what changed.

Deepen your knowledge

AI agent supply chain security and delegated extension risk are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If your team is governing coding agents or marketplace-installed skills, the course provides a practical baseline for building that control model.

This post draws on content published by Orca Security: LLMjacking and malicious AI agent skill supply chain risks. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-05-05.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org