TL;DR: AI agent skill marketplaces now carry the same supply chain risk pattern as software packages, but static code checks and LLM-based review both miss malicious behavior that only appears at runtime, according to Permiso Security. Runtime detonation is becoming the deciding control because executable trust cannot be inferred from code alone.
At a glance
What this is: This is an analysis of why AI agent skills are emerging as a software supply chain risk and why static scanning misses runtime malicious behaviour.
Why it matters: It matters because IAM, NHI, and agent governance teams must decide how to validate downloaded skills before they inherit existing agent privileges and become a new execution path.
By the numbers:
- One industry audit found over 1,100 malicious skills on a single marketplace.
- 96% of technology professionals identify AI agents as a growing security threat, and 66% believe this risk is immediate.
- 80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems, inappropriately sharing sensitive data, and revealing access credentials.
👉 Read Permiso Security's analysis of runtime sandboxing for AI agent skills
Context
AI agent skills are downloadable capability packages that extend what an agent can do, but they also inherit the agent's existing identity, permissions, and network reach. That makes the skill marketplace a governance problem, not just a packaging problem, because a malicious skill can act with whatever access the agent already holds.
Current scanning methods are not built to answer the question that matters most: what does this skill do when it actually runs? Static analysis and LLM-based review can miss data exfiltration, unauthorized network calls, and credential access that only show up during execution, which is why runtime validation is now an identity and control-plane issue as much as a code review issue.
For security teams running agentic workflows, the practical question is not whether skills can be published safely, but whether installed skills can be trusted at the point they inherit permissions. That concern is atypical only in one sense: the distribution model looks new, but the underlying trust problem is the same software supply chain failure pattern identity teams have seen before, just with broader runtime privilege.
Key questions
Q: How should security teams validate AI agent skills before installation?
A: They should execute each skill in a controlled sandbox with real agent context and inspect actual behaviour, not just source code. The review should cover tool calls, file access, DNS resolution, outbound traffic, and credential access attempts before the skill is allowed to inherit production permissions. Behavioral evidence is the only reliable trust signal for downloaded skills.
Q: Why do AI agent skills create more risk than ordinary software packages?
A: AI agent skills inherit the permissions of the agent that runs them, so a malicious skill can act inside an existing trust boundary without needing to steal credentials first. That makes the effective blast radius depend on the agent's roles, tokens, and service connections, which is why skill governance must be tied to identity scope.
Q: What breaks when AI skills are judged only by static code review?
A: Static review misses behaviours that only appear at runtime, including hidden exfiltration, environment variable access, and unauthorized network calls. A skill can look benign in source and still behave maliciously once executed, so code-only review creates a false sense of security and leaves the real attack path untested.
Q: How can teams reduce the impact of a malicious AI skill?
A: They should limit the permissions an agent can lend to any skill, require sandbox detonation before production use, and keep a record of prior analyses so untrusted skills do not repeatedly reach approval decisions. The goal is to shrink inherited privilege before runtime behaviour can be abused.
How it works in practice
Why static scanning misses malicious AI agent skills
Static scanners inspect code without executing it, so they can only match known signatures or infer intent from text. That works poorly for skills that look harmless in source but trigger malicious behaviour at runtime, such as hidden exfiltration, environment variable harvesting, or unauthorized calls to external services. LLM-based evaluation adds another inference layer, but it still does not prove behaviour. If the security gate never runs the skill, it never observes the thing attackers are trying to hide: execution-time actions that only emerge under a live agent context.
Practical implication: validate any downloaded skill in an execution environment before it reaches production permissions.
What dynamic sandbox detonation changes for agent governance
Dynamic detonation executes the skill in an instrumented sandbox and records actions at both the LLM and operating-system layers. That means security teams can inspect actual tool use, file access attempts, DNS lookups, outbound traffic, and credential access rather than trusting a source-code verdict. In identity terms, this matters because the skill is not a separate principal. It is an execution path that borrows the agent's access, so runtime evidence is the only reliable way to see whether it stays within bounds.
Practical implication: treat runtime behavioural logs as the security evidence for skill approval, not source-code inspection alone.
Why encrypted traffic visibility matters in skill marketplaces
A malicious skill does not need plaintext exfiltration to be dangerous. If it can send stolen data over HTTPS, a sandbox that cannot inspect encrypted outbound traffic still misses the abuse path. SSL interception inside the sandbox closes that visibility gap by making network destinations, payloads, and exfiltration attempts observable. For agent governance, that is critical because the control objective is not merely blocking suspicious code. It is proving that a skill cannot covertly move data or credentials off the approved path once it has execution rights.
Practical implication: require network inspection inside sandboxed skill validation, including encrypted traffic analysis.
NHI Mgmt Group analysis
AI agent skills are becoming a software supply chain, not a plugin ecosystem. The governance mistake is to treat downloadable skills as harmless add-ons when they are actually executable trust extensions that inherit existing agent permissions. Once that inheritance happens, the relevant control question is no longer whether the code looks safe, but whether the execution path can be verified under runtime conditions. Practitioners should therefore manage skills as governed supply-chain artefacts, not as informal configuration files.
Static analysis is a weak proxy for behavioural trust in agentic systems. Code inspection and LLM-based review both infer intent from artifacts, while attackers design malicious skills to reveal themselves only after installation and execution. That gap is a control gap, but it is also a discipline gap: the field still overweights pre-execution review in a context where the real risk is post-install behaviour. The implication is that AI agent governance needs behavioural evidence as a first-class approval criterion.
Identity blast radius expands when a skill inherits the agent's active permissions. The same skill package can be safe in one context and dangerous in another because the effective attack surface is defined by the permissions of the agent that runs it. That means agent identity scope, service connections, and API reach all shape the risk of the skill marketplace. Practitioners should map skill approval to inherited privilege, not just package reputation.
Runtime detonation is the named concept this market needs: the only trust test that observes what a skill actually does. Source review asks whether a skill appears malicious. Runtime detonation asks whether it behaves maliciously when executed with real permissions, real tools, and real network paths. That distinction matters because the trust problem is behavioural, not semantic. Security teams should regard runtime detonation as the control that turns AI skill assessment from guesswork into evidence.
Agent governance now depends on proving that execution can be separated from installation. A skill marketplace becomes materially safer only when teams can inspect behaviour before the skill reaches production access, and when the approval path is detached from the agent's default permissions. This is the same lifecycle lesson identity teams learned with service accounts and secrets, but now applied to agentic software. Practitioners should reframe skills as identities-in-motion, not static artefacts.
From our research:
- 80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems, inappropriately sharing sensitive data, and revealing access credentials, according to AI Agents: The New Attack Surface report.
- 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
- Forward pivot: The governance gap is not just visibility, it is scope control, and OWASP Agentic AI Top 10 maps the control areas teams need to harden next.
What this signals
Runtime evidence will become the approval standard for agentic workflows. Teams that continue to rely on code inspection alone will keep approving skills that only become dangerous once they execute with live permissions. The next governance maturity step is to treat runtime behavioural records as the primary trust artefact, with OWASP Agentic AI Top 10 as a useful reference point for the control landscape.
Identity blast radius is now a procurement question, not just an operations question. If an agent can lend its roles, tokens, and service connections to any downloaded skill, then privilege scope determines how far a malicious package can reach. Security, IAM, and platform teams need to decide whether agent permissions are narrow enough to survive third-party execution paths.
Skill registries should evolve into security control points. A searchable inventory of analysed skills gives teams a repeatable way to avoid re-approving known artefacts and to compare behaviour over time. That turns AI skill adoption from an ad hoc trust decision into a governed intake process tied to execution evidence.
For practitioners
- Detonate skills before allowing production use Run every downloadable skill in an instrumented sandbox with a live agent context, and require a behavioural verdict before any production installation or registry approval.
- Separate skill approval from inherited agent privilege Review which IAM roles, API tokens, and service connections an agent already holds, then block skills from reaching those permissions until their behaviour has been verified.
- Log runtime evidence at the LLM and OS layers Capture tool calls, file access attempts, DNS lookups, outbound requests, and credential access as part of the approval record so reviewers can validate what happened, not what was predicted.
- Inspect encrypted exfiltration paths inside the sandbox Make sure sandbox validation can decrypt outbound HTTPS traffic and surface destinations that a malicious skill may use to move data or credentials off the host.
- Maintain a searchable skill intelligence register Keep a record of previously analysed skills, verdicts, and observed behaviours so repeated submissions do not get treated as fresh trust decisions.
Key takeaways
- AI agent skills are an identity problem as much as a software supply chain problem, because they inherit the permissions already attached to the agent.
- Static code review and LLM-based scanning both miss runtime malicious behaviour, which is where credential theft, exfiltration, and unauthorized calls usually surface.
- Sandbox detonation, encrypted traffic inspection, and behavioural logging are the controls that turn skill approval into evidence-based governance.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | NHI-01 | Agent skill marketplaces create tool-use and privilege abuse risk. |
| OWASP Non-Human Identity Top 10 | NHI-04 | Skills inherit agent credentials and must be governed as non-human identity extensions. |
| NIST CSF 2.0 | PR.AC-4 | Access management must constrain what skills can inherit and use. |
Inventory skills as NHI-adjacent execution paths and validate runtime behaviour before granting access.
Key terms
- AI Agent Skill: A downloadable capability package that extends what an agent can do by adding tools, actions, or workflows. In practice, the skill becomes part of the agent's execution path and inherits the agent's permissions, so its risk is determined by both code behaviour and the identity context it runs in.
- Dynamic Sandbox Detonation: A validation method that runs software inside a controlled environment and records what it actually does. For AI agent skills, it is the behavioural test that observes tool use, network calls, file activity, and credential access rather than inferring safety from source code alone.
- Identity Blast Radius: The total scope of damage an identity can cause if its permissions are abused. For agentic systems, blast radius includes every role, token, API connection, and service integration the agent can lend to a skill, which makes entitlement scope a primary security variable.
- Runtime Behavioural Evidence: Observed security evidence collected while software is executing, such as process activity, network traffic, and file access. In agent governance, this evidence is more trustworthy than static inspection because it proves how a skill behaved under real permissions and real execution conditions.
Deepen your knowledge
NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or NHI governance in your organisation, it is worth exploring.
This post draws on content published by Permiso Security: Introducing SandyClaw, the first dynamic sandbox for AI agent skills and prompts. Read the original.
Published by the NHIMG editorial team on 2026-04-02.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org