How should security teams stop AI agents from installing malicious skills?

Put a mandatory inspection and block step at the install boundary, not inside the skill itself. The control must evaluate package metadata, embedded scripts, and suspicious behaviour before execution begins. If the model can skip the check, the check is advisory, not enforcement. Treat agent-installed code as a governed identity action, with ownership and approval rules attached.

Why This Matters for Security Teams

Malicious skill installation is not just a software supply chain problem. For AI agents, it is an identity and authorisation problem because the agent can decide what to install, when to install it, and which tools to invoke next. If the install step is not enforced before execution, an attacker only needs one successful prompt, one poisoned repository, or one compromised dependency to turn the agent into an execution path. The OWASP Agentic AI Top 10 and the OWASP NHI Top 10 both point to the same operational reality: autonomous workloads need runtime controls, not trust in the model’s judgement.

The risk grows because agents can chain actions across packages, scripts, API calls, and secrets in ways that static reviews do not predict. That means malicious skills can arrive through metadata abuse, hidden bootstrap code, or social engineering aimed at the agent itself. In practice, many security teams encounter agent-driven compromise only after the first bad skill has already executed and established persistence.

How It Works in Practice

The practical control is a mandatory install boundary that evaluates the requested skill before any code, plugin, or workflow is activated. Security teams should treat the install request as a governed identity action: who approved it, which agent requested it, what task context justified it, and whether the package is allowed to run in that environment. This maps well to the runtime, context-aware direction described in the NIST AI Risk Management Framework and the agent-focused recommendations in the CSA MAESTRO agentic AI threat modeling framework.

Effective inspection usually combines several checks:

Package metadata review for source trust, maintainer anomalies, version drift, and namespace squatting.
Static analysis of embedded scripts, post-install hooks, and bootstrap files before execution begins.
Behavioural policy checks for high-risk actions such as file writes, network egress, credential access, or tool chaining.
Just-in-time approval for sensitive installs, with short-lived permissions that expire after the task completes.
Workload identity and policy enforcement at the agent boundary so the decision is tied to the agent instance, not just a user session.
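The checks above can be sketched as a single gate that runs before any skill code executes. This is a minimal illustration, not a reference implementation: the names InstallRequest, TRUSTED_SOURCES, RISKY_PATTERNS, and SENSITIVE_CAPABILITIES are hypothetical, and the substring scan stands in for whatever static analysis tooling an organisation actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical policy data, for illustration only.
TRUSTED_SOURCES = {"registry.internal.example"}
RISKY_PATTERNS = ("curl ", "wget ", "base64 -d", "eval(", "os.system")
SENSITIVE_CAPABILITIES = {"credential_access", "network_egress"}


@dataclass
class InstallRequest:
    agent_id: str      # workload identity of the requesting agent instance
    task_id: str       # task context that justifies the install
    package: str
    source: str        # registry host the package would be pulled from
    scripts: Dict[str, str] = field(default_factory=dict)  # filename -> text
    capabilities: Set[str] = field(default_factory=set)    # declared behaviour
    approved_by: Optional[str] = None                       # approver, if any


def evaluate_install(req: InstallRequest) -> Tuple[str, List[str]]:
    """Return ("allow" or "deny", reasons). Intended to run before execution."""
    reasons: List[str] = []

    # Metadata review: only allow-listed sources pass.
    if req.source not in TRUSTED_SOURCES:
        reasons.append(f"untrusted source: {req.source}")

    # Static check of embedded scripts and post-install hooks.
    for name, text in req.scripts.items():
        for pattern in RISKY_PATTERNS:
            if pattern in text:
                reasons.append(f"risky pattern {pattern!r} in {name}")

    # Behavioural policy: sensitive capabilities need an explicit approver.
    sensitive = req.capabilities & SENSITIVE_CAPABILITIES
    if sensitive and req.approved_by is None:
        reasons.append(f"unapproved sensitive capabilities: {sorted(sensitive)}")

    return ("deny" if reasons else "allow", reasons)
```

The gate collects every reason instead of stopping at the first failure, so the denial record shows the full picture to whoever reviews it. It only enforces anything if the installer refuses to proceed without an "allow" result; called optionally by the agent, it is advisory.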

Current guidance suggests that the strongest pattern is to keep the policy outside the skill itself, ideally in a central policy engine that can evaluate the request in real time. That aligns with the operational lessons in Analysis of Claude Code Security and the credential abuse patterns highlighted in AI LLM hijack breach. These controls tend to break down when the agent can self-update, pull from unvetted registries, or execute arbitrary post-install scripts, because in each case the malicious payload can run before any inspection result is enforced.

Common Variations and Edge Cases

Tighter install gating often increases developer friction and can slow legitimate agent workflows, so organisations need to balance safety against task latency and operational exception handling. There is no universal standard for this yet, but the best practice is evolving toward per-task authorisation, short TTL credentials, and explicit ownership for every install decision.

One common edge case is delegated autonomy. If an upstream orchestrator approves a skill once and downstream agents can reuse it indefinitely, the approval model becomes stale almost immediately. Another is third-party skill ecosystems, where package integrity is necessary but not sufficient because a signed package can still behave maliciously after installation. The State of Non-Human Identity Security research shows how often organisations struggle with visibility and control across non-human access paths, which is why agent installs should be logged as identity events, not treated as ordinary software updates.
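One way to keep delegated approvals from going stale is to bind each approval to a specific skill, agent, and task, and give it an expiry. The sketch below assumes a hypothetical Approval record and event schema (the field names and the "agent.skill_install" event type are illustrative, not drawn from any standard) and shows the install decision being serialised as an identity event.

```python
import json
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Approval:
    # Hypothetical record: one approval covers one skill, agent, and task.
    skill: str
    agent_id: str
    task_id: str
    approver: str
    expires_at: float  # Unix timestamp


def approval_is_valid(approval, skill, agent_id, task_id, now=None):
    """Valid only for the same skill, agent, and task, and only before expiry."""
    now = time.time() if now is None else now
    return (
        approval.skill == skill
        and approval.agent_id == agent_id
        and approval.task_id == task_id
        and now < approval.expires_at
    )


def install_event(approval, decision, now=None):
    """Serialise the install decision as an identity event for the audit log."""
    now = time.time() if now is None else now
    return json.dumps(
        {
            "event_type": "agent.skill_install",  # illustrative event name
            "timestamp": now,
            "agent_id": approval.agent_id,
            "task_id": approval.task_id,
            "skill": approval.skill,
            "approver": approval.approver,
            "decision": decision,
        },
        sort_keys=True,
    )
```

Under this scheme a downstream agent that tries to reuse an orchestrator's approval fails the agent_id comparison, and it has to request its own approval.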

For high-risk environments, security teams should prefer deny-by-default policies, sandboxed execution, and separate approval paths for skills that can read secrets, move laterally, or call external tools. Where agents can install new capabilities during active missions, the control should also include revocation on completion, because standing privilege for autonomous systems creates the same blind spot that attackers exploit in compromised NHIs.
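Deny-by-default with revocation on completion can be illustrated with a small in-memory store. The CapabilityStore class and the capability names are assumptions made for this sketch; a real deployment would back this with the organisation's policy engine or secrets manager.

```python
from contextlib import contextmanager


class CapabilityStore:
    """Hypothetical in-memory grant store: nothing is allowed unless granted."""

    def __init__(self):
        self._grants = set()  # set of (agent_id, capability) pairs

    def is_allowed(self, agent_id, capability):
        # Deny by default: only explicit grants return True.
        return (agent_id, capability) in self._grants

    @contextmanager
    def granted_for_task(self, agent_id, capabilities):
        """Grant capabilities for one task and revoke them when it ends."""
        requested = {(agent_id, c) for c in capabilities}
        # Track only what this task added, so pre-existing grants survive.
        added = requested - self._grants
        self._grants |= added
        try:
            yield
        finally:
            # Runs on success and on failure, so no standing privilege remains.
            self._grants -= added


store = CapabilityStore()
with store.granted_for_task("agent-1", ["read_secrets"]):
    inside = store.is_allowed("agent-1", "read_secrets")
after = store.is_allowed("agent-1", "read_secrets")
```

Revoking in a finally clause means the grant is removed even when the task raises, which is the case most likely to leave privileges behind.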

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A05	Agent skill installs are a supply-chain and execution-control risk.
CSA MAESTRO	T4	MAESTRO covers agent runtime governance and tool-use restrictions.
NIST AI RMF	GOVERN	AI RMF governance is needed for ownership and approval of agent installs.

Block untrusted agent extensions before execution and require inspection at the install boundary.

