Put a mandatory inspection and block step at the install boundary, not inside the skill itself. The control must evaluate package metadata, embedded scripts, and suspicious behaviour before execution begins. If the model can skip the check, the check is advisory, not enforcement. Treat agent-installed code as a governed identity action, with ownership and approval rules attached.
Why This Matters for Security Teams
Malicious skill installation is not just a software supply chain problem. For AI agents, it is an identity and authorisation problem because the agent can decide what to install, when to install it, and which tools to invoke next. If the install step is not enforced before execution, an attacker only needs one successful prompt, one poisoned repository, or one compromised dependency to turn the agent into an execution path. The OWASP Agentic AI Top 10 and the OWASP NHI Top 10 both point to the same operational reality: autonomous workloads need runtime controls, not trust in the model’s judgement.
The risk grows because agents can chain actions across packages, scripts, API calls, and secrets in ways that static reviews do not predict. That means malicious skills can arrive through metadata abuse, hidden bootstrap code, or social engineering aimed at the agent itself. In practice, many security teams encounter agent-driven compromise only after the first bad skill has already executed and established persistence.
How It Works in Practice
The practical control is a mandatory install boundary that evaluates the requested skill before any code, plugin, or workflow is activated. Security teams should treat the install request as a governed identity action and record who approved it, which agent requested it, what task context justified it, and whether the package is allowed to run in that environment. This maps well to the runtime, context-aware direction described in the NIST AI Risk Management Framework and the agent-focused recommendations in the CSA MAESTRO agentic AI threat modeling framework.
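As a minimal sketch of what "governed identity action" can mean in code, the install request can be modelled as a record that carries the approver, the requesting agent, the task context, and the target environment. Everything named here (`InstallRequest`, `ALLOWED_PACKAGES`, `evaluate_request`, the package and environment names) is hypothetical and chosen for illustration; it is not taken from any framework cited above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical allowlist: environment name -> packages permitted to run there.
ALLOWED_PACKAGES = {
    "prod": {"pdf-summariser"},
    "dev": {"pdf-summariser", "web-scraper"},
}


@dataclass(frozen=True)
class InstallRequest:
    agent_id: str                      # which agent instance asked
    package: str                       # what it wants to install
    environment: str                   # where the skill would run
    task_context: str                  # why the install is needed
    approved_by: Optional[str] = None  # who signed off, if anyone


def evaluate_request(req: InstallRequest) -> Tuple[bool, str]:
    """Return (allowed, reason). Anything not explicitly allowed is denied."""
    if not req.approved_by:
        return False, "no approver recorded"
    if not req.task_context.strip():
        return False, "no task context"
    if req.package not in ALLOWED_PACKAGES.get(req.environment, set()):
        return False, "package not allowed in environment"
    return True, "allowed"
```

The point of the sketch is that the decision runs outside the agent: the agent submits a request and receives a verdict, rather than deciding for itself whether the check applies.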
Effective inspection usually combines several checks:
- Package metadata review for source trust, maintainer anomalies, version drift, and namespace squatting.
- Static analysis of embedded scripts, post-install hooks, and bootstrap files before execution begins.
- Behavioural policy checks for high-risk actions such as file writes, network egress, credential access, or tool chaining.
- Just-in-time approval for sensitive installs, with short-lived permissions that expire after the task completes.
- Workload identity and policy enforcement at the agent boundary so the decision is tied to the agent instance, not just a user session.
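The checks above can be sketched as a single gate that runs before any skill code does. This is an illustration under stated assumptions, not a complete scanner: the manifest layout (`registry`, `scripts`, `capabilities`, `jit_approval`), the trusted registry name, and the three regular expressions are all hypothetical, and a real deployment would rely on far richer static and behavioural analysis.

```python
import re
from typing import Dict, List

TRUSTED_REGISTRIES = {"registry.internal.example"}  # hypothetical registry name
HIGH_RISK_CAPABILITIES = {"network_egress", "credential_access", "file_write"}
SUSPICIOUS_PATTERNS = [
    re.compile(r"curl[^\n|]*\|\s*(ba)?sh"),   # piping a download into a shell
    re.compile(r"base64\s+(-d|--decode)"),    # decoding an embedded payload
    re.compile(r"\.ssh/|\.aws/credentials"),  # touching credential stores
]


class InstallBlocked(Exception):
    """Raised when the gate refuses an install."""


def inspect_skill(manifest: Dict) -> List[str]:
    """Collect findings from metadata, script, and capability checks."""
    findings = []
    # Metadata review: where does the package come from?
    if manifest.get("registry") not in TRUSTED_REGISTRIES:
        findings.append("untrusted registry")
    # Static analysis of embedded scripts and post-install hooks.
    for name, body in manifest.get("scripts", {}).items():
        if any(p.search(body) for p in SUSPICIOUS_PATTERNS):
            findings.append(f"suspicious script: {name}")
    # Behavioural policy: high-risk capabilities need just-in-time approval.
    risky = HIGH_RISK_CAPABILITIES & set(manifest.get("capabilities", []))
    if risky and not manifest.get("jit_approval"):
        findings.append(
            "high-risk capabilities without approval: " + ", ".join(sorted(risky))
        )
    return findings


def gate_install(manifest: Dict) -> None:
    """Block the install if any check produced a finding."""
    findings = inspect_skill(manifest)
    if findings:
        raise InstallBlocked("; ".join(findings))
```

Because `gate_install` raises rather than returning a warning, the caller cannot proceed past a failed inspection by ignoring the result, which is the difference between enforcement and an advisory check.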
Current guidance suggests that the strongest pattern is to keep the policy outside the skill itself, ideally in a central policy engine that can evaluate the request in real time. That aligns with the operational lessons in Analysis of Claude Code Security and the credential abuse patterns highlighted in AI LLM hijack breach. These controls tend to break down when the agent can self-update, pull from unvetted registries, or execute arbitrary post-install scripts because the malicious payload can run before any inspection result is enforced.
Common Variations and Edge Cases
Tighter install gating often increases developer friction and can slow legitimate agent workflows, so organisations need to balance safety against task latency and operational exception handling. There is no universal standard for this yet, but the best practice is evolving toward per-task authorisation, short TTL credentials, and explicit ownership for every install decision.
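One way to picture per-task authorisation with a short TTL and a named owner is a grant object that is only valid for one agent, one task, and one package, and only until it expires. The names below (`InstallGrant`, `issue_grant`, `grant_is_valid`) and the 300-second default are assumptions made for this sketch.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstallGrant:
    agent_id: str
    task_id: str
    package: str
    owner: str         # the human or team accountable for the decision
    expires_at: float  # Unix timestamp after which the grant is void


def issue_grant(agent_id: str, task_id: str, package: str, owner: str,
                ttl_seconds: float = 300, now: Optional[float] = None) -> InstallGrant:
    """Create a grant scoped to one task that expires after ttl_seconds."""
    now = time.time() if now is None else now
    return InstallGrant(agent_id, task_id, package, owner, now + ttl_seconds)


def grant_is_valid(grant: InstallGrant, agent_id: str, task_id: str,
                   package: str, now: Optional[float] = None) -> bool:
    """A grant is valid only for the exact agent, task, and package, before expiry."""
    now = time.time() if now is None else now
    same_scope = (grant.agent_id, grant.task_id, grant.package) == (agent_id, task_id, package)
    return same_scope and now < grant.expires_at
```

The `now` parameter exists only so the expiry logic can be exercised deterministically; production code would more likely rely on a trusted clock at the policy engine.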
One common edge case is delegated autonomy. If an upstream orchestrator approves a skill once and downstream agents can reuse it indefinitely, the approval model becomes stale almost immediately. Another is third-party skill ecosystems, where package integrity is necessary but not sufficient because a signed package can still behave maliciously after installation. The State of Non-Human Identity Security research shows how often organisations struggle with visibility and control across non-human access paths, which is why agent installs should be logged as identity events, not treated as ordinary software updates.
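To make "logged as identity events" concrete, here is a small sketch that emits an install decision as a structured record keyed on the agent and approver rather than on the package alone. The event type string and field names are hypothetical and would need to match whatever schema the organisation's logging pipeline expects.

```python
import json
from datetime import datetime, timezone


def install_identity_event(agent_id: str, package: str, version: str,
                           approved_by: str, decision: str) -> str:
    """Serialise an install decision as a JSON identity event."""
    event = {
        "event_type": "agent.skill_install",  # hypothetical event name
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,        # the non-human identity that acted
        "approved_by": approved_by,  # the accountable owner
        "package": package,
        "version": version,
        "decision": decision,        # e.g. "allowed" or "blocked"
    }
    return json.dumps(event, sort_keys=True)
```

Recording blocked requests as well as allowed ones gives reviewers a trail of what the agent attempted, not only what it was permitted to do.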
For high-risk environments, security teams should prefer deny-by-default policies, sandboxed execution, and separate approval paths for skills that can read secrets, move laterally, or call external tools. Where agents can install new capabilities during active missions, the control should also include revocation on completion, because standing privilege for autonomous systems creates the same blind spot that attackers exploit in compromised NHIs.
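Deny-by-default with revocation on completion can be sketched with a context manager: capabilities exist only while the mission block is running and are removed when it exits, even if the mission fails. `CapabilityStore` and its method names are hypothetical, and the sketch deliberately ignores overlapping missions that grant the same capability to the same agent.

```python
from contextlib import contextmanager
from typing import Iterable, Iterator, Set, Tuple


class CapabilityStore:
    """Tracks which (agent, capability) pairs are currently granted."""

    def __init__(self) -> None:
        self._active: Set[Tuple[str, str]] = set()

    def is_allowed(self, agent_id: str, capability: str) -> bool:
        # Deny by default: only explicitly granted pairs pass.
        return (agent_id, capability) in self._active

    @contextmanager
    def mission(self, agent_id: str, capabilities: Iterable[str]) -> Iterator[None]:
        """Grant capabilities for the duration of one mission, then revoke."""
        granted = {(agent_id, c) for c in capabilities}
        self._active |= granted
        try:
            yield
        finally:
            # Revocation runs on normal completion and on error alike.
            self._active -= granted
```

A usage example: `with store.mission("agent-7", ["read_secrets"]): ...` allows `read_secrets` inside the block, and `store.is_allowed("agent-7", "read_secrets")` returns `False` again once the block exits.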
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | A05 | Agent skill installs are a supply-chain and execution-control risk. |
| CSA MAESTRO | T4 | MAESTRO covers agent runtime governance and tool-use restrictions. |
| NIST AI RMF | GOVERN | AI RMF governance is needed for ownership and approval of agent installs. |
Block untrusted agent extensions before execution and require inspection at the install boundary.