Who is accountable when an AI agent installs a malicious skill?

Accountability should sit with the organisation that defined the agent’s permissions, selection criteria, and install controls. If a human owner is not assigned, or if the agent can install code without enforceable policy, accountability becomes ambiguous. For regulated environments, that weakens evidence of governance and makes post-incident review harder to defend.

Why This Matters for Security Teams

When an AI agent installs a malicious skill, the security failure is rarely just the code itself. The real issue is who allowed the agent to evaluate, fetch, and execute untrusted capabilities in the first place. That makes this a governance and identity problem, not only an application security problem. Current guidance from the OWASP Agentic AI Top 10 and the NIST AI Risk Management Framework points to the same operational concern: autonomous systems can take actions faster than any human can explicitly review them.

NHIMG research on agent risk shows why accountability becomes blurry fast. In the AI Agents: The New Attack Surface report, 80% of organisations reported that AI agents had already performed actions beyond their intended scope, yet only 44% had implemented any policies to govern them. If the organisation cannot prove who approved install rights, selection criteria, and revocation paths, post-incident review becomes a narrative exercise instead of evidence-based accountability. In practice, many security teams discover this gap only after the agent has already chained a tool install into a broader compromise.

How It Works in Practice

Accountability for a malicious skill installation should be mapped to the control owner, the policy owner, and the operational owner of the agent. In other words, if the organisation defined the agent’s permissions, what it may install, and when it must ask for approval, the organisation owns the outcome. That does not mean a single person is blamed for every bad action. It means the system of controls must be traceable enough to answer who authorised the action path, who reviewed the risk, and who can revoke the capability.

Practically, strong programs separate three things:

Who can grant install authority to the agent.
What sources, packages, or skills are trusted at runtime.
What evidence is retained when the agent attempts an install.
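The third item, retained evidence, can be sketched as one structured record per install attempt. This is a minimal illustration, not a prescribed schema: the field names, the `record_attempt` helper, and the example identifiers are all assumptions made for the sketch.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class InstallAttempt:
    """One evidence record per attempted skill install (hypothetical schema)."""
    agent_id: str      # workload identity of the agent
    granted_by: str    # person or team that granted install authority
    skill_source: str  # registry or URL the skill was fetched from
    skill_sha256: str  # digest of the exact artefact the agent fetched
    decision: str      # "allow" or "deny", as returned by the policy check
    timestamp: str     # UTC, ISO 8601


def record_attempt(agent_id: str, granted_by: str, skill_source: str,
                   skill_bytes: bytes, decision: str) -> str:
    """Build the record and return it as one JSON line for an append-only log."""
    attempt = InstallAttempt(
        agent_id=agent_id,
        granted_by=granted_by,
        skill_source=skill_source,
        skill_sha256=hashlib.sha256(skill_bytes).hexdigest(),
        decision=decision,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    return json.dumps(asdict(attempt), sort_keys=True)


# Example values are placeholders, not real identities or registries.
print(record_attempt("agent-42", "platform-team",
                     "registry.example/skills/pdf-reader",
                     b"skill contents", "deny"))
```

Recording denied attempts as well as allowed ones, together with the artefact digest, is what lets a later review reconstruct what the agent tried to install and on whose authority.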

That evidence often includes workload identity, short-lived credentials, and policy decision logs. For autonomous systems, static RBAC alone is usually too blunt because the agent’s intent changes task by task. Best practice is evolving toward runtime authorisation, where the decision to install a skill is evaluated in context using policy-as-code, not just a pre-set role. For implementation patterns, security teams increasingly reference the CSA MAESTRO agentic AI threat modeling framework and the OWASP NHI Top 10 to treat skill installation as a privileged action that requires explicit controls, logging, and revocation.
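To make the contrast with static RBAC concrete, the following sketch evaluates an install request against the task the agent is currently performing rather than against its role. It is an assumption-laden illustration: the trusted source, the task names, the permission names, and the `decide` function are hypothetical and do not come from any specific policy engine.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Hypothetical policy data; in practice this would live in version-controlled policy files.
TRUSTED_SOURCES = {"registry.internal.example"}
TASK_PERMISSION_SCOPE = {
    "summarise-documents": {"read_files"},
    "triage-tickets": {"read_tickets", "write_comments"},
}


@dataclass(frozen=True)
class InstallRequest:
    agent_role: str                       # carried for logging; not used for the decision
    task: str                             # what the agent is doing right now
    source: str                           # where the skill comes from
    requested_permissions: FrozenSet[str]


def decide(request: InstallRequest) -> Tuple[bool, str]:
    """Return (allowed, reason) based on the request context, not the role."""
    if request.source not in TRUSTED_SOURCES:
        return False, "untrusted source"
    scope = TASK_PERMISSION_SCOPE.get(request.task)
    if scope is None:
        return False, "task has no install policy"
    excess = request.requested_permissions - scope
    if excess:
        return False, "permissions exceed task scope: " + ", ".join(sorted(excess))
    return True, "within task scope"


# Same role, different task context, different outcome.
in_scope = InstallRequest("support-agent", "triage-tickets",
                          "registry.internal.example", frozenset({"read_tickets"}))
out_of_scope = InstallRequest("support-agent", "summarise-documents",
                              "registry.internal.example",
                              frozenset({"read_files", "network_access"}))
print(decide(in_scope))
print(decide(out_of_scope))
```

The reason string returned with each decision is the kind of value a policy decision log would retain alongside the request.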

In well-governed environments, the agent should use ephemeral, task-scoped access, and the install action should be blocked unless the policy engine can verify the source, context, and intended effect. These controls tend to break down when agents can self-modify inside loosely governed plugin ecosystems because the install path itself becomes the attack path.
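One way to picture ephemeral, task-scoped access is a grant that names a single task and a single skill and expires after a short window. The sketch below is illustrative only: `InstallGrant`, `issue_grant`, `grant_permits`, and the five-minute default lifetime are assumptions, and a real deployment would rely on its identity provider's short-lived credentials rather than an in-process object.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class InstallGrant:
    """A short-lived grant bound to one task and one skill (hypothetical)."""
    task_id: str
    skill_name: str
    expires_at: float  # seconds since the epoch
    token: str = field(default_factory=lambda: secrets.token_urlsafe(16))


def issue_grant(task_id: str, skill_name: str, ttl_seconds: float = 300.0,
                now: Optional[float] = None) -> InstallGrant:
    """Issue a grant that expires ttl_seconds after it is created."""
    issued_at = time.time() if now is None else now
    return InstallGrant(task_id, skill_name, issued_at + ttl_seconds)


def grant_permits(grant: InstallGrant, task_id: str, skill_name: str,
                  now: Optional[float] = None) -> bool:
    """Permit the install only for the same task and skill, and only before expiry."""
    current = time.time() if now is None else now
    return (current < grant.expires_at
            and grant.task_id == task_id
            and grant.skill_name == skill_name)


grant = issue_grant("task-001", "pdf-reader", ttl_seconds=60)
print(grant_permits(grant, "task-001", "pdf-reader"))  # same task and skill, not yet expired
print(grant_permits(grant, "task-002", "pdf-reader"))  # different task
```

Because the grant cannot be reused for another task or after expiry, a compromised install path yields less standing authority than a long-lived role would.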

Common Variations and Edge Cases

Tighter install controls often increase friction, requiring organisations to balance autonomy against review overhead. That tradeoff is real, especially when business teams want agents to adapt quickly while security teams need defensible evidence. There is no universal standard for this yet, so current guidance suggests a layered model: baseline allowlists, human approval for new skill categories, and automatic revocation for anything outside policy.
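The layered model can be sketched as a three-way decision. The skill names, category names, and the `layered_decision` function are hypothetical, and one interpretive assumption is made: a skill in an already-reviewed category that is not on the allowlist is treated as outside policy.

```python
from enum import Enum


class Outcome(Enum):
    ALLOW = "allow"
    NEEDS_HUMAN_APPROVAL = "needs_human_approval"
    REVOKE = "revoke"


# Hypothetical policy data for illustration.
ALLOWLISTED_SKILLS = {"pdf-reader", "calendar-sync"}          # baseline allowlist
REVIEWED_CATEGORIES = {"document-processing", "scheduling"}   # categories a human has already assessed


def layered_decision(skill: str, category: str) -> Outcome:
    """Apply the layers in order: allowlist, then human approval, then revocation."""
    if skill in ALLOWLISTED_SKILLS:
        return Outcome.ALLOW
    if category not in REVIEWED_CATEGORIES:
        # A category nobody has assessed yet goes to a human reviewer.
        return Outcome.NEEDS_HUMAN_APPROVAL
    # Assumption: a reviewed category whose skill is not allowlisted is outside policy.
    return Outcome.REVOKE


for skill, category in [("pdf-reader", "document-processing"),
                        ("browser-driver", "web-browsing"),
                        ("pdf-reader-pro", "document-processing")]:
    print(skill, layered_decision(skill, category).value)
```

Ordering the checks this way keeps routine installs frictionless while reserving human review for genuinely new capability types, which is the tradeoff described above.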

Edge cases matter. In a delegated model, a business owner may approve the agent’s function while security owns the guardrails. In a federated model, a platform team may own the runtime and the line-of-business owner may own use-case risk. Accountability should follow the control boundary, not whichever team notices the incident first. That is why the NIST AI Risk Management Framework (AI RMF) is useful: it pushes organisations to assign governance, map risk, and keep decision records.

Where this breaks down most often is with agent marketplaces, unmanaged skill registries, and shadow AI deployments that bypass central approval. NHIMG’s The State of Secrets in AppSec underscores how fragmented control already weakens remediation and oversight, and that same pattern appears when agents inherit broad install rights without a clearly named owner.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A1	Malicious skill installs are a core agentic application abuse path.
CSA MAESTRO	GOV-1	MAESTRO covers governance and control boundaries for autonomous agent actions.
NIST AI RMF		AI RMF frames accountability, traceability, and risk ownership for AI systems.

Treat skill installation as a privileged agent action and require policy checks before execution.
