Who is accountable when AI systems are used in a cyber attack chain?

Why This Matters for Security Teams

When AI is inserted into an attack chain, the accountability question is not academic. The real issue is which organisation controlled the identity, secrets, approvals, and tool access that let the system act. That makes this a governance and access problem first, and a model-risk problem second. Once an AI system can call tools, read data, or move laterally, saying “the model did it” no longer answers who was responsible.

Current guidance from NHI practitioners is that accountability follows the operator of the delegated access, not the behaviour of the model alone. That is why the 52 NHI Breaches Analysis and the Ultimate Guide to NHIs — Key Challenges and Risks both emphasise identity sprawl, secret exposure, and weak ownership as recurring failure modes. External reporting on AI-enabled intrusion also shows how quickly adversaries exploit exposed credentials, including the Anthropic report on the first AI-orchestrated cyber espionage campaign, which demonstrates how autonomous tooling changes attacker speed and scale.

In practice, many security teams discover ownership gaps only after an AI-connected identity has already been used to touch production systems or exfiltrate data.

How It Works in Practice

Accountability becomes traceable only when the AI system is treated as a workload with explicit identity, not as a vague extension of a user. The operator must define who approved the connection, which secrets or tokens were issued, what actions the agent may take, and what logging proves the boundary was respected. That is the operating model reflected in NHIMG’s Top 10 NHI Issues and in the OWASP NHI Top 10, where credential ownership, privilege boundaries, and revocation are central.
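One way to make that definition concrete is to keep an ownership record per agent and check it for gaps before access is granted. The sketch below is illustrative only: the `AgentIdentity` class, its field names, and the `ownership_gaps` helper are hypothetical and are not taken from NHIMG or OWASP material.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AgentIdentity:
    """Hypothetical ownership record for one AI agent treated as a workload."""
    agent_id: str
    business_owner: str         # accountable for why the agent exists
    technical_owner: str        # accountable for how it is configured
    approved_by: str            # who approved the connection
    allowed_actions: frozenset  # explicit action boundary
    issued_token_ids: tuple = ()  # references to issued secrets, never the secrets
    registered_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def may(self, action: str) -> bool:
        """True only if the action is inside the declared boundary."""
        return action in self.allowed_actions


def ownership_gaps(identity: AgentIdentity) -> list:
    """Return the names of accountability fields that were left empty."""
    gaps = []
    for name in ("business_owner", "technical_owner", "approved_by"):
        if not getattr(identity, name).strip():
            gaps.append(name)
    if not identity.allowed_actions:
        gaps.append("allowed_actions")
    return gaps
```

A record with a blank `technical_owner`, for example, would be reported by `ownership_gaps` before any credential is issued, which is the point at which an ownership gap is cheapest to fix.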

For cyber attack chains involving agents, the practical controls usually include:

Assigning a named business owner and a technical owner for every agent, API key, and service account.

Issuing short-lived credentials and revoking them automatically when the task completes.

Using approval gates for high-risk actions such as credential export, mailbox access, or infrastructure changes.

Logging tool calls, prompts, outputs, and downstream actions so the chain of custody is reconstructable.

Offboarding agent identities the same way human joiner-mover-leaver events are handled, with no orphaned access.

That model aligns with current threat reporting from the CISA cyber threat advisories, which consistently show that compromised access paths, not just malware, drive material impact. It also fits the logic of the MITRE ATLAS adversarial AI threat matrix, where the attack surface includes the systems that enable AI action, not only the model itself. These controls tend to break down when agents inherit broad human privileges, because the blast radius then exceeds what any approval workflow can realistically supervise.
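Several of the controls listed above (short-lived credentials, approval gates for high-risk actions, and logging of every tool call) can be sketched together. This is a minimal in-memory illustration under stated assumptions, not a production design: `AgentGateway`, `Credential`, and the action names in `HIGH_RISK_ACTIONS` are hypothetical, and a real deployment would rely on an identity provider and durable log storage.

```python
import secrets
import time
from dataclasses import dataclass, field

# Assumed set of actions that need a named human approver.
HIGH_RISK_ACTIONS = {"credential_export", "mailbox_access", "infrastructure_change"}


@dataclass
class Credential:
    agent_id: str
    owner: str
    scopes: frozenset
    expires_at: float
    token: str = field(default_factory=lambda: secrets.token_urlsafe(16))
    revoked: bool = False

    def is_valid(self, now: float) -> bool:
        return not self.revoked and now < self.expires_at


class AgentGateway:
    """Hypothetical choke point that every agent tool call passes through."""

    def __init__(self):
        self.audit_log = []

    def issue(self, agent_id, owner, scopes, ttl_seconds, now=None):
        now = time.time() if now is None else now
        cred = Credential(agent_id, owner, frozenset(scopes), now + ttl_seconds)
        self._record(now, agent_id, "issue", "ok", owner)
        return cred

    def call_tool(self, cred, action, approved_by=None, now=None):
        now = time.time() if now is None else now
        if not cred.is_valid(now):
            outcome = "denied:expired_or_revoked"
        elif action not in cred.scopes:
            outcome = "denied:out_of_scope"
        elif action in HIGH_RISK_ACTIONS and approved_by is None:
            outcome = "denied:approval_required"
        else:
            outcome = "allowed"
        # Denials are logged too, so the chain of custody has no silent gaps.
        self._record(now, cred.agent_id, action, outcome, approved_by)
        return outcome == "allowed"

    def revoke(self, cred, now=None):
        now = time.time() if now is None else now
        cred.revoked = True
        self._record(now, cred.agent_id, "revoke", "ok", cred.owner)

    def _record(self, ts, agent_id, action, outcome, actor):
        self.audit_log.append(
            {"ts": ts, "agent_id": agent_id, "action": action,
             "outcome": outcome, "actor": actor}
        )
```

In this sketch a call to `call_tool(cred, "mailbox_access")` is refused until `approved_by` names a person, and calling `revoke(cred)` when the task completes leaves no usable token behind. The log stores who approved what, but never the token itself.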

Common Variations and Edge Cases

Tighter control over AI access often increases operational friction, so organisations have to balance speed against evidentiary clarity. That tradeoff is unavoidable in environments where agents are used for security operations, incident response, or developer automation.

There is no universal standard for accountability in multi-tenant agent stacks yet, but current guidance suggests three common edge cases matter most. First, if a third-party platform hosts the agent, the platform operator and the customer may both retain partial responsibility depending on who controlled the secrets and policy. Second, if an attacker steals a token and repurposes it, accountability still tracks back to the organisation that failed to scope, rotate, or revoke the credential. Third, when a human approves an action and the agent executes a broader chain than intended, the issue is usually inadequate guardrails, not model intent.

The DeepSeek breach illustrates why exposed secrets and large-scale credential leakage turn accountability into a systemic issue, not a single-user mistake. Security leaders should therefore document control ownership, preserve logs, and define escalation paths before an AI system is allowed to touch sensitive infrastructure. When agent behaviour is distributed across multiple vendors and nested tools, attribution becomes harder because no single actor sees the full chain.
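Preserving logs is easier to defend later if the log is tamper-evident. The sketch below shows one common technique, a SHA-256 hash chain over log entries; the function names and the entry fields are assumptions made for illustration. It does not solve the cross-vendor visibility problem, but it lets each party show that its own segment of the record was not altered after the fact.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, entry):
    # Canonical JSON so the same entry always hashes to the same value.
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain, entry):
    """Append an entry, linking it to the hash of the previous record."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"entry": entry, "prev": prev_hash, "hash": _digest(prev_hash, entry)}
    )


def verify_chain(chain):
    """Return True only if no record was edited, removed, or reordered."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["entry"]):
            return False
        prev_hash = record["hash"]
    return True
```

Editing any earlier entry changes its hash and breaks every later link, so `verify_chain` fails. Truncation from the end is not detected by this sketch alone; periodically storing the latest hash with a separate party is one way to cover that case.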

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A01	Covers agent abuse, unsafe autonomy, and weak action boundaries in cyber attack chains.
CSA MAESTRO	GOV	Governance and ownership are central when AI systems operate with delegated access.
NIST AI RMF		AI RMF focuses on accountability and managed risk for autonomous AI use.

Document responsibility, oversight, and incident escalation for every AI-enabled access path.
