Who is accountable when an autonomous agent takes an unsafe action?

Why This Matters for Security Teams

When an autonomous agent takes an unsafe action, the real issue is not just “who clicked approve” but who allowed a goal-driven system to act with insufficient constraints. That is why incident review must focus on ownership, policy design, and runtime controls, not only on the action itself. The risk is amplified by the way agents chain tools and move beyond intended scope, a pattern highlighted in the AI Agents: The New Attack Surface report and the OWASP Agentic AI Top 10.

Security teams are often left to work out accountability after an incident, but it has to be defined before deployment: who owns the agent, who approves its policy, who operates the downstream system, and who can revoke access when behaviour changes. This is where guidance from the NIST AI Risk Management Framework becomes practical, because governance is only meaningful when it maps to real operational controls. In practice, many teams discover gaps only after an agent has already accessed the wrong system or exposed data, rather than through intentional control testing.

How It Works in Practice

The safest operating model is to treat the agent as a privileged workload with a defined mission, then constrain it with explicit policy and short-lived access. Static RBAC alone is usually too blunt for autonomous behaviour, because the agent does not follow a fixed human workflow. Current practice is moving toward intent-based authorisation, where a policy engine decides at request time whether the agent’s stated task, data context, and destination system are acceptable. That aligns with the threat modelling direction in the CSA MAESTRO agentic AI threat modeling framework.

Use workload identity for the agent, not shared human credentials.

Issue JIT credentials or ephemeral secrets per task, then revoke them automatically on completion.

Evaluate policy at runtime with full context, using policy-as-code rather than fixed role assumptions.

Log every tool call, secret access, and downstream write action for audit and rollback.

Separate policy approval from operational ownership so no single team can self-authorise risky behaviour.
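The runtime evaluation and logging steps above can be sketched as a small request-time check. This is a minimal illustration, not a reference implementation: the `AgentRequest` fields, the `POLICY` table, and the task and system names are all hypothetical, and a real deployment would more likely delegate the decision to a dedicated policy engine and write to tamper-resistant audit storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AgentRequest:
    """Context the agent presents at request time (all fields are illustrative)."""
    agent_id: str      # workload identity, not a shared human account
    stated_task: str   # what the agent says it is trying to do
    data_class: str    # e.g. "public", "internal", "regulated"
    destination: str   # downstream system being called
    action: str        # "read" or "write"


# Hypothetical policy-as-code table: (task, destination, action) mapped to
# the data classes that combination may touch. Anything absent is denied.
POLICY = {
    ("summarise-tickets", "ticketing", "read"): {"public", "internal"},
    ("close-stale-tickets", "ticketing", "write"): {"internal"},
}

# In-memory stand-in for an audit trail.
AUDIT_LOG = []


def authorise(req: AgentRequest) -> bool:
    """Decide at request time and record the decision, whether allow or deny."""
    allowed_classes = POLICY.get((req.stated_task, req.destination, req.action))
    allowed = allowed_classes is not None and req.data_class in allowed_classes
    AUDIT_LOG.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "agent_id": req.agent_id,
        "task": req.stated_task,
        "destination": req.destination,
        "action": req.action,
        "data_class": req.data_class,
        "decision": "allow" if allowed else "deny",
    })
    return allowed
```

The sketch denies by default: a request is allowed only when the stated task, destination, and action match an explicit entry and the data class falls within it. Denials are logged as well as approvals, so a later review can see what the agent attempted and not only what it was permitted to do.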

This is especially important because NHI risk is already widespread: the Ultimate Guide to NHIs — 2025 Outlook and Predictions notes that 97% of NHIs carry excessive privileges, which makes unsafe autonomous behaviour far harder to contain once it begins. In practice, these controls tend to break down when an agent is given broad tool access across legacy systems, because the runtime policy layer cannot compensate for poorly segmented downstream permissions.
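One way to reduce standing privilege is to issue credentials per task with a short lifetime and a single scope. The sketch below assumes a hypothetical in-process `CredentialBroker`; the class, its method names, and the scope strings are invented for illustration, and in practice this role would usually be filled by a secrets manager or identity provider.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass
class TaskCredential:
    token: str
    agent_id: str
    scope: str          # single scope per credential, e.g. "ticketing:read"
    expires_at: float   # value of the broker's clock at which the token lapses
    revoked: bool = False


class CredentialBroker:
    """Hypothetical broker that issues short-lived, task-scoped tokens."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be tested
        self._issued = {}

    def issue(self, agent_id, scope, ttl_seconds=300.0):
        cred = TaskCredential(
            token=secrets.token_urlsafe(32),
            agent_id=agent_id,
            scope=scope,
            expires_at=self._clock() + ttl_seconds,
        )
        self._issued[cred.token] = cred
        return cred

    def is_valid(self, token, scope):
        """A token is valid only if known, unrevoked, in scope, and unexpired."""
        cred = self._issued.get(token)
        return (
            cred is not None
            and not cred.revoked
            and cred.scope == scope
            and self._clock() < cred.expires_at
        )

    def revoke(self, token):
        """Revoke on task completion or when behaviour changes."""
        cred = self._issued.get(token)
        if cred is not None:
            cred.revoked = True
```

Because each token carries one scope and expires on its own, a misbehaving agent loses access even if nobody revokes it in time. This does not fix poorly segmented downstream permissions; it only limits how long and how broadly a single credential can be used.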

Common Variations and Edge Cases

Tighter control often increases integration overhead, so organisations have to balance speed of agent deployment against the cost of policy engineering and audit coverage. That tradeoff matters most in environments where agents interact with multiple teams, vendor APIs, or production data. Best practice is evolving, and there is no universal standard yet for how much autonomy should be delegated before human approval is mandatory.

For low-risk tasks, some organisations allow pre-approved action classes with automatic execution, while high-risk actions still require human sign-off. For regulated data, the safer pattern is to use short-lived credentials and strong observability, then pair that with the OWASP Top 10 for Agentic Applications 2026 and the NIST AI Risk Management Framework for governance structure. The key exception is when the downstream system itself is the weak point: even a well-governed agent can still trigger harm if the target system lacks segregation, approval gates, or meaningful rollback. That is why accountability must extend beyond the model team to the system owner and the approver of the policy. Current guidance suggests the safest interpretation is shared accountability with clear decision rights, not vague collective ownership.
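The split between pre-approved and sign-off-required actions can be sketched as a simple gate. The action class names and the `gate` function are hypothetical, and which actions belong in which set is an organisational decision that this example does not attempt to settle.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    EXECUTE = "execute"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


# Hypothetical action classes; real lists would come from the policy approver.
PRE_APPROVED = {"read_public_docs", "open_ticket"}
HIGH_RISK = {"delete_records", "export_regulated_data"}


def gate(action_class: str, approved_by: Optional[str] = None) -> Decision:
    """Route an action by class: auto-execute, hold for sign-off, or deny."""
    if action_class in PRE_APPROVED:
        return Decision.EXECUTE
    if action_class in HIGH_RISK:
        # High-risk actions run only with a named human approver on record.
        return Decision.EXECUTE if approved_by else Decision.NEEDS_APPROVAL
    # Unclassified actions are denied until someone assigns them a class.
    return Decision.DENY
```

Recording a named approver, instead of a bare boolean, keeps the decision right attached to a person, which supports the shared-accountability model described above.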

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Relevance
OWASP Agentic AI Top 10	Defines agentic risks like unsafe tool use and scope overreach.
CSA MAESTRO	Covers threat modeling for autonomous agents and tool-mediated actions.
NIST AI RMF	Provides governance structure for accountability and risk management.

Map each agent action class to risk controls and require runtime checks before execution.
