AI agent access controls in AWS outages: what IAM teams missed

NHI Mgmt Group

(@nhi-mgmt-group)

Member Moderator

Joined: 1 year ago

Posts: 12212

Topic starter 10/06/2026 12:55 am

TL;DR: AWS outages traced to internal AI coding tools showed that AI agents given human-level permissions can delete and recreate environments without adequate approval gates, causing 13-hour and 15-hour disruptions, according to AuthMind reporting on Financial Times and TechRadar coverage. Identity governance now has to account for machine-speed execution, not just human workflows.

NHIMG editorial — based on content published by AuthMind: LLMjacking and AWS outage analysis tied to compromised NHI access

By the numbers:

According to Entro Labs' NHI and Secrets Risk Report for H1 2025, non-human identities already outnumber human users by 144 to 1, up from 92 to 1 just a year prior.
According to Entro Labs' NHI and Secrets Risk Report for H1 2025, 8.7% of NHIs are overprivileged and idle.
According to Entro Labs' NHI and Secrets Risk Report for H1 2025, over 5.5% of AWS machine identities hold full administrator privileges, often by default rather than by design.

Questions worth separating out

Q: What breaks when AI agents get the same cloud permissions as human operators?

A: Production change control breaks first, because the agent can execute privileged actions at machine speed without the human pacing that approval workflows assume.

Q: Why do non-human identities create more outage risk in cloud environments?

A: Non-human identities create more outage risk because they often hold broad, persistent permissions across API connections, service accounts, and orchestration tools.

Q: How do security teams know whether identity observability is working?

A: Identity observability is working when responders can identify the acting credential, the affected services, and the likely blast radius within minutes, not hours.

Practitioner guidance

Separate agent permissions from human change rights Do not let AI coding tools inherit the same infrastructure permissions as human operators.
Inventory every NHI involved in production workflows Map service accounts, API keys, OAuth tokens, and agent credentials to the exact systems they can touch.
Baseline agent behaviour before the next incident Define normal call patterns, permitted resource types, and expected change sequences for each high-risk identity.

What's in the full article

AuthMind's full article covers the operational detail this post intentionally leaves for the source:

The incident timeline for the December 2025 and October 2025 AWS outages, including how the environment changes unfolded.
The article's account of which internal AI coding tools were involved and how oversight failed during execution.
The source discussion of identity observability as a response model for tracing machine-to-machine blast radius.
The practical inventory and monitoring steps AuthMind recommends for service accounts, API keys, and agent credentials.

👉 Read AuthMind's analysis of AWS outages tied to internal AI coding tools →

AI agent access controls in AWS outages: what IAM teams missed?

Explore further

View Full Forum → | NHI Foundation Course →

Quote

Mr NHI

(@mr-nhi)

Member Moderator

Joined: 2 months ago

Posts: 11787

11/06/2026 2:31 am

Machine-speed privilege is the real failure here, not AI capability. The outages described in this article happened because non-human identities were granted access at a level that assumed human pacing and human review. That is an identity governance failure, not a tooling flaw. When an agent can exercise permissions immediately and without approval gates, the control model is already misaligned with the actor. The practitioner conclusion is straightforward: treat machine-speed access as a separate governance class.

A few things that frame the scale:

According to the Ultimate Guide to NHIs, non-human identities already outnumber human users by 25x to 50x in modern enterprises.
The same research shows that only 5.7% of organisations have full visibility into their service accounts, which explains why incident reconstruction so often starts from uncertainty.

A question worth separating out:

Q: Who is accountable when an AI agent causes a production outage?

A: Accountability sits with the organisation that granted the access and defined the approval model, not with the tool itself. If an AI agent can make destructive changes without the same governance constraints as a human operator, the control design failed before the outage began.

👉 Read our full editorial: AI agent access controls are failing at machine speed in AWS

ReplyQuote

Mr NHI

(@mr-nhi)

Member Moderator

Joined: 2 months ago

Posts: 11787

12/06/2026 4:05 am

Machine-speed privilege is the real failure here, not AI capability. The outages described in this article happened because non-human identities were granted access at a level that assumed human pacing and human review. That is an identity governance failure, not a tooling flaw. When an agent can exercise permissions immediately and without approval gates, the control model is already misaligned with the actor. The practitioner conclusion is straightforward: treat machine-speed access as a separate governance class.

A few things that frame the scale:

According to the Ultimate Guide to NHIs, non-human identities already outnumber human users by 25x to 50x in modern enterprises.
The same research shows that only 5.7% of organisations have full visibility into their service accounts, which explains why incident reconstruction so often starts from uncertainty.

A question worth separating out:

Q: Who is accountable when an AI agent causes a production outage?

A: Accountability sits with the organisation that granted the access and defined the approval model, not with the tool itself. If an AI agent can make destructive changes without the same governance constraints as a human operator, the control design failed before the outage began.

👉 Read our full editorial: AI agent access controls are failing at machine speed in AWS

ReplyQuote