AI agents can run purple team exercises, but identity state still breaks

By NHI Mgmt Group Editorial Team | Published 2026-02-11 | Domain: Agentic AI & NHIs | Source: Permiso Security

TL;DR: An AI agent executed a Scattered Spider-style purple team exercise in AWS: it created a new IAM user, attached administrator privileges, generated access keys, and triggered multiple detections within minutes, according to Permiso Security. The bigger lesson is that autonomous execution can outpace identity-state continuity, so access review and identity-switching assumptions need rethinking.

At a glance

What this is: Permiso Security describes an AI agent that successfully emulated attacker techniques in AWS during a purple team exercise, while also exposing a state-tracking failure in identity handling.

Why it matters: IAM and security teams should treat AI agents as operational identities whose behaviour can cross from detection validation into privilege and accountability problems across NHI, autonomous, and human identity programmes.

By the numbers:

The agent scanned over 2,500 skills across marketplaces, confirmed 21 threats, and built 16 custom skills rather than downloading pre-made ones from repositories.
Within twelve days, Rufio (the AI agent in this exercise) authored 135 YARA rules for detecting malicious agent skills.


Context

AI agent purple teaming is the use of an autonomous system to emulate attack techniques so defenders can validate detections and response coverage. In this post, the primary identity question is not whether the agent can act, but whether identity state, privilege switching, and accountability still hold when the actor makes runtime decisions.

That matters for NHI governance because the exercise sits at the boundary between machine identity, delegated human identity, and autonomous execution. When an agent can create users, request privileges, and continue acting inside the same session, traditional assumptions about stable identity context become much harder to rely on.

Key questions

Q: How should security teams govern AI agents in purple team exercises?

A: Treat the agent as a governed identity, not a script. Give it a narrow scope, explicit session boundaries, full control-plane logging, and clear identity handoff requirements if the exercise includes creating or switching identities. That keeps the test focused on detection validation while preventing autonomous behaviour from turning into uncontrolled persistence.

Q: Why do AI agents complicate identity attribution in cloud environments?

A: Because an agent can create or use multiple identities inside one task while logs still show a single initiating session. That makes it harder to tell which actor owns later actions, especially when federated access, local IAM users, and long-term keys coexist. Correlation across session types is therefore essential.

Q: What breaks when autonomous agents do not switch to the identity they created?

A: The exercise loses identity fidelity. If the agent creates a new user but continues acting as the original federated identity, you cannot accurately test persistence, attribution, or offboarding logic. The right control is to verify identity transition, not just successful command execution.

Q: Who is accountable when an autonomous agent generates privileged access in AWS?

A: Accountability sits with the team that scoped, approved, and monitored the agent, because the agent cannot own policy or governance decisions. In practice, cloud identity teams need audit trails that show who authorised the task, which identity executed each step, and where the privilege escalation occurred.

Technical breakdown

Autonomous purple teaming and runtime identity selection

A purple team exercise normally assumes a human operator or scripted workflow that follows a known sequence. Rufio changed that pattern by translating a threat narrative into live AWS actions, which means the controlling identity was not just executing commands but deciding how to proceed at runtime. That is an autonomous behaviour problem, not simply an automation problem. Once an agent can choose actions, sequence, and timing without human approval between steps, the governance model has to account for the identity that is actually acting, not the one that was initially granted access.

Practical implication: treat autonomous test agents as governed identities with explicit scope, logging, and revocation rules before they touch production-like environments.
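One way to make "explicit scope and revocation rules" concrete is to check every action the agent proposes against a scope record before it runs. The following is a minimal sketch under stated assumptions: the `AgentScope` class, its fields, and the account ID are hypothetical illustrations, not part of Permiso's tooling or any AWS API. Only the IAM action names are real.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AgentScope:
    """Hypothetical scope record for one autonomous test agent."""
    agent_id: str
    account_id: str              # the only AWS account the agent may touch
    allowed_actions: frozenset   # IAM-style action names, e.g. "iam:CreateUser"
    expires_at: datetime         # hard stop for the exercise
    revoked: bool = False        # set True to cut the agent off early

    def permits(self, action: str, account_id: str, now: datetime) -> bool:
        """Return True only if the action is in scope, in time, and not revoked."""
        if self.revoked or now >= self.expires_at:
            return False
        return account_id == self.account_id and action in self.allowed_actions


# Example scope for a two-hour exercise in a single sandbox account (made-up ID).
start = datetime(2026, 1, 1, 9, 0, tzinfo=timezone.utc)
scope = AgentScope(
    agent_id="purple-team-agent",
    account_id="111111111111",
    allowed_actions=frozenset(
        {"iam:CreateUser", "iam:AttachUserPolicy", "iam:CreateAccessKey"}
    ),
    expires_at=start + timedelta(hours=2),
)

print(scope.permits("iam:CreateUser", "111111111111", start))  # in scope
print(scope.permits("iam:DeleteUser", "111111111111", start))  # not in scope
```

A check like this complements, rather than replaces, the enforcement that IAM policies and permission boundaries provide inside the cloud account itself.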

IAM user creation, access keys, and session attribution

The exercise followed a classic cloud identity attack pattern: create a local IAM user, attach administrator privileges, generate programmatic credentials, and keep operating through multiple session types. The important technical detail is that session attribution can remain tied to the federated identity even after the new local identity exists, which makes later activity harder to reason about if state transitions are not tracked cleanly. In cloud environments, that mix of federated access, local persistence, and long-term keys is what turns a controlled test into a realistic identity risk model.

Practical implication: correlate federated sessions, local IAM creation, and key issuance in one timeline so identity transitions cannot hide behind a single actor label.
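The single-timeline idea can be sketched as follows, assuming events shaped like AWS CloudTrail records (`eventTime`, `eventName`, `userIdentity`, `requestParameters`). The helper names, the sample ARN, and the user name are hypothetical, and the sample data is invented for illustration, not taken from the exercise.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# CloudTrail event names that mark an identity transition in this pattern.
TRANSITION_EVENTS = {"CreateUser", "AttachUserPolicy", "CreateAccessKey"}


@dataclass
class TimelineEntry:
    time: datetime
    actor_type: str             # userIdentity.type, e.g. "AssumedRole" or "IAMUser"
    actor_arn: str
    event_name: str
    target_user: Optional[str]  # requestParameters.userName, when present
    is_transition: bool


def build_timeline(events):
    """Normalise CloudTrail-style records into one time-ordered list."""
    entries = []
    for ev in events:
        identity = ev.get("userIdentity") or {}
        params = ev.get("requestParameters") or {}
        entries.append(TimelineEntry(
            time=datetime.fromisoformat(ev["eventTime"].replace("Z", "+00:00")),
            actor_type=identity.get("type", "Unknown"),
            actor_arn=identity.get("arn", "unknown"),
            event_name=ev["eventName"],
            target_user=params.get("userName"),
            is_transition=ev["eventName"] in TRANSITION_EVENTS,
        ))
    return sorted(entries, key=lambda e: e.time)


def actors_after_creation(timeline, user_name):
    """List the distinct actor ARNs seen after the given user was created."""
    created_at = next(
        (e.time for e in timeline
         if e.event_name == "CreateUser" and e.target_user == user_name),
        None,
    )
    if created_at is None:
        return []
    seen = []
    for e in timeline:
        if e.time > created_at and e.actor_arn not in seen:
            seen.append(e.actor_arn)
    return seen


# Invented sample data: a federated (assumed-role) session does everything.
FEDERATED_ARN = "arn:aws:sts::111111111111:assumed-role/FederatedAdmin/agent-session"
actor = {"type": "AssumedRole", "arn": FEDERATED_ARN}
SAMPLE_EVENTS = [
    {"eventTime": "2026-01-01T10:02:00Z", "eventName": "CreateAccessKey",
     "userIdentity": actor, "requestParameters": {"userName": "svc-test"}},
    {"eventTime": "2026-01-01T10:00:00Z", "eventName": "CreateUser",
     "userIdentity": actor, "requestParameters": {"userName": "svc-test"}},
    {"eventTime": "2026-01-01T10:01:00Z", "eventName": "AttachUserPolicy",
     "userIdentity": actor, "requestParameters": {"userName": "svc-test"}},
    {"eventTime": "2026-01-01T10:05:00Z", "eventName": "ListUsers",
     "userIdentity": actor, "requestParameters": None},
]

timeline = build_timeline(SAMPLE_EVENTS)
print(actors_after_creation(timeline, "svc-test"))
```

In this sample, every action after the user is created still comes from the federated session ARN. That is the pattern the exercise surfaced: the new local identity exists, but it never appears as an actor.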

Detection correlation across cloud control-plane activity

Permiso's platform did not just flag isolated events. It correlated user creation, policy attachment, key generation, console access attempts, and CloudShell credential harvesting into a compound alert. That matters because a single event in cloud identity often looks benign on its own, while the sequence reveals persistence and privilege escalation intent. The technical lesson is that detection value comes from stitching control-plane actions together fast enough to preserve the attack narrative. Without that correlation, the same events would be easy to misread as routine administration.

Practical implication: build detection logic that reconstructs identity-action sequences, not just single API calls or isolated policy changes.

NHI Mgmt Group analysis

Autonomous purple teaming does not just test detections; it also tests whether identity governance can still track who is acting. When an agent can translate a threat narrative into live cloud actions, the control question shifts from "did we detect it" to "did we preserve identity continuity across the action chain." That is why autonomous test agents need the same governance seriousness as other operational identities. Practitioner conclusion: build identity-aware guardrails before experimentation expands into production-adjacent workflows.

Identity state continuity is the named failure mode this exercise exposes. The exercise showed that an agent could create a new IAM identity and still keep operating through the original federated session, which breaks the assumption that the actor using privileges is the same actor that created them. That assumption was designed for human-paced workflows and stable session ownership. It fails when runtime execution spans multiple identities inside one autonomous task. Practitioner conclusion: re-evaluate how your programme binds action, session, and accountability together.

Compound cloud actions are where machine identity governance becomes measurable. Single control-plane events are too thin to explain autonomous behaviour, but the chain of user creation, admin attachment, access-key generation, and credential harvesting exposes a governable sequence. The OWASP Non-Human Identity Top 10 and NIST SP 800-207 (Zero Trust Architecture) both matter here because the issue is not just privilege, but the ability to trace privilege as it changes form. Practitioner conclusion: if your logs cannot reconstruct identity transitions, your governance model is already behind the agent.

Runtime delegation without identity handoff is the boundary this story crosses. The agent followed the mission, but not the intended identity switch, which shows that delegation can succeed while accountability fails. That is a structural warning for agentic AI programmes and for NHI programmes that assume one actor equals one session. Practitioner conclusion: review whether your operating model can distinguish delegated action from delegated identity before scaling autonomous workflows.

The market signal is clear: defensive teams will increasingly use autonomous agents to test cloud identity assumptions. That does not eliminate the need for human expertise, but it changes where humans should focus: scope design, identity transition validation, and detection correlation. The relevant framework lens is NIST CSF for detection and response, with the OWASP Agentic AI Top 10 for the behaviour of the actor itself. Practitioner conclusion: plan for agents as part of the testing stack, but govern them as identities, not tools.

From our research:
88.5% of organisations acknowledge that their non-human IAM practices lag behind or are merely on par with their human identity and access management efforts, according to The 2024 Non-Human Identity Security Report.
23.5% of security professionals are unsure about the biggest threat to their non-human identities, indicating a significant awareness gap.
That governance gap becomes sharper when you compare it with The 52 NHI breaches Report, which shows how identity blind spots repeatedly become breach pathways.

What this signals

Identity state continuity is the operational issue practitioners should watch next. As autonomous agents move from testing into broader security operations, the main risk is not simply that they act, but that they act across multiple identity states faster than governance workflows can reconcile.

With 35.6% of organisations citing consistent access across hybrid and multi-cloud environments as their top NHI security challenge, per The 2024 Non-Human Identity Security Report, autonomous exercises will expose the same weak point from a different direction.

Teams that already use federated identity, local cloud identities, and long-lived secrets in the same environment should expect agent-driven testing to surface attribution failures first. That makes cloud logging, identity correlation, and scoped rollback the next operational priorities.

For practitioners

Define identity handoff rules for autonomous test agents: Require the agent to switch to the newly created identity when the task calls for it, and verify that downstream actions are executed only under that identity. Preserve evidence of each handoff so the exercise tests governance as well as detections.
Correlate federated sessions with local IAM creation: Join Okta federation logs, IAM user creation, access-key issuance, and policy attachment into one reviewable sequence. That timeline is what shows whether persistence was created and whether the wrong identity kept operating after escalation.
Alert on privilege escalation followed by long-term key creation: Treat administrator policy attachment plus new access-key generation as a compound signal, not two unrelated admin events. The combination is what turns a test into a persistence problem.
Limit autonomous exercises to pre-scoped cloud sandboxes: Use isolated environments with explicit logging, short-lived credentials, and predefined rollback so agent experimentation cannot leak into operational accounts. The point is to measure detection coverage without creating real persistence risk.
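The identity handoff check in the first recommendation could be sketched roughly as below. It assumes CloudTrail-style records in which the `CreateAccessKey` response carries the new key ID under `responseElements.accessKey.accessKeyId` and later calls expose the calling key under `userIdentity.accessKeyId`. The function name, ARNs, and key IDs are invented for illustration.

```python
def check_identity_handoff(events):
    """Report whether activity moved to the newly issued access key.

    Assumes eventTime values are UTC ISO 8601 strings in one format, so
    sorting them as plain strings gives chronological order.
    """
    ordered = sorted(events, key=lambda e: e["eventTime"])
    for index, ev in enumerate(ordered):
        if ev["eventName"] == "CreateAccessKey":
            break
    else:
        # No key was ever issued, so there is nothing to hand off to.
        return {"key_created": False, "handoff_observed": False,
                "stale_actor_events": []}

    creator_arn = (ev.get("userIdentity") or {}).get("arn")
    response = ev.get("responseElements") or {}
    new_key_id = (response.get("accessKey") or {}).get("accessKeyId")

    later = ordered[index + 1:]
    # Events signed with the new long-term key prove the switch happened.
    used_new_key = [
        e for e in later
        if new_key_id is not None
        and (e.get("userIdentity") or {}).get("accessKeyId") == new_key_id
    ]
    # Events still made by the creating session show the original actor lingering.
    stale = [
        e["eventName"] for e in later
        if (e.get("userIdentity") or {}).get("arn") == creator_arn
    ]
    return {"key_created": True, "handoff_observed": bool(used_new_key),
            "stale_actor_events": stale}


# Invented sample: the federated session creates a key, then keeps acting itself.
FEDERATED = {"type": "AssumedRole", "accessKeyId": "ASIAEXAMPLESESSION",
             "arn": "arn:aws:sts::111111111111:assumed-role/FederatedAdmin/agent"}
no_switch = [
    {"eventTime": "2026-01-01T10:02:00Z", "eventName": "CreateAccessKey",
     "userIdentity": FEDERATED,
     "responseElements": {"accessKey": {"accessKeyId": "AKIAEXAMPLENEWKEY"}}},
    {"eventTime": "2026-01-01T10:05:00Z", "eventName": "ListBuckets",
     "userIdentity": FEDERATED},
]

print(check_identity_handoff(no_switch))
```

A result with `handoff_observed` set to `False` and a non-empty `stale_actor_events` list corresponds to the failure described in this post: the command chain succeeded, but the identity transition did not.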

Key takeaways

This exercise shows that AI agents can execute realistic cloud attack emulation, but it also exposes how easily identity continuity can break when tasks span multiple sessions and identities.
The evidence is concrete: the agent created a local IAM user, attached admin rights, generated access keys, and triggered correlated detections within minutes.
Teams should validate identity handoff, session correlation, and scoped sandboxing before expanding autonomous agents into purple team or security operations workflows.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	AGENT-03	The article centers on autonomous agent behaviour and tool-use in cloud operations.
OWASP Non-Human Identity Top 10	NHI-03	The exercise created and used cloud identities, keys, and persistence mechanisms.
NIST CSF 2.0	DE.CM-1	The post is about detection coverage and correlated cloud telemetry.

Track local identity creation and key issuance as NHI events, then correlate them with session attribution.

Key terms

Autonomous Agent: A software entity that can choose actions, tools, and timing at runtime without a human approving each step. In identity security, that means the agent may create, use, or abandon credentials during the same task, so governance must track behaviour rather than assume a fixed script.
Identity State Continuity: The ability to preserve a reliable link between the actor, the session, and the privileges used across a workflow. When this breaks, logs may still show activity, but they no longer prove which identity should be held accountable for each action. That is a major problem in cloud and agentic operations.
Federated Session: An authenticated session issued through an external identity provider and then used to access another platform, such as a cloud provider. It is useful for access control, but it can obscure later identity transitions if new local identities or long-term keys are created during the same activity chain.
Control-Plane Correlation: The practice of stitching together identity and administration events across a cloud platform into one sequence. It is essential because the security meaning often appears only when user creation, privilege changes, and credential generation are viewed together rather than as separate alerts.


This post draws on content published by Permiso Security: Can an AI Agent Run a Purple Team Exercise? Hear Ye, Hear Ye. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-02-11.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org