Computer use agents are exposing a new identity control gap

By NHI Mgmt Group Editorial Team | Published 2025-07-30 | Domain: Agentic AI & NHIs | Source: WorkOS

TL;DR: Anthropic’s Computer Use and OpenAI’s Computer Using Agent (CUA) show how AI can operate desktops and browsers to complete multi-step work. The WorkOS comparison also highlights the performance, safety, and control tradeoffs between managed virtual environments and direct machine access. Existing IAM and NHI models were built for predefined access, not for runtime action selection across arbitrary software.

At a glance

What this is: An independent analysis of AI computer-use agents and the identity control gap they create as models move from generating text to taking desktop and browser actions.

Why it matters: Practitioners must decide how to govern agent access, tool scope, and approval boundaries across NHI, autonomous, and human workflows before these systems inherit real operational privilege.

👉 Read WorkOS's analysis of Anthropic Computer Use versus OpenAI CUA

Context

Computer use agents are AI systems that can observe a screen, interact with software, and complete tasks across applications. The governance gap is that current identity models assume access is provisioned for a known purpose, while these agents can decide actions dynamically inside live sessions.

The article contrasts direct desktop control with browser-contained execution, which changes the trust boundary for identity teams. That difference matters for NHI governance, access containment, and the point at which an agent should be treated as a privileged execution subject rather than a simple automation layer.

Key questions

Q: How should security teams govern AI agents that can use desktops and browsers?

A: Security teams should govern these agents as privileged runtime actors, not as ordinary automation. Separate browser-contained workflows from desktop-level workflows, require session logging, and tie each allowed action path to a task-specific policy. If an agent can change tools and actions in response to live context, access review must focus on reachable surfaces, not just assigned roles.
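To make "tie each allowed action path to a task-specific policy" concrete, here is a minimal sketch that assumes a hypothetical in-house policy layer, not any vendor API. Each task ID maps to an allowlist of (surface, application, action) paths, and every decision, allowed or denied, is appended to a session log. The names `TaskPolicy`, `ActionRequest`, and the example task and applications are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ActionRequest:
    # One UI action the agent wants to take (hypothetical schema).
    surface: str      # "browser" or "desktop"
    application: str  # e.g. "crm-web", "terminal"
    action: str       # e.g. "read", "click", "type"


@dataclass
class TaskPolicy:
    task_id: str
    # Allowed action paths for this task only, as (surface, application, action) tuples.
    allowed: frozenset
    session_log: list = field(default_factory=list)

    def authorize(self, req: ActionRequest) -> bool:
        decision = (req.surface, req.application, req.action) in self.allowed
        # Record every decision so the session can be reconstructed later.
        self.session_log.append({"task": self.task_id, "request": req, "allowed": decision})
        return decision


policy = TaskPolicy(
    task_id="update-crm-contact",  # hypothetical task
    allowed=frozenset({
        ("browser", "crm-web", "read"),
        ("browser", "crm-web", "click"),
        ("browser", "crm-web", "type"),
    }),
)

print(policy.authorize(ActionRequest("browser", "crm-web", "type")))   # True
print(policy.authorize(ActionRequest("desktop", "terminal", "type")))  # False
```

The policy is scoped to the task rather than to the agent's identity, so the same agent running a different task would be checked against a different allowlist.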

Q: Why do computer-use agents complicate least-privilege design?

A: They complicate least privilege because the useful scope of access is often discovered only during execution. A task may require the agent to move across apps, read screens, and branch based on what it sees. That makes provisioning-time privilege definitions incomplete unless teams also constrain the session path, the environment, and the approval boundary.
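One way to constrain the approval boundary when scope is discovered mid-task is to turn every attempt to widen scope into an approval event instead of a silent grant. The sketch below assumes a hypothetical `SessionScope` object inside an agent runtime. Approved applications are added for the current session only, so the provisioned set is never modified at runtime.

```python
from dataclasses import dataclass, field


@dataclass
class SessionScope:
    # Applications provisioned up front for this task (hypothetical names).
    provisioned: set
    # Applications a human approved mid-session; valid for this session only.
    session_grants: set = field(default_factory=set)
    pending: list = field(default_factory=list)

    def request_entry(self, application: str) -> str:
        """Return 'allow', or 'needs_approval' when the agent tries to widen its scope."""
        if application in self.provisioned or application in self.session_grants:
            return "allow"
        # Scope discovered at runtime becomes an approval event, never an implicit grant.
        if application not in self.pending:
            self.pending.append(application)
        return "needs_approval"

    def approve(self, application: str) -> None:
        # Only pending requests can be approved; provisioning is left unchanged.
        if application in self.pending:
            self.pending.remove(application)
            self.session_grants.add(application)


scope = SessionScope(provisioned={"crm-web"})
print(scope.request_entry("crm-web"))         # allow
print(scope.request_entry("billing-portal"))  # needs_approval
scope.approve("billing-portal")
print(scope.request_entry("billing-portal"))  # allow
print(scope.provisioned)                      # {'crm-web'}
```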

Q: What breaks when AI agents can self-correct during task execution?

A: Fixed workflow assumptions break first. If an agent can retry, branch, or replan after an error, then the control plane can no longer assume a single approved action sequence. Monitoring must therefore watch for unexpected retries, tool changes, and expansion into new applications, because the risk is in the evolving path, not only in the original request.
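The monitoring described above can be sketched as a comparison between the approved plan and the path the agent actually took. This assumes a hypothetical event schema with `tool`, `application`, and `status` fields, and an arbitrary retry threshold. It flags repeated failures on the same step, tools outside the plan, and applications outside the plan.

```python
from collections import Counter


def path_alerts(approved_plan, executed, max_retries=2):
    """Compare an executed action path against the approved plan.

    approved_plan: list of (tool, application) steps the task was approved for.
    executed: list of dicts with 'tool', 'application', and 'status' keys
    (a hypothetical event schema).
    """
    alerts = []
    approved_tools = {tool for tool, _ in approved_plan}
    approved_apps = {app for _, app in approved_plan}

    # Retries: the same (tool, application) step failing more than max_retries times.
    failures = Counter((e["tool"], e["application"]) for e in executed if e["status"] == "error")
    for step, count in failures.items():
        if count > max_retries:
            alerts.append(("excessive_retries", step))

    # Divergence: tools or applications the approved plan never mentioned (reported once each).
    for e in executed:
        if e["tool"] not in approved_tools and ("tool_change", e["tool"]) not in alerts:
            alerts.append(("tool_change", e["tool"]))
        if e["application"] not in approved_apps and ("new_application", e["application"]) not in alerts:
            alerts.append(("new_application", e["application"]))
    return alerts


plan = [("browser", "crm-web")]
events = [
    {"tool": "browser", "application": "crm-web", "status": "error"},
    {"tool": "browser", "application": "crm-web", "status": "error"},
    {"tool": "browser", "application": "crm-web", "status": "error"},
    {"tool": "shell", "application": "terminal", "status": "ok"},
]
print(path_alerts(plan, events))
# [('excessive_retries', ('browser', 'crm-web')), ('tool_change', 'shell'), ('new_application', 'terminal')]
```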

Q: How do you know if computer-use governance is actually working?

A: You know it is working when the agent stays inside its intended surface, produces complete audit trails, and cannot move into unapproved applications without a policy event. Effective governance shows up as narrow session scope, visible action logs, and blocked cross-boundary behaviour rather than simply successful task completion.
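These signals can be checked per session from the audit log. The sketch below assumes a hypothetical log schema where each entry records the application touched and the policy decision, with `None` meaning no policy event was recorded. Task success is deliberately not an input.

```python
def governance_report(session_log, intended_apps):
    """Summarise whether one session stayed inside its governance boundary.

    session_log entries use a hypothetical schema:
    {'application': str, 'decision': 'allow' | 'deny' | None}, where None
    means the action happened without any recorded policy event.
    """
    missing_decisions = sum(1 for e in session_log if e["decision"] is None)
    outside = [e for e in session_log if e["application"] not in intended_apps]
    blocked = sum(1 for e in outside if e["decision"] == "deny")
    return {
        "log_complete": missing_decisions == 0,
        "boundary_attempts": len(outside),
        "boundary_blocked": blocked,
        # Healthy only if the log has no gaps and every cross-boundary attempt was blocked.
        "healthy": missing_decisions == 0 and blocked == len(outside),
    }


log = [
    {"application": "crm-web", "decision": "allow"},
    {"application": "file-explorer", "decision": "deny"},
    {"application": "crm-web", "decision": "allow"},
]
print(governance_report(log, {"crm-web"}))
# {'log_complete': True, 'boundary_attempts': 1, 'boundary_blocked': 1, 'healthy': True}
```

A session that completes its task but reaches an unapproved application without a deny decision, or leaves gaps in the log, still reports as unhealthy.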

Technical breakdown

Desktop control versus browser-contained execution

Anthropic’s Computer Use capability operates closer to a human operator on a machine. It reads screenshots, interprets the interface, and drives mouse and keyboard actions across native apps, terminals, and websites. OpenAI’s Computer Using Agent is narrower, running through a managed virtual browser that constrains the execution surface to web tasks. From an identity perspective, the distinction is not cosmetic. Desktop control expands the reachable asset set, while browser containment reduces blast radius but can still expose sensitive sessions, tokens, and workflows if the agent is over-scoped.

Practical implication: separate governance for browser-bounded agents and desktop-bounded agents, because the access surface and containment model are not equivalent.
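Separate governance per surface can be expressed as distinct policy templates chosen by execution surface. This is a sketch under stated assumptions: the `Surface` enum, the asset-class labels, and the recording levels are hypothetical. The desktop template reaches a strict superset of assets and refuses to load without a documented business justification.

```python
from dataclasses import dataclass
from enum import Enum


class Surface(Enum):
    BROWSER = "browser"  # managed virtual browser, web tasks only
    DESKTOP = "desktop"  # native apps, terminals, local files


@dataclass(frozen=True)
class GovernanceTemplate:
    reachable_asset_classes: frozenset
    session_recording: str
    requires_business_justification: bool


# Illustrative labels only; real values would come from an organisation's own risk model.
TEMPLATES = {
    Surface.BROWSER: GovernanceTemplate(
        reachable_asset_classes=frozenset({"web_sessions", "cookies_and_tokens"}),
        session_recording="action_log",
        requires_business_justification=False,
    ),
    Surface.DESKTOP: GovernanceTemplate(
        reachable_asset_classes=frozenset(
            {"web_sessions", "cookies_and_tokens", "local_files", "native_apps", "terminal"}
        ),
        session_recording="action_log_and_screenshots",
        requires_business_justification=True,
    ),
}


def template_for(surface: Surface, justification: str = "") -> GovernanceTemplate:
    template = TEMPLATES[surface]
    # Desktop-level access is refused unless a non-empty justification is supplied.
    if template.requires_business_justification and not justification.strip():
        raise PermissionError(f"{surface.value} agents need a documented business justification")
    return template


browser = template_for(Surface.BROWSER)
desktop = template_for(Surface.DESKTOP, justification="month-end reconciliation in desktop ERP client")
print(sorted(desktop.reachable_asset_classes - browser.reachable_asset_classes))
# ['local_files', 'native_apps', 'terminal']
```

The set difference is one way to make the extra blast radius of desktop control visible during review.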

Why screenshot-driven agents change authorisation assumptions

These systems do not rely on fixed API calls alone. They infer state from the visible interface, then select actions based on what they see in the moment. That means the privilege decision is happening inside the session, not only at provisioning time. For identity teams, the security question shifts from whether access was granted to whether the agent can discover, navigate, and combine paths that were not anticipated when access was issued. This is why conventional least-privilege design can look adequate on paper and still fail under live UI-driven execution.

Practical implication: treat UI-driven agent actions as runtime authorisation events and instrument them with policy and audit controls.
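Treating each UI action as a runtime authorisation event implies one audit record per action that links the visual evidence to the policy decision. This is a minimal sketch, assuming a hypothetical record schema and assuming screenshots are stored elsewhere. The record keeps only a SHA-256 digest of the screenshot the agent acted on.

```python
import hashlib
import json
from datetime import datetime, timezone


def authorization_event(session_id, step, action, target, screenshot_png, allowed, policy_id):
    """Build one audit record for a single UI action (hypothetical schema).

    The screenshot itself is stored elsewhere; its SHA-256 digest lets reviewers
    match the evidence the agent saw to the decision that was made.
    """
    return {
        "session_id": session_id,
        "step": step,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "target": target,
        "screenshot_sha256": hashlib.sha256(screenshot_png).hexdigest(),
        "policy_id": policy_id,
        "decision": "allow" if allowed else "deny",
    }


event = authorization_event(
    session_id="sess-042",
    step=7,
    action="click",
    target="Export contacts button",
    screenshot_png=b"<png bytes captured before the action>",
    allowed=False,
    policy_id="crm-readonly-v1",  # hypothetical policy identifier
)
print(json.dumps(event))  # one JSON line per action, appended to the audit trail
```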

Runtime autonomy and the agentic identity boundary

The article describes systems that can take on multi-step tasks, self-correct, and continue execution after errors. That is more than scripted automation, because the model is choosing the next action from evolving context rather than following a fixed workflow. Once the agent can decide how to proceed in response to changing screens or failures, identity governance must treat it as a decision-making subject with scoped authority, not just a tool consumer. That boundary matters for approvals, monitoring, and accountability.

Practical implication: classify multi-step computer-use systems by their decision authority, not by whether they are marketed as assistants or automations.
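Classifying by decision authority can be as simple as checking which discretionary capabilities a workflow has while ignoring its product label. The sketch below assumes hypothetical capability flags. It also assumes an illustrative rule that any discretion-bearing workflow pauses for human approval before each new action set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    name: str
    marketed_as: str  # "assistant", "automation", ... (ignored on purpose)
    can_self_correct: bool
    can_branch_on_screen_state: bool
    continues_after_error: bool


def classify(wf: Workflow) -> str:
    """Classify by decision authority; the marketing label plays no part."""
    has_discretion = wf.can_self_correct or wf.can_branch_on_screen_state or wf.continues_after_error
    return "autonomous" if has_discretion else "scripted"


def needs_approval_before_next_action_set(wf: Workflow) -> bool:
    # Illustrative rule: autonomous workflows pause for a human at each new action set.
    return classify(wf) == "autonomous"


invoice_bot = Workflow("invoice-export", "automation", False, False, False)
helper = Workflow("expense-helper", "automation", True, True, True)
print(classify(invoice_bot), classify(helper))  # scripted autonomous
```

Both workflows are labelled "automation", but only the one with runtime discretion is treated as higher-risk.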

Moltbook AI agent keys breach: the Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach: attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Periodic access review was designed for stable privilege, and that assumption fails when a computer-use agent is selecting actions inside the session. These systems can observe state, decide a next step, and keep moving without a human approval gate between actions. As a result, review cadences built for persistent entitlements no longer describe the real control problem; they miss the moment where privilege is assembled and used.

Desktop-bound agents create a broader identity blast radius than browser-contained agents. A browser sandbox confines many actions to managed web surfaces, while direct desktop control can reach local files, native applications, terminals, and system-level workflows. That changes the control model from application access to environment access. Practitioners should treat the desktop boundary as a governance boundary, not just a user-experience choice.

Computer-use agents sharpen the difference between automation and autonomy. Scripted RPA follows predetermined steps, but these agents can interpret context and choose subsequent actions when the interface changes. That means policy must account for runtime discretion, not just scheduled execution. The practitioner conclusion is simple: if an AI can redirect itself mid-task, the identity model must account for that freedom explicitly.

Identity teams need a named concept for this problem: runtime interaction privilege. These agents are not merely holding credentials; they are exercising authority through live interface navigation in ways that are difficult to predict at provisioning time. The governance issue is not only access possession, but the ability to combine visible state and granted scope into actions that were never enumerated in advance. That is where traditional entitlement thinking starts to break down.
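One way to make runtime interaction privilege measurable is to compare what was provisioned with what is reachable through UI navigation from those entry points. The sketch below runs a breadth-first search over a hypothetical navigation graph that a reviewer would build by hand or from session recordings. Everything reachable beyond the provisioned set is authority that exists at runtime but was never enumerated at provisioning time.

```python
from collections import deque


def reachable_surfaces(ui_graph, entry_points):
    """Breadth-first search over UI navigation edges.

    ui_graph maps a surface to the surfaces an agent can open from it via
    links, menus, file dialogs, or launchers (a hypothetical reviewer-built
    model). entry_points are the surfaces actually provisioned.
    """
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        current = queue.popleft()
        for nxt in ui_graph.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


ui_graph = {
    "crm-web": ["crm-admin-settings", "file-upload-dialog"],
    "file-upload-dialog": ["local-filesystem"],
    "local-filesystem": ["ssh-config"],
}
provisioned = {"crm-web"}
reachable = reachable_surfaces(ui_graph, provisioned)
print(sorted(reachable - provisioned))
# ['crm-admin-settings', 'file-upload-dialog', 'local-filesystem', 'ssh-config']
```

In this example a single provisioned web application leads, through an ordinary file upload dialog, to local files and SSH configuration; a review of static entitlements would not surface that path.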

Computer-use governance will increasingly sit at the intersection of NHI and autonomous control. As these agents move beyond narrow browser tasks, the same access subject can behave like an NHI in one context and an autonomous actor in another. That overlap forces identity teams to align lifecycle, monitoring, and approval models across machine identities and emerging agentic workloads. The practitioner takeaway is to avoid a single control pattern for all agentic systems.

From our research:
The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
Only 44% of developers are reported to follow security best practices for secrets management, exposing a significant developer behaviour gap.
Organisations maintain an average of 6 distinct secrets manager instances, according to The State of Secrets in AppSec, and that fragmentation makes session-level control harder to prove.

What this signals

Runtime interaction privilege: computer-use agents show why identity teams need a new way to describe authority that emerges during a live session. When a model can inspect the screen, choose the next action, and continue across applications, the control problem shifts from provisioning to runtime containment. That is why browser-first containment and desktop-level access should never be governed with the same policy template.

Programmes that already struggle with secret sprawl will find agentic workflows harder to stabilise. With 43% of security professionals concerned about AI systems learning and reproducing sensitive information patterns from codebases, the operational issue is not only leakage but reuse of exposed context inside active sessions. Teams should align these controls with the OWASP Agentic AI Top 10 and the NIST AI Risk Management Framework, especially where tool use crosses application boundaries.

For practitioners

Define separate control zones for browser and desktop agents: Map each computer-use workflow to the narrowest possible execution surface. Browser-only tasks should remain contained in managed virtual environments, while desktop access should require stronger containment, tighter session logging, and explicit business justification.
Treat every agent action as a runtime authorisation event: Capture screenshots, action sequences, and tool invocations so security teams can reconstruct how the agent moved through each session. Tie those records to policy decisions, not only to final task outcomes.
Apply privileged access review to agent-scoped workflows: Review which applications, files, and terminals a computer-use agent can reach, then remove any path that is not required for the task. Focus the review on live session reachability rather than static entitlement lists.
Separate autonomous tasks from scripted automation: Classify workflows that can self-correct, branch, or continue after error as higher-risk than fixed RPA jobs. Use that classification to determine whether human approval is required before the next action set begins.
Log and alert on cross-application movement by agents: Flag agent sessions that move from one application to another without a documented task reason, especially when the path crosses from web UI into local desktop apps or file systems.
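The last practice, alerting on undocumented cross-application movement, can be sketched as a pass over consecutive session steps. The `Step` schema, surface labels, and two-level severity are hypothetical. A transition with a documented task reason raises no alert, and undocumented moves from a web surface into local apps, the file system, or a terminal are rated higher than other undocumented moves.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for surfaces outside the browser.
LOCAL_SURFACES = {"desktop_app", "filesystem", "terminal"}


@dataclass(frozen=True)
class Step:
    application: str
    surface: str                # "web", "desktop_app", "filesystem", or "terminal"
    task_reason: Optional[str]  # documented reason for entering this application, if any


def cross_app_alerts(steps):
    alerts = []
    for prev, cur in zip(steps, steps[1:]):
        if prev.application == cur.application:
            continue  # same application: not a cross-application move
        if cur.task_reason:
            continue  # documented movement: no alert
        # Undocumented web-to-local transitions get the highest severity.
        severity = "high" if prev.surface == "web" and cur.surface in LOCAL_SURFACES else "medium"
        alerts.append((severity, prev.application, cur.application))
    return alerts


session = [
    Step("crm-web", "web", None),
    Step("crm-web", "web", None),
    Step("webmail", "web", "send confirmation email"),
    Step("file-explorer", "filesystem", None),
]
print(cross_app_alerts(session))
# [('high', 'webmail', 'file-explorer')]
```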

Key takeaways

Computer-use agents turn identity from a static access problem into a live session control problem.
Browser-contained and desktop-level agents expose different trust boundaries, so they need different governance models.
Practitioners should focus on runtime authorisation, session scope, and auditability before expanding agent access to production workflows.

Key terms

Computer-use agent: An AI system that can observe a user interface and take actions across software on behalf of a task. In practice, it extends identity governance beyond API access because the agent can navigate live applications, combine steps, and adapt to changing state during the session.
Runtime interaction privilege: Authority exercised by an agent while it is actively interpreting and operating a user interface. The term captures the security reality that access is not just what was provisioned, but what the system can discover and do in the live execution path.
Browser-contained execution: A control model that keeps an agent inside a managed web environment instead of letting it operate a full desktop. It limits the reachable surface, but it still requires strong policy, logging, and task scoping because web actions can expose credentials and sensitive workflows.
Desktop-level access: The ability for an AI system to interact with local applications, files, terminals, and operating system controls. This expands capability, but it also increases identity blast radius because the agent can touch assets that are outside a browser sandbox and harder to constrain.

Deepen your knowledge

Agentic AI governance and runtime access control are covered in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building controls for computer-use agents or other autonomous workflows, it is a practical place to start.

This post draws on content published by WorkOS: Anthropic’s Computer Use versus OpenAI’s Computer Using Agent (CUA). Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-07-30.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org