Who is accountable when an AI agent completes a browser workflow incorrectly?

Why This Matters for Security Teams

When an AI agent completes a browser workflow incorrectly, the failure is rarely just a UI mistake. It is an authorization and accountability problem: the agent was allowed to act, the task boundary was unclear, or the session controls were too broad. Current guidance suggests treating browser-driving agents as autonomous workloads with operational side effects, not as enhanced macros. That distinction matters because agent actions can touch customer data, trigger transactions, or expose secrets faster than a human can intervene.

Security teams often discover the problem only after the wrong record is updated, the wrong file is downloaded, or a sensitive page is submitted with valid credentials. NHI governance research on the OWASP NHI Top 10 and the NIST AI Risk Management Framework point to the same operational reality: if ownership, context, and controls are not explicit, the risk lands on the enterprise, not the model. In practice, agent misuse tends to surface after a business process has already run incorrectly, not through deliberate testing of the workflow boundary.

How It Works in Practice

Accountability for a faulty browser workflow should be assigned across three layers: the business owner who defined the task, the system owner who exposed the browser or tool access, and the identity team that approved the agent’s access model. That distribution is practical because the failure can originate in task design, privilege design, or session design. A browser agent may have valid credentials and still do the wrong thing if the allowed intent was too vague.

For autonomous agents, static role-based access is usually too blunt. The better pattern is context-aware authorization at runtime, paired with just-in-time credentials and tightly scoped session boundaries. That means the agent receives only the access needed for the specific job, for the shortest possible time, and the access is revoked when the task ends. Workload identity should be the identity primitive, with cryptographic proof of what the agent is, and policy-as-code deciding what it may do at that moment.
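The pattern can be sketched in a few lines. This is a minimal illustration, assuming an in-process policy check; the names `TaskGrant`, `AgentRequest`, `issue_grant`, and `authorize` are hypothetical and do not come from any specific product or framework. It shows a grant scoped to one task that expires on its own and is evaluated at the moment of each action.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TaskGrant:
    """Hypothetical just-in-time grant issued for a single task."""
    workload_id: str            # verified workload identity of the agent
    task: str                   # the business task this grant covers
    allowed_actions: frozenset  # actions permitted for this task only
    expires_at: datetime        # access ends here even if nobody revokes it


@dataclass(frozen=True)
class AgentRequest:
    workload_id: str
    task: str
    action: str


def issue_grant(workload_id, task, actions, ttl_seconds, now):
    """Issue a grant that expires ttl_seconds after `now`."""
    return TaskGrant(workload_id, task, frozenset(actions),
                     now + timedelta(seconds=ttl_seconds))


def authorize(request, grant, now):
    """Return (allowed, reason) for one action at one moment in time."""
    if now >= grant.expires_at:
        return False, "grant expired"
    if request.workload_id != grant.workload_id:
        return False, "identity mismatch"
    if request.task != grant.task:
        return False, "outside task scope"
    if request.action not in grant.allowed_actions:
        return False, "action not permitted"
    return True, "ok"
```

Returning a reason alongside the decision is a deliberate choice in this sketch: it gives the audit trail something to record when an action is refused, which helps later when ownership of a failure has to be established.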

Define the permitted task in business terms, not just technical permissions.

Bind each browser session to a single workflow, data source, and expiry window.

Use short-lived secrets and revoke them automatically after completion.

Log the intent, the policy decision, and the exact page or tool used.

Require human review for high-impact actions such as submission, deletion, or payment.
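The controls above can be combined in one small decision function. The following is a sketch under stated assumptions: `BrowserSession`, `decide`, `audit_record`, and the `HIGH_IMPACT` set are hypothetical names chosen for illustration, and the session is bound to a single origin as a stand-in for "data source".

```python
import json
from dataclasses import dataclass

# Assumed set of high-impact actions; a real deployment would define its own.
HIGH_IMPACT = {"submit", "delete", "pay"}


@dataclass(frozen=True)
class BrowserSession:
    session_id: str
    workflow: str        # the single workflow this session serves
    allowed_origin: str  # the single data source this session may touch
    expires_at: float    # Unix timestamp after which the session is dead


def decide(session, action, origin, now, human_approved=False):
    """Return 'allow', 'deny', or 'needs_review' for one browser action."""
    if now >= session.expires_at or origin != session.allowed_origin:
        return "deny"
    if action in HIGH_IMPACT and not human_approved:
        return "needs_review"
    return "allow"


def audit_record(session, intent, action, origin, decision, now):
    """Serialize the intent, the policy decision, and the target as one log line."""
    return json.dumps({
        "ts": now,
        "session": session.session_id,
        "workflow": session.workflow,
        "intent": intent,
        "action": action,
        "origin": origin,
        "decision": decision,
    }, sort_keys=True)
```

Logging the stated intent next to the decision is what later lets a reviewer tell a task-design failure from a privilege-design failure.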

This aligns with the direction described in the AI Agents: The New Attack Surface report and the OWASP Agentic AI Top 10, which both emphasize that agent behaviour is dynamic and can exceed the original design envelope. These controls tend to break down when one agent session is allowed to chain multiple tools across mixed-trust systems because the blast radius becomes harder to define and contain.

Common Variations and Edge Cases

Tighter browser controls often increase operational overhead, requiring organisations to balance safety against workflow speed and user experience. That tradeoff is especially visible in customer support, finance operations, and administrative automation, where a single workflow may legitimately cross several systems. Current guidance suggests using separate approval paths for read-only browsing, data entry, and state-changing actions, but there is no universal standard for this yet.
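One way to express separate approval paths is a simple lookup from action to action class to path. Because there is no universal standard, the mapping below is purely an assumption for illustration: the action names, the three path labels, and the `approval_path` helper are all hypothetical.

```python
from enum import Enum


class ActionClass(Enum):
    READ_ONLY = "read_only"
    DATA_ENTRY = "data_entry"
    STATE_CHANGING = "state_changing"


# Assumed mapping from action class to approval path.
APPROVAL_PATH = {
    ActionClass.READ_ONLY: "auto_approve",
    ActionClass.DATA_ENTRY: "policy_check",
    ActionClass.STATE_CHANGING: "human_review",
}

# Assumed classification of browser actions; extend per workflow.
ACTION_CLASSES = {
    "navigate": ActionClass.READ_ONLY,
    "read": ActionClass.READ_ONLY,
    "type": ActionClass.DATA_ENTRY,
    "select": ActionClass.DATA_ENTRY,
    "submit": ActionClass.STATE_CHANGING,
    "delete": ActionClass.STATE_CHANGING,
    "pay": ActionClass.STATE_CHANGING,
}


def approval_path(action):
    """Look up the approval path; unknown actions get the strictest one."""
    action_class = ACTION_CLASSES.get(action, ActionClass.STATE_CHANGING)
    return APPROVAL_PATH[action_class]
```

Defaulting unknown actions to the strictest path trades some workflow speed for safety, which is the same tradeoff described above made explicit in code.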

Edge cases appear when the agent acts correctly from a technical standpoint but incorrectly from a business standpoint. For example, it may select the right form, the right account, and the right login, yet still submit the wrong case because the instruction was ambiguous. The right response is not to blame the model in isolation; it is to inspect the policy, the prompt, the session scope, and the exception handling. NHIMG research on the AI LLM hijack breach, and its report LLMjacking: How Attackers Hijack AI Using Compromised NHIs, show how quickly valid access can become harmful when controls are too broad.

Where the guidance breaks down most often is in environments that reuse one agent identity across many workflows, because attribution becomes ambiguous and root cause analysis turns into guesswork.
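Identity reuse of this kind can be spotted from audit data. The helper below is a minimal sketch, assuming audit events are available as `(identity, workflow)` pairs; the function name `ambiguous_identities` and the event shape are hypothetical.

```python
from collections import defaultdict


def ambiguous_identities(events):
    """Return {identity: sorted workflows} for identities used by more than one workflow.

    `events` is an iterable of (identity, workflow) pairs taken from audit logs.
    """
    seen = defaultdict(set)
    for identity, workflow in events:
        seen[identity].add(workflow)
    return {
        identity: sorted(workflows)
        for identity, workflows in seen.items()
        if len(workflows) > 1
    }
```

Any identity this returns is one where a log entry alone cannot say which workflow caused an action, so those are candidates for splitting into per-workflow identities.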

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A1	Agentic workflows fail when task scope and action boundaries are unclear.
CSA MAESTRO	M1	MAESTRO addresses threat modeling for autonomous agent workflows and approvals.
NIST AI RMF	GOVERN	AI RMF GOVERN covers accountability, oversight, and process ownership for AI systems.

Assign named owners for task intent, access approval, and incident response across the agent lifecycle.
