BrowseSafe prompt injection shows runtime guardrails still fail

By NHI Mgmt Group Editorial TeamPublished 2026-04-16Domain: Agentic AI & NHIsSource: Lasso Security

TL;DR: BrowseSafe could be bypassed in 36% of red-team attempts, with encoding and HTML-based obfuscation defeating a model intended to secure AI browsers against prompt injection, according to Lasso Security. The result shows why continuous testing and runtime enforcement still matter when agent guardrails are expected to sit between hostile input and browser actions.

At a glance

What this is: Lasso Security’s test of Perplexity’s BrowseSafe shows that prompt injection defenses can still be bypassed by simple encoding and HTML obfuscation.

Why it matters: For IAM teams, the lesson is that AI browser guardrails behave like non-human identity controls and need lifecycle testing, runtime enforcement, and auditability rather than trust in a single model layer.

👉 Read Lasso Security's BrowseSafe prompt injection test and bypass findings

Context

Prompt injection is a control-break problem, not just a content-filtering problem. If an AI browser or agent can be steered by hidden instructions embedded in web content, then the security model must account for how untrusted input reaches tool use, action selection, and browser execution.

This matters to identity programmes because AI browsers and agents operate as non-human identities with real access paths. Once those paths can be influenced at runtime, the question shifts from whether the model can classify text to whether the surrounding governance stack can constrain behaviour, validate actions, and prove what happened.

Lasso Security’s BrowseSafe test focuses that issue on the browser-agent boundary. The article is a hands-on evaluation rather than a vendor benchmark review, and its core finding is that “safe-by-model” assumptions remain fragile when encoding and formatting are enough to reshape the attack surface.

Key questions

Q: How should security teams test AI browser agents for prompt injection risk?

A: They should test AI browser agents with the same transformations attackers use in practice, including encoding, HTML wrapping, and mixed-format payloads. Clean prompts are not enough. Security teams also need adversarial red-teaming, policy checks before execution, and logging that shows whether a malicious instruction reached a browser action.

Q: Why do browser-based AI agents create more prompt injection risk than plain chatbots?

A: Browser-based AI agents create more risk because they do more than generate text. They interpret web content, decide what it means, and can turn that interpretation into actions such as clicking or submitting forms. That makes prompt injection an execution problem, not just a language problem, and increases the need for action-level controls.

Q: What breaks when prompt injection guardrails only look for obvious malicious text?

A: Guardrails fail when they depend on obvious wording because attackers can hide instructions inside encoding, HTML, or formatting that changes how the model reads the page. The result is a false safe decision. Teams need to assume the attacker will alter representation, not just content, and design testing around that reality.

Q: How do organisations stop a model’s safe response from becoming unsafe execution?

A: They stop it by separating detection from permission. A model that labels content as safe should not automatically authorize browser actions. Organisations need policy enforcement, action validation, and event auditing so the system can deny execution even when the model output appears normal.

Technical breakdown

Prompt injection in browser agents

Prompt injection happens when attacker-controlled text alters how an AI system interprets instructions, often by hiding malicious intent inside ordinary-looking content. In browser agents, the risk is sharper because the model reads pages, parses HTML, and may then pass derived instructions into tool use or action planning. That means the attack is not only about model output, but about how browsing context is transformed into execution context. If the model cannot reliably separate user intent, page content, and policy constraints, malicious instructions can influence the next action even when the surrounding application appears normal.

Practical implication: validate how untrusted web content is isolated before it reaches tool execution.

Encoding and HTML obfuscation as bypass techniques

Encoding and HTML wrapping are classic obfuscation methods that change how malicious text is represented without changing the attacker’s intent. For a browser-focused model, this is especially relevant because the input is already messy, nested, and semi-structured. A detector that relies too heavily on surface patterns can miss instructions once they are encoded, split across tags, or disguised as harmless page structure. The failure mode is not that the model sees nothing, but that it sees the wrong thing and assigns a false sense of safety. That is why benchmark performance on plain text often overstates real resilience in browser contexts.

Practical implication: test against encoded and HTML-embedded payloads, not just plain-language jailbreak prompts.

Runtime enforcement for AI browser action safety

Runtime enforcement sits between detection and execution. Instead of assuming a model will correctly block every malicious instruction, it checks the actual action that the agent is about to take, such as opening a page, clicking, or filling a form. This is the difference between content classification and behavioural control. In agent environments, that layer matters because the harmful event is often the action itself, not merely the presence of bad text. A browser-agent architecture that lacks action validation, event logging, and policy enforcement will always depend too heavily on model judgement alone.

Practical implication: require action-level policy checks and auditing before browser commands are executed.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach — attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Safe-by-model is a broken assumption for AI browser governance. BrowseSafe’s failure shows that a single detector cannot carry the burden of trust when the input is adversarial and the output can trigger real browser actions. The assumption that a model can reliably identify every malicious instruction before execution collapses as soon as encoding and structure become part of the attack surface. Practitioners should treat model-based filtering as one layer in a control stack, not as the control stack itself.

Prompt injection in browser agents is an identity problem because the agent is the actor. Once a browser agent can click, submit, and navigate on behalf of a user or workflow, it behaves like a non-human identity with delegated authority. That makes OWASP-NHI and zero-trust thinking more relevant than a simple content moderation frame. The governance question is not whether text looks malicious. It is whether an identity with delegated execution rights can be steered into unsafe action by untrusted context.

Encoding-aware testing should be treated as a named concept: obfuscation-bypass exposure. The article shows that hidden instructions can survive simple transformations that were supposed to make them easier to detect. That means benchmark scores built on clean prompts can miss the operational failure mode entirely. The practical implication is that security teams must evaluate agent guardrails against the same input normalization paths that real attackers use, or they will certify a control that has not been tested against the actual threat form.

Browser-agent security now needs continuous governance, not release-time assurance. The article’s shift-left and shift-right recommendations point to the right direction, but the deeper issue is governance timing. If an AI system can be updated, re-prompted, and redirected continuously, then a one-time approval is insufficient. NIST CSF and OWASP-NHI both point toward ongoing identification, protection, detection, and response rather than static trust in a model release.

What fails here is the boundary between detection and authority. BrowseSafe can flag some hostile content, but the control still fails if the surrounding system treats that classification as enough to authorize the next step. In identity terms, this is a policy binding problem: detection does not equal authorisation. Practitioners need to separate what the model thinks from what the agent is allowed to do.

From our research:
70% of organisations grant AI systems more access than they would give a human employee performing the exact same job, according to The 2026 Infrastructure Identity Survey.
Only 44% of organisations have implemented any policies to manage their AI agents, despite 92% agreeing that governing AI agents is critical to enterprise security.
See also OWASP Agentic AI Top 10 for the attack patterns that turn model weakness into operational risk.

What this signals

Obfuscation-bypass exposure is becoming the real test of AI browser governance: if a control only works on clean prompts, it is not yet a production control. With 7% of security leaders saying they do not know how often their AI systems are making autonomous changes to infrastructure, the programme risk is not just missed attacks, but invisible authority drift.

The operational shift is toward runtime policy, not static approval. Teams should expect browser-agent governance to sit alongside NHI controls, because delegated access paths now behave like machine identities with a mutable instruction stream. That makes action auditing, tool-use validation, and event-level traceability the practical centre of gravity.

For readers building control roadmaps, the useful comparison is not between good and bad prompts. It is between systems that merely detect risky content and systems that can still stop an unsafe browser action after detection. That distinction will determine whether your AI governance programme scales beyond the lab.

For practitioners

Red-team AI browser controls before deployment Test guardrails against plain text, encoded payloads, and HTML-embedded instructions before they are allowed into production workflows. Use adversarial evaluation to prove what the model misses, not just what it catches.
Add runtime action validation for browser agents Block or confirm high-risk browser actions such as open, click, form-fill, and submission when they arise from untrusted context or abnormal instruction patterns. Log each decision at the event level so review is possible after the fact.
Separate content detection from authorisation Treat a safe classification as an input to policy, not as permission to continue. Bind agent actions to explicit controls that can deny execution even when the model output looks acceptable.
Expand coverage to input normalisation paths Include the transformations attackers actually use, especially encoding, tag wrapping, and formatting tricks that change how the model reads page content. This closes the gap between benchmark prompts and real browser traffic.

Key takeaways

BrowseSafe’s bypasses show that prompt injection remains a control problem, not just a model-quality problem.
Encoding and HTML obfuscation can defeat controls that look effective in cleaner benchmark conditions.
AI browser governance now requires runtime action checks, policy binding, and adversarial testing before production use.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		Prompt injection and tool misuse are central to this browser-agent test.
OWASP Non-Human Identity Top 10	NHI-01	The agent acts as a non-human identity with delegated access and execution rights.
NIST CSF 2.0	PR.AC-4	Delegated access and authorization boundaries are the core control issue here.

Evaluate browser-agent prompts, tools, and action boundaries against agentic attack patterns before release.

Key terms

Prompt Injection: Prompt injection is a technique that embeds instructions into content so an AI system follows the attacker’s intent instead of the user’s. In browser and agent settings, the danger is that hidden instructions can shape tool use, action selection, or downstream decisions without obvious signs to the operator.
Browser Agent: A browser agent is an AI-driven system that reads web content and can take actions such as clicking, filling forms, or navigating pages. It becomes an identity governance concern because it can act on delegated authority, making its permissions, audit trail, and runtime boundaries security-critical.
Runtime Enforcement: Runtime enforcement is the control layer that evaluates and blocks actions as they are about to execute, rather than relying only on pre-release testing. For AI systems, it separates detection from authority so a model’s judgment does not automatically become permission to act.

Deepen your knowledge

Prompt injection, browser-agent control, and runtime policy design are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building governance for AI systems that can act in the browser, it is worth exploring.

This post draws on content published by Lasso Security: BrowseSafe prompt injection risks in Perplexity's open-source model. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-04-16.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org