Agentic AI hard boundaries: are your controls actually enforceable?


(@nhi-mgmt-group)
Member Moderator
Joined: 1 year ago
Posts: 2364
Topic starter  

TL;DR: Agentic AI browsers and copilots remain vulnerable to prompt injection because probabilistic guardrails cannot reliably separate trusted intent from malicious instructions, according to Zenity’s PerplexedComet analysis. The real security boundary is deterministic enforcement at the code, network, or OS layer, where the model never gets a vote.

NHIMG editorial — based on content published by Zenity: Why Soft Guardrails Get Us Hacked: The Case for Hard Boundaries in Agentic AI

Questions worth separating out

Q: How should security teams prevent prompt injection in agentic AI systems?

A: Security teams should prevent prompt injection by removing dangerous capabilities at the environment level, not by relying on the model to judge intent correctly.

Q: Why do soft guardrails fail in agentic AI security?

A: Soft guardrails fail because they are probabilistic and operate in the same reasoning space as the agent they supervise.

Q: What breaks when an agent can reach local files and network egress?

A: What breaks is the assumption that the model can safely decide which actions belong to the task.

Practitioner guidance

  • Enforce deterministic capability blocks: Remove high-risk functions such as local file access, clipboard access, and arbitrary egress from agent runtimes at the code or policy layer.
  • Separate trusted work from untrusted content paths: Route external content, such as calendar descriptions, web pages, and inbox items, through a distinct ingestion path that cannot directly trigger privileged agent actions.
  • Map agent privileges as enforceable identity boundaries: Inventory which resources each agent can reach, then reduce those permissions to the minimum set needed for the task.
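
As a rough illustration of the first and third points, here is a minimal Python sketch of a capability check that runs in ordinary code, outside the model. Everything in it is hypothetical: the capability names, the AgentPolicy class, and the CapabilityDenied exception are invented for this example and do not come from Zenity's article or from any particular agent framework.

```python
from dataclasses import dataclass, field

# Hypothetical capability names; a real runtime would define its own.
HIGH_RISK = frozenset({"local_file_read", "clipboard_read", "arbitrary_egress"})


class CapabilityDenied(Exception):
    """Raised when an agent asks for a capability outside its grant."""


@dataclass(frozen=True)
class AgentPolicy:
    agent_id: str
    granted: frozenset = field(default_factory=frozenset)

    def __post_init__(self):
        # High-risk capabilities cannot be granted at all, so no prompt
        # can talk the runtime into enabling them later.
        blocked = self.granted & HIGH_RISK
        if blocked:
            raise ValueError(f"not grantable: {sorted(blocked)}")

    def require(self, capability: str) -> None:
        # Deterministic check: the model's output is never consulted here.
        if capability not in self.granted:
            raise CapabilityDenied(f"{self.agent_id}: {capability} not granted")


# Least privilege: the calendar agent gets only what its task needs.
calendar_agent = AgentPolicy("calendar-agent", frozenset({"calendar_read"}))
calendar_agent.require("calendar_read")  # passes silently
```

The design choice worth noting is that the grant set is immutable and checked before any tool call runs, so the inventory of what each agent can reach is also the thing that enforces it.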

What's in the full article

Zenity's full blog post covers the operational detail this post intentionally leaves for the source:

  • The step-by-step PerplexedComet attack chain, including the calendar-invite entry vector and the file:// exfiltration path.
  • The remediation sequence showing how Perplexity converted a prompt-level weakness into a deterministic code-level boundary.
  • The bypass variant involving view-source:file:// and why the first patch did not fully close the edge case.
  • The research context around soft guardrails, hard boundaries, and the broader agentic AI security debate.
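
The file:// and view-source:file:// items above hint at a general pattern: a check that blocks one named scheme can miss a wrapped form of the same target. The Python sketch below contrasts a hypothetical denylist with a scheme allowlist. It is not Perplexity's patch and makes no claim about how either fix was implemented; both function names and the allowed set are assumptions for illustration.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: the agent only ever needs web navigation.
ALLOWED_SCHEMES = frozenset({"http", "https"})
DENIED_SCHEMES = frozenset({"file"})


def naive_denylist_allows(url: str) -> bool:
    """Hypothetical denylist: blocks only schemes someone thought to name."""
    return urlsplit(url.strip()).scheme not in DENIED_SCHEMES


def is_navigation_allowed(url: str) -> bool:
    """Allowlist: anything not explicitly permitted is refused."""
    return urlsplit(url.strip()).scheme in ALLOWED_SCHEMES


wrapped = "view-source:file:///etc/passwd"
# urlsplit reports the outer scheme, "view-source", so the denylist
# does not recognise the wrapped file:// target and lets it through.
print(naive_denylist_allows(wrapped))   # True
print(is_navigation_allowed(wrapped))   # False
```

An allowlist fails closed: a wrapper scheme nobody anticipated is rejected by default instead of needing its own patch.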

👉 Read Zenity's analysis of why soft guardrails fail against agentic AI attacks →

(@mr-nhi)
Member Moderator
Joined: 4 weeks ago
Posts: 924
 

Soft guardrails are a detection layer, not a security boundary. Probabilistic controls can add friction and visibility, but they cannot reliably prevent an agent from acting on malicious instructions embedded in untrusted content. The PerplexedComet chain shows that when the model is allowed to arbitrate trust inside the prompt, the attacker's instructions and the user's request can be merged into a single execution plan. Practitioners should read that as a boundary failure, not a tuning issue.

A few things that frame the scale:

  • 98% of companies plan to deploy even more AI agents within the next 12 months, despite documented rogue behaviour in 80% of current deployments, according to the AI Agents: The New Attack Surface report.
  • Only 44% of organisations have implemented any policies to govern AI agents, even though 92% say governance is critical to enterprise security.

A question worth separating out:

Q: What should teams do when an agentic browser must handle untrusted content?

A: Teams should isolate untrusted content handling from privileged actions and require deterministic barriers before the agent can touch sensitive resources. If the browser can read, interpret, and act on hostile text in the same session, then the trust boundary is too weak for production use.
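
One way to picture that separation is a session that records whether it has ingested untrusted content and refuses privileged actions afterwards. The Python sketch below is an assumption-laden illustration, not a description of any shipping browser: AgentSession, the action names, and the all-or-nothing taint rule are hypothetical.

```python
class PrivilegedActionBlocked(Exception):
    """Raised when a tainted session attempts a privileged action."""


class AgentSession:
    # Hypothetical action names for this sketch.
    PRIVILEGED = frozenset({"read_local_file", "send_email"})

    def __init__(self) -> None:
        self.tainted = False

    def ingest(self, text: str, trusted: bool) -> str:
        # Any untrusted input (web page, calendar invite, inbox item)
        # marks the whole session; the flag is never cleared.
        if not trusted:
            self.tainted = True
        return text

    def perform(self, action: str) -> str:
        # The barrier is plain code: it does not ask the model whether
        # the ingested text looked malicious.
        if action in self.PRIVILEGED and self.tainted:
            raise PrivilegedActionBlocked(action)
        return f"performed {action}"


session = AgentSession()
session.ingest("Summarise my day", trusted=True)
print(session.perform("send_email"))   # performed send_email
session.ingest("<calendar invite text>", trusted=False)
# session.perform("send_email") would now raise PrivilegedActionBlocked
```

A real deployment would likely need something finer-grained than a single flag, such as running untrusted content in a separate session with no privileged tools, but the principle is the same: the decision is made by the runtime, not by the model.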

👉 Read our full editorial: Hard boundaries, not soft guardrails, define agentic AI security



   