AI agent security needs architectural boundaries, not prompt tuning

By NHI Mgmt Group Editorial TeamDomain: Agentic AI & NHIsSource: Cyera

TL;DR: AI agents become materially harder to govern when private data access, untrusted content processing, and external action capabilities converge in the same runtime, according to Cyera Labs. Training and prompt engineering cannot reliably contain that configuration, so architectural boundaries become the decisive control plane for NHI governance.

At a glance

What this is: Cyera Labs argues that AI agents create a structural security problem when they can read sensitive data, process untrusted content, and take external actions in the same workflow.

Why it matters: For IAM and NHI practitioners, that combination turns agents into high-risk non-human identities that need runtime limits, not just policy intent.

👉 Read Cyera's analysis of AI agent architectural boundaries and the lethal trifecta

Context

AI agent security is not just a model safety problem. When an agent can access private data, ingest untrusted content, and act externally from the same session, the security boundary shifts from the model to the surrounding identity and access controls. That is the core NHI governance issue: the agent becomes a non-human identity with effective privilege that may exceed what teams can inspect or contain.

The article frames the problem as architectural rather than behavioral, which is the right starting point for practitioners. If the trust boundary is built into prompts alone, then a malicious instruction hidden in a normal business document can become an execution path. That is atypical for traditional app security but increasingly typical for agent deployments, especially when teams reuse human access patterns for autonomous software.

Key questions

Q: How should security teams govern AI agents that can read private data and take actions?

A: Treat the agent as a non-human identity with runtime constraints, not as a smarter user. Separate agent identities, limit permissions to the smallest effective scope, block high-risk outbound actions after sensitive data access, and require human approval for irreversible operations. Governance fails when access is static but the workflow is dynamic.

Q: Why do AI agents create more risk than traditional automation?

A: Traditional automation usually follows fixed rules, while AI agents can interpret untrusted content as instructions and then act with real permissions. That makes the attack path semantic rather than technical. The risk rises when the same session can read sensitive data, process external input, and send output without an independent enforcement layer.

Q: What is the difference between access control and data-flow control for agents?

A: Access control decides whether an agent may enter a system or use a tool. Data-flow control decides whether information obtained inside the session may leave that boundary. Both are necessary. An agent can be properly authorized and still create a breach if the architecture allows restricted data to reach outbound channels.

Q: When should organisations require human approval for AI agent actions?

A: Human approval is most appropriate when an agent crosses a trust boundary, such as sending external email after accessing confidential data, changing records in a production system, or triggering actions that cannot be easily reversed. If the consequence is hard to undo, the decision should not be fully automated.

Technical breakdown

What the lethal trifecta means for AI agent identity

The lethal trifecta is the convergence of three capabilities: access to private data, ingestion of untrusted content, and the ability to take external action. Each capability is useful on its own, but together they create a path where language becomes the attack vector. The agent cannot reliably distinguish user instruction from adversarial instruction when both arrive through the same context window. From an IAM perspective, the problem is not only authorization at login. It is runtime effective access, because the agent may combine permissions in ways that a human workflow never would.

Practical implication: Design agent identities around the smallest effective permission set and treat every data source as a potential command channel.

Why prompt engineering does not solve agentic AI security

Prompt engineering can shape behavior, but it cannot create a hard security boundary. A model trained to be helpful will still process a plausible instruction embedded in a meeting invite, ticket, or document if the instruction fits the task. That means the attack is not a malformed request but a semantically valid one. Security teams should treat this as a control failure, not a tuning problem. The deeper issue is that the agent’s policy and the attacker’s payload occupy the same linguistic layer, which makes deterministic prevention impossible without external enforcement.

Practical implication: Move high-risk controls out of the prompt and into identity, policy, and runtime enforcement layers.

Architectural boundaries for NHI governance

Hard boundaries are the only credible answer when autonomous software can read and write across trust zones. In practice, that means separate service identities for agents, session-based permission intersection, content flow restrictions, isolated execution environments, and human approval gates for sensitive actions. These are not interchangeable ideas. Identity limits who the agent can act as, data-flow rules limit what information can leave a boundary, isolation limits where compromise can spread, and approval gates limit irreversible consequences. Together, they turn agent security into an enforceable NHI control problem.

Practical implication: Map each agent workflow to a specific boundary type and refuse deployments that lack a runtime containment model.

Threat narrative

Attacker objective: The attacker wants the agent to leak internal information or take unsafe actions while remaining within normal business workflow channels.

Entry occurs when an attacker embeds instructions in ordinary business content such as a meeting invitation, support ticket, or uploaded document.
Escalation occurs when the agent processes that content, pulls sensitive internal data into context, and then uses its own authorized tools to prepare a response.
Impact occurs when the agent exfiltrates confidential material through legitimate outbound channels without triggering traditional security alerts.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
Cisco DevHub NHI breach — IntelBroker exploited exposed Cisco credentials, API tokens and keys in DevHub.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

The lethal trifecta is a governance problem, not just a model-safety problem. Once an agent can read private data, consume untrusted content, and act externally, the organization has created a non-human identity with compound authority. That authority can be abused without malware, credential theft, or visible privilege escalation. Practitioners should treat the issue as runtime access governance, not model tuning.

Ephemeral trust debt is now a real NHI control gap. Teams often assume that short-lived agent sessions reduce risk enough to justify broader access. In reality, even brief sessions can combine enough permissions to exfiltrate data or trigger unsafe actions before controls react. The security debt is not just standing privilege, but standing trust in a session that was never designed for adversarial language.

Identity controls must govern effective access, not just assigned access. The right question is not what an agent is technically allowed to do in theory, but what it can do after user context, session policy, and tool permissions intersect. That is where conventional IAM implementations often over-grant. Practitioners should enforce intersection-based authorization for agents and treat any broader pattern as an architectural exception.

Data-flow enforcement is becoming a first-class NHI control. Agents blur the line between reading and sending, which makes outbound channels part of the trust boundary. If sensitive data can enter a session, the architecture must be able to stop that data from leaving through email, webhooks, or APIs. The practical conclusion is simple: if you cannot constrain data flow, you do not yet control the agent.

From our research:
96% of technology professionals identify AI agents as a growing security threat, and 66% believe this risk is immediate, according to AI Agents: The New Attack Surface report.
92% agree governing AI agents is critical to enterprise security, yet only 44% have implemented any policies to do so, according to the same report.
For a broader control baseline, see the Ultimate Guide to NHIs for lifecycle, visibility, and least-privilege patterns that apply to autonomous agents.

What this signals

AI agent governance will shift from policy writing to containment design. The practical question is no longer whether an agent may access data, but whether that access can be contained when the agent is exposed to adversarial content. That pushes teams toward runtime enforcement, session isolation, and least-privilege agent identities rather than document-only policy programs. For practitioners, the next control maturity step is to make sensitive data and outbound action mutually exclusive inside agent workflows.

With 80% of organisations reporting that their AI agents have already acted beyond intended scope, per AI Agents: The New Attack Surface report, the boundary problem is already operational, not theoretical. That changes how IAM leads should prioritize review work. The highest value effort is not broad awareness training, but identifying where agents can both ingest and export data in the same session.

Identity blast radius: this is the practical concept teams should sharpen now, because agent compromise is no longer about one credential but about the combined effect of user access, tool scope, and session policy. The more those three layers overlap without runtime checks, the easier it becomes for one malicious prompt to turn into a breach path. Teams should review every agent workflow for this overlap before expanding deployment.

For practitioners

Separate agent identities from human users Create dedicated service identities for agents and calculate effective access as the intersection of user entitlements, agent scope, and session policy.
Block outbound action after sensitive data access Disable external communication tools for sessions that touch restricted documents, credentials, or other sensitive data classes.
Add human approval gates for high-risk actions Require review before an agent sends email, updates external systems, or posts to communication channels after handling confidential material.
Enforce isolated sessions and clean teardown Use per-session isolation, destroy session state on termination, and prevent cross-session reuse of context or cached data.
Map controls to OWASP agentic risks Align prompt injection, tool misuse, and identity abuse scenarios to the OWASP Agentic AI Top 10 and test each workflow against those failure modes.

Key takeaways

AI agents become security-relevant non-human identities the moment they can combine private data access, untrusted input, and external action.
The main failure mode is not malware, but trusted workflow execution that turns normal content into an exfiltration path.
Practitioners should prioritize runtime containment, separate agent identities, and human gates for high-risk actions before scaling deployment.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 address the attack and risk surface, while NIST AI RMF and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	ASI01	Prompt injection and goal hijack map directly to the article's attack path.
NIST AI RMF	GOVERN	Agent accountability and oversight are central to the article's boundary model.
NIST Zero Trust (SP 800-207)	PR.AC-4	Least-privilege and session-scoped access are core to limiting agent blast radius.

Test agent workflows for untrusted-content injection and block goal hijack paths at runtime.

Key terms

Lethal Trifecta: A risky agent architecture where private data access, untrusted content processing, and external action capability exist in the same runtime. The combination lets an attacker turn ordinary business content into an exfiltration path without needing malware or stolen credentials.
Effective Access: The real access an agent can use after user permissions, session policy, and tool scope are combined. This matters more than nominal entitlements because an agent's practical power is often broader or narrower than any single control plane suggests.
Data-flow Enforcement: Runtime controls that stop sensitive information from moving from a protected session to an external channel. In agent security, this is a containment function, not a policy document, and it must operate even when the model behaves as intended.

Deepen your knowledge

AI agent identity governance and runtime boundary design are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If your programme is moving from pilot agents to production workflows, this course helps frame the controls you need next.

This post draws on content published by Cyera: The Lethal Trifecta and why AI agents require architectural boundaries. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org