TL;DR: DeepMind’s AI Agent Traps taxonomy shows how perception, reasoning, memory, action, multi-agent, and overseer surfaces can be manipulated so agents act on hostile content, according to Pomerium’s analysis. The governance problem is that once an agent can reach tools, APIs, or data, prompt-level deception becomes an access-control problem, not just a model-safety issue.
At a glance
What this is: This is Pomerium’s analysis of AI Agent Traps and the finding that web-delivered deception only becomes harmful when an agent can act through privileged tools and APIs.
Why it matters: It matters because IAM teams need to govern agent actions at the access layer, not assume model safety alone will contain prompt injection, exfiltration, or delegated misuse.
👉 Read Pomerium's analysis of AI agent traps and identity-aware access
Context
AI agent traps are adversarial web, content, and workflow manipulations that aim to mislead an agent into taking unsafe actions. The primary issue is not model compromise alone, but the point where an agent translates deception into real access against enterprise systems.
For identity teams, this shifts the control problem to the action layer. If an AI agent can read pages, call tools, and move data on behalf of a user, then policy needs to follow the request path, the destination, and the scoped identity behind the action.
Pomerium frames the issue as identity-aware access for AI agents, with the web becoming an attack surface for tool-using systems. That starting position is increasingly typical for enterprises experimenting with agentic workflows, especially where agents touch internal APIs, MCP servers, or sensitive data.
Key questions
Q: How should security teams govern AI agents that can read web content and call tools?
A: Security teams should treat AI agents as first-class identities and authorize the action, not just the prompt. That means per-request policy checks on destination, tool name, route, and context, plus short-lived credentials that cannot be reused outside the approved task. If the agent is manipulated, the access layer must still stop the harmful action.
Q: Why do AI agent traps create more risk than ordinary prompt injection?
A: They create more risk because the content is only the starting point. Damage happens when a deceived agent can convert hostile input into a tool call, file access, or data exfiltration through legitimate privileges. Prompt injection becomes an access-control problem the moment the agent owns an execution path into enterprise systems.
Q: What do security teams get wrong about protecting AI agents from the web?
A: They often overfocus on blocking malicious text and underfocus on what the agent is allowed to do after it reads it. Content scanning can reduce exposure, but it does not stop a privileged request from executing. Governance has to separate perception controls from action controls so a bad instruction cannot become a permitted operation.
Q: What is the difference between model safety and identity-aware access for AI agents?
A: Model safety tries to keep the system from following harmful instructions, while identity-aware access constrains what happens if instructions are followed anyway. The first is about influence. The second is about enforcement. For enterprise use, the access layer is the last reliable boundary when the model is already compromised by deceptive content.
Technical breakdown
Content injection traps and the perception layer
Content injection traps exploit the gap between what humans see and what agents ingest. Hidden HTML, off-viewport text, encoded payloads, and cloaked page variants can all enter an agent’s context even when they never appear obvious to a person. The technical point is that the browser or retrieval layer is not neutral once the output is machine-readable. The agent can be fed instructions that alter summaries, prompt-following, or downstream tool use without any visible compromise of the underlying site.
Practical implication: do not treat visible page review as a control for agent consumption; constrain what the agent is allowed to act on after ingestion.
Behavioural control traps and the action layer
Behavioural control traps are the point where deception becomes operational harm. The paper’s examples include embedded jailbreaks, data exfiltration prompts, and sub-agent spawning that redirect an orchestrator into unsafe execution paths. This is the stage where identity and privilege matter most, because the agent already has access to a tool, API, or endpoint. If the action channel is broadly trusted, the attacker does not need to defeat the model again; they only need the model to request a permitted but dangerous action.
Practical implication: require per-request authorization on agent tool calls so a compromised instruction cannot become an unrestricted action.
Why per-request policy matters for MCP and tool use
Model Context Protocol extends the problem because it creates a standardized path from agent intent to enterprise tools. That is useful for integration, but it also means the trust boundary has to sit between the agent and the target service, not inside the model. A policy engine that evaluates identity, route, and tool name can limit what the agent may call even after it has been manipulated. In practice, this turns the proxy or gateway into the enforcement point for AI agent access instead of trusting the model’s judgment.
Practical implication: bind MCP and API access to scoped, per-request policy decisions rather than persistent bearer credentials.
Threat narrative
Attacker objective: The attacker aims to convert a trusted agent’s legitimate access into unauthorized data exposure, unsafe execution, or delegated misuse of enterprise tools.
- Entry occurs when an attacker places adversarial content on a web page, document, email, or multimodal asset that an AI agent will ingest during a normal workflow.
- Credential access or abuse follows when the manipulated agent uses its legitimate privileges to call a tool, read data, or reach a downstream service it was authorized to use.
- Impact occurs when the agent exfiltrates data, performs an unsafe action, or propagates the malicious instruction through multi-agent workflows without human review.
Breaches seen in the wild
- Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
- AI LLM hijack breach — attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.
Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.
NHI Mgmt Group analysis
AI agent traps expose an identity failure, not just a model-safety failure: the decisive boundary is whether the agent can turn manipulated input into a privileged action. Once the agent can call tools, reach APIs, or move data, the question is no longer only whether the model was fooled. The question becomes whether the access layer still enforces identity, route, and scope at execution time. Practitioners should treat agent action control as the primary control plane.
Action-layer authorization is the named concept this category demands: the attack works when policy is checked too late, too broadly, or only at session start. The trap taxonomy is useful because it separates perception, reasoning, memory, action, multi-agent, and human-overseer failures. That separation matters for governance because only the action layer can reliably stop a deceived agent from completing a harmful request. Practitioners should re-centre control design around per-request enforcement.
Identity-aware access is the correct blast-radius model for agentic workflows: web content can still deceive, but deception should not automatically equal privilege. The governance assumption that a trusted session stays trusted across all downstream requests was built for simpler client-server patterns. It fails when an autonomous or semi-autonomous client can generate new requests from hostile input. Practitioners must rethink how trust is inherited across tool calls, not just how prompts are filtered.
Multi-agent systems multiply the accountability gap: once an orchestrator can spawn sub-agents or delegate work across contexts, a single hostile instruction can propagate through several identities. That creates a governance problem that looks like least privilege but behaves like recursive delegation. The field needs stronger identity boundaries between orchestrator, sub-agent, and service endpoint. Practitioners should assume the blast radius expands with each delegated hop unless policy explicitly contains it.
Named concept: action-layer containment: the practical lesson is to contain harm where the agent acts, not where the content is read. This does not eliminate adversarial content, but it prevents hostile content from becoming enterprise impact by default. That is the operational threshold identity teams need to own. Practitioners should evaluate every agent workflow against that containment boundary.
From our research:
- 92% of organisations expose NHIs to third parties, raising concerns about supply chain security, according to the Ultimate Guide to NHIs.
- Only 5.7% of organisations have full visibility into their service accounts, which means many teams cannot reliably trace which non-human identity can reach which downstream tool or system.
- That visibility gap makes the 52 NHI Breaches Analysis a useful next step for teams examining how access paths become breach paths.
What this signals
Action-layer containment: the next stage of agent governance is not better prompt hygiene, but narrower execution authority. As AI systems begin reading the web and driving workflows, security teams need to assume that some inputs will be deceptive and design controls that still fail safely at the request boundary.
With 92% of organisations exposing NHIs to third parties, per the Ultimate Guide to NHIs, the external surface area around agents is already large enough that delegation and supplier risk cannot be treated separately. The practical response is to map where agents inherit trust from upstream identities and where they must be re-verified before acting.
The programme signal is clear: agent security now sits between content governance, identity governance, and privileged access governance. Teams that keep those disciplines separate will miss the handoff where hostile content turns into authorised behaviour. Security architecture should make that handoff visible, reviewable, and denyable.
For practitioners
- Enforce per-request authorization for agent tools Require every AI agent request to be evaluated against identity, route, destination, and tool scope before the action executes. Do not rely on session-level approval for workflows that can read pages and then trigger internal actions.
- Separate read access from act access Allow an agent to observe public or retrieved content without granting it unrestricted write paths, data export paths, or internal API reach. The agent should not inherit broad privileges just because it can interpret input.
- Scope MCP and API credentials to the smallest useful route Issue short-lived, route-bound credentials for agent workflows and avoid long-lived bearer tokens that survive beyond the immediate task. The control goal is to prevent a manipulated agent from reusing a credential outside its intended destination.
- Log denied and allowed agent actions with full context Capture user, agent identity, route, tool name, destination, and policy decision so security teams can investigate prompt-driven misuse after the fact. Treat auditability as part of the containment design, not as an afterthought.
Key takeaways
- AI agent traps show that hostile web content becomes dangerous only when an agent can turn it into a permitted action.
- The main control gap is not model comprehension, but whether identity-aware access still governs tool calls, destinations, and data paths at execution time.
- Teams need per-request authorization, scoped credentials, and action-layer logging if they want agent workflows to fail safely.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | Agent traps and tool misuse map directly to agentic AI threat patterns. | |
| OWASP Non-Human Identity Top 10 | NHI-01 | AI agents act as non-human identities that need scoped access and accountability. |
| NIST Zero Trust (SP 800-207) | PR.AC-4 | Per-request authorization and continuous verification align with zero trust access. |
Treat each agent as a distinct NHI and bind it to least privilege, scope, and traceability.
Key terms
- AI Agent Traps: AI Agent Traps are adversarial content patterns designed to mislead an AI agent into unsafe behaviour. They exploit the difference between what a human perceives and what a machine reads, then rely on the agent’s privileges to turn deception into action. For governance, they are an access problem as much as a model problem.
- Action Layer: The action layer is the part of an agent workflow where a decision becomes an external request, tool call, or data movement. It is the point at which identity, authorization, and audit controls can still stop harm. For autonomous and semi-autonomous systems, this is the most important enforcement boundary.
- Identity-Aware Access: Identity-aware access is an authorization model that evaluates who or what is making a request, what it is trying to reach, and under what context. It replaces broad, persistent trust with request-level decisions. In agentic environments, it is the control that can contain a deceived agent before it reaches enterprise systems.
- Per-Request Authorization: Per-request authorization means every action is checked individually instead of trusting a session once and assuming all later activity is safe. This matters for agents because the request that causes damage may be generated after the initial interaction has already been approved. It is a stronger fit for dynamic, tool-using identities.
Deepen your knowledge
AI agent traps and identity-aware access are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are designing controls for agentic workflows that touch tools, APIs, or internal data, it is worth exploring.
This post draws on content published by Pomerium: When the web becomes the attacker, AI agent traps and the case for identity-aware access. Read the original.
Published by the NHIMG editorial team on 2026-04-22.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org