TL;DR: DeepMind’s AI Agent Traps taxonomy shows how perception, reasoning, memory, action, multi-agent, and overseer surfaces can be manipulated so agents act on hostile content, according to Pomerium’s analysis. The governance problem is that once an agent can reach tools, APIs, or data, prompt-level deception becomes an access-control problem, not just a model-safety issue.
NHIMG editorial — based on content published by Pomerium: When the web becomes the attacker, AI agent traps and the case for identity-aware access
Questions worth separating out
Q: How should security teams govern AI agents that can read web content and call tools?
A: Security teams should treat AI agents as first-class identities and authorize the action, not just the prompt.
Q: Why do AI agent traps create more risk than ordinary prompt injection?
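A: They create more risk because the hostile content is only the starting point: once the agent can reach tools, APIs, or data, the deception is carried out with whatever access the agent holds.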
Q: What do security teams get wrong about protecting AI agents from the web?
A: They often concentrate on blocking malicious text and pay too little attention to what the agent is allowed to do after reading it.
Practitioner guidance
- Enforce per-request authorization for agent tools: Require every AI agent request to be evaluated against identity, route, destination, and tool scope before the action executes.
- Separate read access from act access: Allow an agent to observe public or retrieved content without granting it unrestricted write paths, data export paths, or internal API reach.
- Scope MCP and API credentials to the smallest useful route: Issue short-lived, route-bound credentials for agent workflows and avoid long-lived bearer tokens that survive beyond the immediate task.
What's in the full article
Pomerium's full blog post covers the operational detail this post intentionally leaves for the source:
- A category-by-category walkthrough of the six AI Agent Traps families and how each one maps to a different failure mode.
- Concrete policy examples for MCP routes, including how request conditions can bind identity to tool access.
- Step-by-step analysis of the M365 Copilot exfiltration pattern and how a proxy-based control layer contains it.
- A deeper explanation of how Pomerium issues scoped JWTs and logs every allow-or-deny decision for incident response.
👉 Read Pomerium's analysis of AI agent traps and identity-aware access →