TL;DR: According to 1Password’s SCAM benchmark, six of eight AI models still made critical security failures even after a short security skill sharply improved their phishing detection, and the worst baseline model averaged 20 critical failures per run. Recognising a threat is not the same as avoiding it, which makes agent trust a governance problem, not just a model-quality issue.
NHIMG editorial — based on content published by 1Password: SCAM benchmark results on AI agent phishing and trust
By the numbers:
- 61% of Americans have fallen victim to phishing attacks.
- When AWS credentials are exposed publicly, attackers attempt access within an average of 17 minutes, and as quickly as 9 minutes in some cases.
Questions worth separating out
Q: How should security teams govern AI agents that can access email and passwords?
A: Treat the agent as a credentialed non-human identity, not just a smarter assistant.
Q: Why do AI agents create risk even when they detect phishing correctly?
A: Because detection does not stop execution unless the surrounding workflow forces a stop.
Q: What do security teams get wrong about AI agent trust?
A: They often assume that a model that can explain a threat will also avoid it.
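To make the second answer concrete, here is a minimal Python sketch of a workflow that forces a stop when the agent's own analysis has flagged a risk. It is an illustration only, not taken from 1Password's benchmark or any real agent framework: `AgentAction`, `run_with_gate`, and the `approve` callback are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


class ActionBlocked(Exception):
    """Raised when a flagged action is attempted without human approval."""


@dataclass
class AgentAction:
    # Hypothetical action record; field names are illustrative only.
    kind: str           # e.g. "submit_form", "use_vault", "forward_message"
    target: str         # e.g. a URL or a recipient address
    risk_flagged: bool  # True if the agent's own analysis detected a threat


def run_with_gate(
    action: AgentAction,
    execute: Callable[[AgentAction], str],
    approve: Callable[[AgentAction], bool],
) -> str:
    """Run `execute` only if the action is unflagged or a human approves it."""
    if action.risk_flagged and not approve(action):
        # Detection alone changes nothing; this branch is what stops execution.
        raise ActionBlocked(f"{action.kind} to {action.target} blocked pending approval")
    return execute(action)
```

The design point is that the stop lives in the surrounding workflow, outside the model: even if the model explains the threat and then tries to proceed, `execute` is never called unless `approve` returns `True`.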
Practitioner guidance
- Separate threat recognition from credential execution. Require an explicit stop and human approval before any agent can use a vault, submit a form, or forward a message after it detects risk.
- Constrain agent access to task-scoped credentials. Issue credentials that only work for the minimum action set needed for the task, and revoke them once the interaction ends.
- Log every secret touchpoint in the agent path. Record when an agent reads, copies, forwards, or pastes a secret, along with the destination and triggering prompt.
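The scoping and logging points above can be sketched together in a few lines of Python. This is a hedged illustration under stated assumptions, not a real vault or 1Password API: `ScopedCredential`, `SecretAuditLog`, and `use_secret` are hypothetical names, and the in-memory list stands in for whatever audit sink a team actually uses.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ScopedCredential:
    """Hypothetical task-scoped credential (not a real vault API)."""
    task_id: str
    allowed_actions: frozenset  # minimum action set needed for the task
    expires_at: float           # epoch seconds after which it stops working
    revoked: bool = False

    def permits(self, action: str, now: float) -> bool:
        # Valid only if not revoked, not expired, and the action is in scope.
        return not self.revoked and now < self.expires_at and action in self.allowed_actions


@dataclass
class SecretAuditLog:
    """In-memory stand-in for an append-only audit sink."""
    entries: list = field(default_factory=list)


def use_secret(cred, log, action, destination, prompt, now=None) -> bool:
    """Check scope, then log the touchpoint whether or not it was allowed."""
    now = time.time() if now is None else now
    allowed = cred.permits(action, now)
    log.entries.append({
        "task_id": cred.task_id,
        "action": action,            # e.g. read, copy, forward, paste
        "destination": destination,  # where the secret would go
        "prompt": prompt,            # the prompt that triggered the touch
        "allowed": allowed,
        "time": now,
    })
    return allowed
```

Denied attempts are logged as well as allowed ones, since an out-of-scope request to forward a secret is often the most useful signal in the trail. Setting `cred.revoked = True` when the interaction ends makes every later call return `False`.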
What's in the full article
1Password's full post covers the operational detail this post intentionally leaves for the source:
- The full SCAM benchmark scenario design, including how the simulated inbox, vault, and browser tools were chained together.
- Per-model run results across baseline and security-skill conditions, including the full score spread and critical-failure counts.
- The exact security skill guidance used to change agent behaviour, including the rules that reduced unsafe credential actions.
- Scenario-by-scenario examples showing how the models handled lookalike domains, embedded secrets, and fake storefronts.
👉 Read 1Password's SCAM benchmark results on AI agent phishing and trust →