
AI agent phishing failures: are your controls keeping up?


(@nhi-mgmt-group)

TL;DR: According to 1Password’s SCAM benchmark, six of eight AI models still made critical security failures even when a short security skill sharply improved their phishing detection, and the worst baseline model averaged 20 critical failures per run. Recognising a threat is not the same as avoiding it, which makes agent trust a governance problem, not just a model-quality issue.

NHIMG editorial — based on content published by 1Password: SCAM benchmark results on AI agent phishing and trust

Questions worth separating out

Q: How should security teams govern AI agents that can access email and passwords?

A: Treat the agent as a credentialed non-human identity, not just a smarter assistant.

Q: Why do AI agents create risk even when they detect phishing correctly?

A: Because detection does not stop execution unless the surrounding workflow forces a stop.

Q: What do security teams get wrong about AI agent trust?

A: They often assume that a model that can explain a threat will also avoid it.

Practitioner guidance

  • Separate threat recognition from credential execution: Require an explicit stop and human approval before any agent can use a vault, submit a form, or forward a message after it detects risk.
  • Constrain agent access to task-scoped credentials: Issue credentials that only work for the minimum action set needed for the task, and revoke them once the interaction ends.
  • Log every secret touchpoint in the agent path: Record when an agent reads, copies, forwards, or pastes a secret, along with the destination and triggering prompt.
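
As a rough illustration of how the three points above might fit together, here is a minimal Python sketch. It is not taken from 1Password's post or from any real product: the names `ScopedCredential`, `AgentGate`, `ApprovalRequired`, the action strings, and the `risk_detected` flag are all hypothetical, and a real deployment would back them with an actual vault, approval workflow, and tamper-resistant log store.

```python
import time
from dataclasses import dataclass, field

# Hypothetical action names standing in for "use a vault, submit a form,
# or forward a message" from the guidance above.
GATED_ACTIONS = frozenset({"vault_read", "form_submit", "message_forward"})


class ApprovalRequired(Exception):
    """Raised when a flagged action must wait for a human decision."""


@dataclass
class ScopedCredential:
    """Illustrative credential limited to a fixed action set and lifetime."""
    allowed_actions: frozenset
    expires_at: float
    revoked: bool = False

    def permits(self, action: str) -> bool:
        return (not self.revoked
                and time.time() < self.expires_at
                and action in self.allowed_actions)


@dataclass
class AgentGate:
    """Sits between the agent and its tools; every attempt passes through here."""
    credential: ScopedCredential
    audit_log: list = field(default_factory=list)

    def execute(self, action, destination, prompt, risk_detected, approved_by=None):
        if not self.credential.permits(action):
            outcome = "denied_out_of_scope"
        elif risk_detected and action in GATED_ACTIONS and approved_by is None:
            # Detection alone does not stop anything; the gate forces the stop.
            outcome = "blocked_pending_approval"
        else:
            outcome = "allowed"

        # Record every touchpoint, including refused ones, with the
        # destination and the prompt that triggered it.
        self.audit_log.append({
            "time": time.time(),
            "action": action,
            "destination": destination,
            "prompt": prompt,
            "risk_detected": risk_detected,
            "approved_by": approved_by,
            "outcome": outcome,
        })

        if outcome == "denied_out_of_scope":
            raise PermissionError(f"credential does not permit {action!r}")
        if outcome == "blocked_pending_approval":
            raise ApprovalRequired(f"{action!r} needs human approval")
        return outcome

    def end_task(self):
        # Revoke the credential as soon as the interaction ends.
        self.credential.revoked = True
```

In this sketch an agent holding a credential scoped to `vault_read` can read from the vault in a clean session, is stopped with `ApprovalRequired` once `risk_detected` is true, and loses all access after `end_task()`. Refused attempts are logged as well as allowed ones, on the assumption that a blocked action is at least as interesting to an investigator as a completed one.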

What's in the full article

1Password's full post covers the operational detail this post intentionally leaves for the source:

  • The full SCAM benchmark scenario design, including how the simulated inbox, vault, and browser tools were chained together.
  • Per-model run results across baseline and security-skill conditions, including the full score spread and critical-failure counts.
  • The exact security skill guidance used to change agent behaviour, including the rules that reduced unsafe credential actions.
  • Scenario-by-scenario examples showing how the models handled lookalike domains, embedded secrets, and fake storefronts.

👉 Read 1Password's SCAM benchmark results on AI agent phishing and trust →
