AI agent phishing failures: are your controls keeping up?


(@nhi-mgmt-group)
Member Moderator
Joined: 1 year ago
Posts: 9079
Topic starter  

TL;DR: Even after a short security skill sharply improved their phishing detection, six of eight AI models still made critical security failures, and the worst baseline model averaged 20 critical failures per run, according to 1Password’s SCAM benchmark. Recognising a threat is not the same as avoiding it, which makes agent trust a governance problem and not just a model-quality issue.

NHIMG editorial — based on content published by 1Password: SCAM benchmark results on AI agent phishing and trust

Questions worth separating out

Q: How should security teams govern AI agents that can access email and passwords?

A: Treat the agent as a credentialed non-human identity, not just a smarter assistant.

Q: Why do AI agents create risk even when they detect phishing correctly?

A: Because detection does not stop execution unless the surrounding workflow forces a stop.

Q: What do security teams get wrong about AI agent trust?

A: They often assume that a model that can explain a threat will also avoid it.

Practitioner guidance

  • Separate threat recognition from credential execution. Require an explicit stop and human approval before any agent can use a vault, submit a form, or forward a message after it detects risk.
  • Constrain agent access to task-scoped credentials. Issue credentials that only work for the minimum action set needed for the task, and revoke them once the interaction ends.
  • Log every secret touchpoint in the agent path. Record when an agent reads, copies, forwards, or pastes a secret, along with the destination and triggering prompt.
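
The first and third points above can be sketched as a small wrapper that sits between the agent and its tools. This is a minimal illustration, not anything taken from 1Password's benchmark: the names `ActionGate`, `AuditEvent`, `SENSITIVE_ACTIONS`, and the `approver` hook are all hypothetical, and a real deployment would route approval to a person and write the log to durable storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional

# Hypothetical action names, mirroring the guidance above.
SENSITIVE_ACTIONS = {"use_vault", "submit_form", "forward_message"}


@dataclass
class AuditEvent:
    timestamp: str
    action: str
    destination: str
    triggering_prompt: str
    outcome: str  # "executed" or "blocked"


@dataclass
class ActionGate:
    # approver stands in for a human-approval step (assumption).
    approver: Callable[[str, str], bool]
    log: List[AuditEvent] = field(default_factory=list)

    def execute(
        self,
        action: str,
        destination: str,
        triggering_prompt: str,
        risk_detected: bool,
        run: Callable[[], object],
    ) -> Optional[object]:
        # Once risk is detected, a sensitive action needs explicit approval.
        needs_approval = risk_detected and action in SENSITIVE_ACTIONS
        if needs_approval and not self.approver(action, destination):
            self._record(action, destination, triggering_prompt, "blocked")
            return None
        result = run()
        self._record(action, destination, triggering_prompt, "executed")
        return result

    def _record(self, action: str, destination: str,
                triggering_prompt: str, outcome: str) -> None:
        # Every touchpoint is logged with destination and triggering prompt.
        self.log.append(AuditEvent(
            timestamp=datetime.now(timezone.utc).isoformat(),
            action=action,
            destination=destination,
            triggering_prompt=triggering_prompt,
            outcome=outcome,
        ))


# Usage: an approver that always declines, so the risky submit is stopped.
gate = ActionGate(approver=lambda action, destination: False)
result = gate.execute(
    action="submit_form",
    destination="https://login.example.test",
    triggering_prompt="Reset your password now",
    risk_detected=True,
    run=lambda: "submitted",
)
```

The point of the design is that the stop lives in the workflow, so it holds even when the agent itself would have carried on after flagging the message.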

What's in the full article

1Password's full post covers the operational detail this post intentionally leaves for the source:

  • The full SCAM benchmark scenario design, including how the simulated inbox, vault, and browser tools were chained together.
  • Per-model run results across baseline and security-skill conditions, including the full score spread and critical-failure counts.
  • The exact security skill guidance used to change agent behaviour, including the rules that reduced unsafe credential actions.
  • Scenario-by-scenario examples showing how the models handled lookalike domains, embedded secrets, and fake storefronts.

👉 Read 1Password's SCAM benchmark results on AI agent phishing and trust →




   
(@mr-nhi)
Member Moderator
Joined: 2 months ago
Posts: 8508
 

Agent trust fails at the point of action, not the point of detection. The article shows that an AI system can correctly identify phishing and still complete the harmful transaction. That means existing controls that assume recognition leads to safe refusal are structurally weak when the actor can both judge and act in the same session. The practitioner conclusion is that identity governance must control the handoff from detection to execution, not just improve model awareness.

A few things that frame the scale:

  • The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
  • Organisations maintain an average of 6 distinct secrets manager instances, creating fragmentation that undermines centralised control.

A question worth separating out:

Q: Should organisations use security skill prompts instead of access controls for AI agents?

A: No. Security skills can improve behaviour, but they are not a substitute for identity governance. Access rights, credential scope, and approval gates define what an agent is allowed to do, while prompts only influence how often it chooses safely. Use both, but never treat prompting as control.
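
One way to picture "use both, but never treat prompting as control" is the hedged sketch below. `ScopedCredential` and `authorize` are hypothetical names, not a real product API; the assumption is that the credential's scope, expiry, and revocation state are checked outside the model, and the model's own judgement can only narrow the decision, never widen it.

```python
import time
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class ScopedCredential:
    allowed_actions: FrozenSet[str]  # minimum action set for the task
    expires_at: float                # Unix timestamp (seconds)
    revoked: bool = False

    def permits(self, action: str, now: float) -> bool:
        return (
            not self.revoked
            and now < self.expires_at
            and action in self.allowed_actions
        )


def authorize(credential: ScopedCredential, action: str,
              model_says_safe: bool, now: Optional[float] = None) -> bool:
    """Allow an action only if the credential permits it AND the model agrees.

    The model's opinion can veto an action, but it can never grant one
    that the credential does not cover.
    """
    current = time.time() if now is None else now
    return credential.permits(action, current) and model_says_safe


# Usage: a credential scoped to reading the inbox for a single task.
cred = ScopedCredential(allowed_actions=frozenset({"read_inbox"}),
                        expires_at=100.0)
in_scope = authorize(cred, "read_inbox", model_says_safe=True, now=50.0)
out_of_scope = authorize(cred, "use_vault", model_says_safe=True, now=50.0)
```

Here `out_of_scope` is denied even though the model reports the action as safe, which is the distinction the answer above draws between influence and control.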

👉 Read our full editorial: AI agent trust breaks when models can spot phishing but still click



   