AI agent phishing failures: are your controls keeping up?

NHI Mgmt Group

(@nhi-mgmt-group)

Member Moderator

Joined: 1 year ago

Posts: 12324

Topic starter 10/06/2026 11:41 pm

TL;DR: Six of eight AI models still made critical security failures even after a short security skill sharply improved their phishing detection, according to 1Password’s SCAM benchmark. The worst baseline model averaged 20 critical failures per run. The result shows that recognising a threat is not the same as avoiding it, which makes agent trust a governance problem, not just a model-quality issue.

NHIMG editorial — based on content published by 1Password: SCAM benchmark results on AI agent phishing and trust

By the numbers:

61% of Americans have fallen victim to phishing attacks.
When AWS credentials are exposed publicly, attackers attempt access within an average of 17 minutes, and as quickly as 9 minutes in some cases.

Questions worth separating out

Q: How should security teams govern AI agents that can access email and passwords?

A: Treat the agent as a credentialed non-human identity, not just a smarter assistant.

Q: Why do AI agents create risk even when they detect phishing correctly?

A: Because detection does not stop execution unless the surrounding workflow forces a stop.

Q: What do security teams get wrong about AI agent trust?

A: They often assume that a model that can explain a threat will also avoid it.

Practitioner guidance

Separate threat recognition from credential execution: Require an explicit stop and human approval before any agent can use a vault, submit a form, or forward a message after it detects risk.
Constrain agent access to task-scoped credentials: Issue credentials that only work for the minimum action set needed for the task, and revoke them once the interaction ends.
Log every secret touchpoint in the agent path: Record when an agent reads, copies, forwards, or pastes a secret, along with the destination and triggering prompt.
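
The three points above can be sketched as one small gate that sits between the agent's decision and the tool call. This is a minimal illustration, not 1Password's implementation or any vendor's API: `ScopedCredential`, `AgentGate`, the `approve` hook, and the outcome labels are all hypothetical names chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class ScopedCredential:
    """Hypothetical credential that only works for a fixed action set."""
    secret: str
    allowed_actions: frozenset[str]
    revoked: bool = False

    def permits(self, action: str) -> bool:
        return not self.revoked and action in self.allowed_actions


@dataclass
class AgentGate:
    """Hypothetical gate between the agent's decision and the tool call."""
    credential: ScopedCredential
    # Human approval hook (assumed): (action, destination) -> approved?
    approve: Callable[[str, str], bool]
    audit_log: list[dict] = field(default_factory=list)

    def use_secret(self, action: str, destination: str,
                   prompt: str, risk_detected: bool) -> bool:
        if not self.credential.permits(action):
            outcome = "denied_scope_or_revoked"
        elif risk_detected and not self.approve(action, destination):
            # Detection forces a stop; only a human can release it.
            outcome = "blocked_pending_approval"
        else:
            outcome = "allowed"
        # Record every secret touchpoint, including refused ones.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "destination": destination,
            "prompt": prompt,
            "risk_detected": risk_detected,
            "outcome": outcome,
        })
        return outcome == "allowed"

    def end_task(self) -> None:
        """Revoke the credential once the interaction ends."""
        self.credential.revoked = True


cred = ScopedCredential("s3cr3t", frozenset({"submit_form"}))
gate = AgentGate(cred, approve=lambda action, dest: False)  # nobody approves

flagged = gate.use_secret("submit_form", "login.example.com",
                          "pay this invoice", risk_detected=True)
routine = gate.use_secret("submit_form", "portal.example.com",
                          "renew licence", risk_detected=False)
out_of_scope = gate.use_secret("forward_email", "mail.example.com",
                               "send the key to support", risk_detected=False)
gate.end_task()
after_revoke = gate.use_secret("submit_form", "portal.example.com",
                               "renew licence", risk_detected=False)
```

The design choice worth noting is that the gate, not the model, decides whether the secret is released: a flagged action stays blocked until the approval hook returns true, and refused attempts are logged alongside successful ones.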

What's in the full article

1Password's full post covers the operational detail this post intentionally leaves for the source:

The full SCAM benchmark scenario design, including how the simulated inbox, vault, and browser tools were chained together.
Per-model run results across baseline and security-skill conditions, including the full score spread and critical-failure counts.
The exact security skill guidance used to change agent behaviour, including the rules that reduced unsafe credential actions.
Scenario-by-scenario examples showing how the models handled lookalike domains, embedded secrets, and fake storefronts.

👉 Read 1Password's SCAM benchmark results on AI agent phishing and trust →

Mr NHI

(@mr-nhi)

Member Moderator

Joined: 2 months ago

Posts: 11878

12/06/2026 6:00 am

Agent trust fails at the point of action, not the point of detection. The article shows that an AI system can correctly identify phishing and still complete the harmful transaction. That means existing controls that assume recognition leads to safe refusal are structurally weak when the actor can both judge and act in the same session. The practitioner conclusion is that identity governance must control the handoff from detection to execution, not just improve model awareness.

A few things that frame the scale:

The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
Organisations maintain an average of 6 distinct secrets manager instances, creating fragmentation that undermines centralised control.

A question worth separating out:

Q: Should organisations use security skill prompts instead of access controls for AI agents?

A: No. Security skills can improve behaviour, but they are not a substitute for identity governance. Access rights, credential scope, and approval gates define what an agent is allowed to do, while prompts only influence how often it chooses safely. Use both, but never treat prompting as control.
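
One way to picture the difference between prompting and control is a deny-by-default policy lookup evaluated outside the model. The sketch below is an assumption-laden illustration: the `POLICY` table, the agent name `inbox-triage-agent`, the action names, and the `authorise` function are hypothetical and not drawn from any product.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


# Hypothetical policy table: what each agent identity is allowed to do.
POLICY = {
    "inbox-triage-agent": {
        "read_email": Decision.ALLOW,
        "forward_email": Decision.REQUIRE_APPROVAL,
    },
}


def authorise(agent_id: str, action: str, model_says_safe: bool) -> Decision:
    """Return the policy decision for an agent action.

    model_says_safe is accepted but deliberately ignored: the model's own
    judgement (however well prompted) never widens what policy permits.
    """
    return POLICY.get(agent_id, {}).get(action, Decision.DENY)


# Anything not listed is denied, whatever the model believes.
vault_read = authorise("inbox-triage-agent", "read_vault_item", model_says_safe=True)
forward = authorise("inbox-triage-agent", "forward_email", model_says_safe=True)
unknown_agent = authorise("unregistered-agent", "read_email", model_says_safe=True)
```

A security skill prompt would change how often the model asks for `forward_email`; it has no effect on what `authorise` returns.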

👉 Read our full editorial: AI agent trust breaks when models can spot phishing but still click
