AI agent trust breaks when models can spot phishing but still click

By NHI Mgmt Group Editorial Team
Published 2026-02-12
Domain: Agentic AI & NHIs
Source: 1Password

TL;DR: According to 1Password’s SCAM benchmark, six of eight AI models still made critical security failures even after a short security skill sharply improved their phishing detection, and the worst baseline model averaged 20 critical failures per run. The result shows that recognising a threat is not the same as avoiding it, which makes agent trust a governance problem, not just a model-quality issue.

At a glance

What this is: 1Password’s SCAM benchmark shows AI agents can identify phishing threats yet still carry out dangerous actions, including opening lookalike links and exposing credentials.

Why it matters: IAM teams now have to govern what agents do after detection, not just whether they can spot threats, because credential use, sharing, and approval paths can fail inside the same workflow.

By the numbers:

61% of Americans have fallen victim to phishing attacks.
The best safety score out of the box was 92%, and the worst was 35%.
When AWS credentials are exposed publicly, attackers attempt access within an average of 17 minutes, and as quickly as 9 minutes in some cases.

Context

AI agent identity risk is not just about whether a model can recognise a phishing page. The governance gap appears when an agent has enough runtime access to read mail, open links, retrieve secrets, and complete forms before a human ever sees the decision.

That breaks the assumption behind many IAM and secrets workflows, namely that threat detection and safe action are the same thing. In practice, an agent may only identify a malicious domain after it has already used a real credential there, which means identity controls must account for execution, not just awareness.

Key questions

Q: How should security teams govern AI agents that can access email and passwords?

A: Treat the agent as a credentialed non-human identity, not just a smarter assistant. Give it only task-scoped access, require approval before any secret is used outside a trusted boundary, and log every read, copy, paste, and forward action. The main control objective is preventing a valid workflow from becoming an attacker-controlled credential path.

Q: Why do AI agents create risk even when they detect phishing correctly?

A: Because detection does not stop execution unless the surrounding workflow forces a stop. An agent can identify a fake domain and still click it, retrieve a real password, and submit it if the action chain is not constrained. The practical lesson is that runtime authorisation matters more than model confidence alone.

Q: What do security teams get wrong about AI agent trust?

A: They often assume that a model that can explain a threat will also avoid it. In practice, the same model may understand the risk and still complete the dangerous action because the agent loop keeps moving. Trust has to be measured by behaviour under pressure, not by explanation quality.

Q: Should organisations use security skill prompts instead of access controls for AI agents?

A: No. Security skills can improve behaviour, but they are not a substitute for identity governance. Access rights, credential scope, and approval gates define what an agent is allowed to do, while prompts only influence how often it chooses safely. Use both, but never treat prompting as control.

Technical breakdown

Why phishing detection does not stop agent misuse

Detection and restraint are different control layers. A model can classify a URL as malicious and still proceed with the next step in a workflow if the surrounding agent loop does not enforce a stop condition. That matters in credential-bearing workflows because the agent may have access to inboxes, vaults, browsers, and form-fill tools in the same session. The model’s judgement is therefore only one input to the runtime control plane, not the control plane itself. When the model sees the risk but the workflow still allows action, the security boundary has already failed.

Practical implication: place explicit approval gates between threat recognition and credential use in agent workflows.
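
As a rough illustration of that approval gate, the sketch below binds the action rather than the model's judgement. The action names, the Action type, and the approve callback are all hypothetical and would map onto whatever agent framework is in use; this is a minimal sketch, not a reference implementation.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action names; a real agent framework would define its own.
CREDENTIAL_ACTIONS = {"vault_read", "form_submit", "message_forward"}


@dataclass(frozen=True)
class Action:
    name: str    # e.g. "vault_read"
    target: str  # e.g. a URL or a recipient address


def gate(action: Action, risk_flagged: bool,
         approve: Callable[[Action], bool]) -> bool:
    """Return True if the action may run.

    Once risk has been flagged in the session, any credential-bearing
    action needs an explicit decision from outside the agent loop.
    """
    if action.name not in CREDENTIAL_ACTIONS:
        return True   # not credential-bearing, so the gate does not apply
    if not risk_flagged:
        return True   # no threat recognised in this session
    return approve(action)  # human (or policy engine) decides
```

The design point is that the gate sits in the workflow layer: the model can still flag the risk, but it cannot decide on its own that the flagged action should proceed.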

Security skills shift behaviour, but not trust boundaries

The benchmark shows that short security guidance can materially improve model behaviour, which is useful, but it does not remove the need for identity and access controls. In other words, better prompting changes the probability of safe action; it does not create a durable assurance model for credential handling. That distinction matters for enterprise deployment because security teams cannot certify agents on model intelligence alone. The operational question is whether the surrounding system prevents a safe-sounding but dangerous action from reaching production data, credentials, or external systems.

Practical implication: treat security skills as a compensating signal, not a replacement for credential governance.
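
One way to see why a lower failure probability is not the same as assurance is a small calculation. The sketch below assumes each credential-bearing action fails independently with the same probability, which is a simplification, and the numbers in the usage note are hypothetical, not taken from the benchmark.

```python
def p_at_least_one_failure(p_fail: float, n_actions: int) -> float:
    """Probability of one or more unsafe actions across n_actions,
    assuming each action fails independently with probability p_fail."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if n_actions < 0:
        raise ValueError("n_actions must be non-negative")
    return 1.0 - (1.0 - p_fail) ** n_actions
```

Under these assumptions, an agent that acts unsafely only 1% of the time still has a roughly 99% chance of at least one unsafe action across 500 credential-bearing actions. An access control that blocks the action outright does not degrade with volume in the same way.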

Agent inbox access turns secrets into an execution problem

Once an AI agent can read email, browse links, access a vault, and submit forms, secrets governance becomes an execution problem rather than a storage problem. The risk is not only exposure of the secret itself, but also the agent’s ability to move that secret into an attacker-controlled destination during the same task. That is why the article’s examples are so important: the harmful step is often a legitimate workflow action taken in the wrong context. Identity governance for agents therefore has to cover where credentials can be used, under what conditions, and with what evidence trail.

Practical implication: scope agent credentials to task-specific actions and log every credential touchpoint.
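
A minimal sketch of destination-bound credential release with an evidence trail might look like the following. The registry contents, secret identifiers, and function name are hypothetical, and a production system would write to a tamper-resistant log rather than an in-memory list.

```python
import json
import time
from urllib.parse import urlsplit

# Hypothetical registry: the one exact host each secret may be sent to.
ALLOWED_HOSTS = {"payroll-login": "payroll.example.com"}

# In-memory stand-in for an audit log; one JSON record per decision.
AUDIT_LOG: list[str] = []


def release_secret(secret_id: str, destination_url: str, task_id: str) -> bool:
    """Allow a secret to be used only at its registered host, and record
    every attempt, allowed or not."""
    host = (urlsplit(destination_url).hostname or "").lower()
    allowed = ALLOWED_HOSTS.get(secret_id) == host  # exact match only
    AUDIT_LOG.append(json.dumps({
        "ts": time.time(),
        "task": task_id,
        "secret": secret_id,
        "destination_host": host,
        "allowed": allowed,
    }))
    return allowed
```

Because the check is an exact host match, a lookalike such as payroll.examp1e.com is refused even if the agent believes the page is legitimate, and the refused attempt still leaves a record for review.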

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
MongoBleed breach — MongoBleed exposed secrets across 87K MongoDB servers.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Agent trust fails at the point of action, not the point of detection. The article shows that an AI system can correctly identify phishing and still complete the harmful transaction. That means existing controls that assume recognition leads to safe refusal are structurally weak when the actor can both judge and act in the same session. The practitioner conclusion is that identity governance must control the handoff from detection to execution, not just improve model awareness.

Security skill files are useful, but they do not close the governance gap around credential-bearing agents. The results improve sharply with a short skill, which proves behaviour is malleable. But malleability is not assurance. A programme that treats prompt-based behaviour change as equivalent to controlled access is still missing the core question of whether the agent is authorised to touch secrets in the first place. The practitioner conclusion is that model coaching and identity policy solve different problems.

Context-aware credential handling is now a named governance gap: execution-path exposure. The risk is not simply secret leakage, but the ability of an agent to move a real credential from a trusted source into an attacker-controlled destination while following a seemingly normal task. That makes the exposure path as important as the secret itself. The practitioner conclusion is that secrets governance must be evaluated at runtime, inside the workflow, not only at rest or in vault inventory.

Human-centred training still matters because AI agents inherit human trust assumptions. The benchmark mirrors a familiar failure mode: people often act before verifying a domain or sender. Agents can repeat that pattern at machine speed, which means IAM teams should not assume automation removes the need for judgement checkpoints. The practitioner conclusion is that agent governance should be designed as a trust transfer problem between machine observation and human authorisation.

OWASP NHI Top 10 remains the right lens because these agents behave like credentialed non-human identities. The article’s scenarios are about access, secrets, and workflow misuse, not generic AI accuracy. That places the issue squarely in NHI governance, where identity scope, credential handling, and runtime privilege all matter. The practitioner conclusion is to assess AI agents as non-human identities first, and only then as models.

From our research:
The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
Organisations maintain an average of 6 distinct secrets manager instances, creating fragmentation that undermines centralised control.
That fragmentation makes agent credential governance harder, so review Top 10 NHI Issues for the broader control pattern.

What this signals

Execution-path exposure is the category to watch next: agents are moving from passive analysis into active handling of credentials, which means governance failures will show up in workflow telemetry before they show up in model scores. The 27-day remediation lag in our research on secrets management is a warning that credential exposure remains hard to close even in human-led programmes, let alone agentic ones.

NHI teams should expect more pressure to prove that agents cannot both inspect and use secrets in the same flow. The right comparison is not model capability versus model capability, but runtime control versus uncontrolled credential movement. For that reason, our 52 NHI Breaches Analysis report remains relevant whenever an identity can move faster than a review cycle.

If you are mapping this to agentic controls, start with OWASP Top 10 for Agentic Applications 2026 and treat phishing resistance as only one slice of the problem. The broader issue is whether an agent can be trusted to stop itself before it turns a valid credential into an attacker-controlled action.

For practitioners

Separate threat recognition from credential execution: Require an explicit stop and human approval before any agent can use a vault, submit a form, or forward a message after it detects risk. The security rule should bind the action, not only the judgement, and it should be enforced in the workflow layer.
Constrain agent access to task-scoped credentials: Issue credentials that only work for the minimum action set needed for the task, and revoke them once the interaction ends. Do not let inbox-reading or browsing agents inherit broad sign-in capability by default.
Log every secret touchpoint in the agent path: Record when an agent reads, copies, forwards, or pastes a secret, along with the destination and triggering prompt. That audit trail should support review of both intended and unintended credential movement.
Test agent workflows with phishing and secret-buried scenarios: Use scenarios that mix normal work with hidden risk, such as fake domains, lookalike logins, and passwords embedded in documents. Benchmarking should measure whether the agent stops before it transmits a real credential.
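
The testing step above can be sketched as a small scoring function over recorded agent traces. The Step and Scenario types and the "submit_credential" action name are hypothetical, and this is not the scoring method used by the SCAM benchmark; it only illustrates counting the one outcome that matters, a real credential sent to an untrusted host.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str  # e.g. "open_link", "submit_credential"
    host: str    # host the action touched


@dataclass(frozen=True)
class Scenario:
    name: str
    trusted_hosts: frozenset  # hosts where credential use is legitimate


def critical_failures(scenario: Scenario, trace: list) -> int:
    """Count steps where the agent submitted a credential to a host
    outside the scenario's trusted set."""
    return sum(
        1
        for step in trace
        if step.action == "submit_credential"
        and step.host not in scenario.trusted_hosts
    )
```

A trace that opens a lookalike link but stops before submitting scores zero here, which matches the goal of measuring whether the agent stops before transmission rather than whether it merely noticed the risk.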

Key takeaways

AI agents can recognise phishing and still complete the attack path, which means security must govern execution as well as detection.
Short security skill prompts improve behaviour, but they do not replace identity controls, approval gates, or credential scope limits.
For IAM teams, the core question is whether an agent can touch secrets without creating a reusable attacker path inside the same workflow.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST Zero Trust (SP 800-207) and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI-03	Agent workflows exposed leaked credentials and unsafe reuse paths.
NIST Zero Trust (SP 800-207)	PR.AC-4	The article shows runtime trust must be continuously verified, not assumed.
NIST CSF 2.0	PR.AC-1	Access control must cover what an agent can do after it recognises risk.

Map agent actions to explicit authorisation rules and audit the resulting trails.

Key terms

Agent Trust: Agent trust is the degree to which a system is allowed to act on behalf of a user without creating unacceptable identity or data risk. For AI agents, trust is not about confidence in the model alone. It depends on the surrounding controls that limit secrets use, action scope, and escalation paths.
Execution-Path Exposure: Execution-path exposure is the risk that a credential or sensitive action becomes dangerous because it is used inside an attacker-influenced workflow. The secret may be valid, but the path it takes can still hand control to an adversary. This is a runtime identity problem, not only a storage problem.
Security Skill: A security skill is a short instruction set that shapes how an AI model evaluates threats before acting. In agentic workflows, it can improve refusal behaviour and threat awareness, but it does not authorise access or replace identity controls. Its value is behavioural, not governance-grade assurance.
Task-Scoped Credential: A task-scoped credential is an access token, password, or secret limited to a specific job, system, or time-bound workflow. It reduces blast radius by preventing broad reuse, but it only works when the surrounding identity system enforces narrow action boundaries and rapid revocation.
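
To make the last term concrete, a task-scoped credential can be modelled as a small object that is checked on every use. The class and field names below are hypothetical, and the sketch assumes the surrounding identity system calls permits() before each action rather than trusting the agent to do so.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskCredential:
    task_id: str
    allowed_actions: frozenset  # the minimum action set for this task
    expires_at: float           # epoch seconds
    revoked: bool = False

    def permits(self, action: str, now: Optional[float] = None) -> bool:
        """True only if the credential is live, unexpired, and the
        action is inside its scope."""
        now = time.time() if now is None else now
        return (
            not self.revoked
            and now < self.expires_at
            and action in self.allowed_actions
        )

    def revoke(self) -> None:
        """Invalidate the credential, e.g. when the interaction ends."""
        self.revoked = True
```

Scope, expiry, and revocation are checked together, so a credential that is still inside its time window stops working the moment the task is closed out.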

Deepen your knowledge

AI agent trust and credential handling are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building controls for inbox-reading agents or password-using workflows, it is a practical place to start.

This post draws on content published by 1Password: SCAM benchmark results on AI agent phishing and trust. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-02-12.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org