AI phishing training is shifting from completion to behavior

By NHI Mgmt Group Editorial Team | Published 2026-04-29 | Domain: Governance & Risk | Source: Abnormal AI

TL;DR: AI-generated phishing now mirrors real roles and workflows, driving a 44% engagement rate with vendor email compromise attacks across 1,400 organisations while almost no one reports them, according to Abnormal AI. Static awareness programmes that track completion and clicks are no longer a meaningful proxy for resilience, because the real measure is behaviour under attack.

At a glance

What this is: An analysis of why AI-generated phishing is bypassing traditional awareness training and what behaviour-based coaching changes.

Why it matters: Identity and access programmes depend on human decisions at the edge, and proxy metrics can leave CISOs blind to real social-engineering risk across human, NHI, and autonomous workflows.

By the numbers:

Research across 1,400 organisations found a 44% employee engagement rate with vendor email compromise attacks, yet almost no one reports them.

Context

AI-generated phishing is no longer easy to spot by typo, odd grammar, or suspicious links. The problem for identity security teams is that these attacks now imitate ordinary work, which means the usual awareness model measures participation, not actual resistance to deception.

For IAM, IGA, and security leaders, the governance issue is not just user education. Human identity programmes still rely heavily on static training cycles and proxy metrics, while attackers exploit timing, context, and routine business language to trigger split-second decisions that bypass technical controls.

Key questions

Q: How should security teams measure whether phishing awareness is actually working?

A: Track whether people report suspicious messages, slow down risky actions, and avoid credential entry under realistic simulations. Completion rates and annual quiz scores only prove that employees participated. The useful measures are behavioural: reporting speed, repeat susceptibility, and whether high-risk groups show sustained improvement after coaching.

Q: Why do AI-generated phishing attacks defeat traditional awareness training?

A: They remove the visual clues that old training relied on, such as spelling errors, odd domains, and obviously fake urgency. When a message mirrors a real role, vendor, or workflow, employees are less likely to question it. That makes context, not just content, the deciding factor in whether the attack succeeds.

Q: What do security teams get wrong about phishing simulation metrics?

A: They confuse participation with resilience. A user can finish training, click through a simulation, and still be unprepared for a real attack. The better question is whether the programme changes live decision-making, especially for employees who handle payments, credentials, or executive requests.

Q: Who should be accountable for AI-driven awareness programmes?

A: The security team remains accountable even when automation handles simulation delivery and adaptive coaching. Delegating the workflow does not delegate responsibility for policy, data use, scenario quality, or escalation outcomes. Accountability should sit with the identity, security, or human-risk owner who can explain programme decisions.

Technical breakdown

Why AI-generated phishing is harder to classify

Traditional phishing detection depended on visible defects such as misspellings, strange domains, and obvious urgency cues. AI-generated phishing removes those signals by using role-specific language, plausible request patterns, and business context that look legitimate to the recipient. That changes the defender’s problem from spotting malformed content to recognising behavioural deviation from normal communication patterns. In practice, this makes the mailbox, chat thread, or SMS line part of the identity attack surface, because the attack succeeds only when the human identity authorises the next step.

Practical implication: security teams need controls that measure response quality and reporting behaviour, not just message detection rates.

Behaviour-based awareness versus completion metrics

Completion rates, click rates, and annual training attendance are activity measures, not security outcomes. They tell you whether a person finished a module or interacted with a simulation, but not whether they will pause, verify, and report during a live attack. Behaviour-based awareness focuses on the observable decisions that matter, such as whether someone escalates a suspicious request, resists credential entry, or slows the attack chain before compromise spreads. This is closer to identity governance than traditional education, because it treats human action as an operational control surface.

Practical implication: replace vanity metrics with measures tied to reporting speed, risky action rates, and repeat susceptibility.
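
As a rough illustration of the measures named above, the sketch below computes reporting rate, median reporting delay, risky action rate, and repeat susceptibility from simulation results. The `SimulationResult` schema, the field names, and the rule that two or more risky actions count as "repeat" are assumptions made for this example, not definitions from Abnormal AI's analysis.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class SimulationResult:
    """One employee's response to one simulated message (hypothetical schema)."""
    user: str
    delivered_at: datetime
    reported_at: Optional[datetime]  # None if the message was never reported
    risky_action: bool               # e.g. replied or entered credentials


def behaviour_metrics(results: list[SimulationResult]) -> dict:
    """Summarise reporting and risky-action behaviour across simulation results."""
    if not results:
        raise ValueError("at least one simulation result is required")

    # Seconds between delivery and report, for reported messages only.
    delays = [
        (r.reported_at - r.delivered_at).total_seconds()
        for r in results
        if r.reported_at is not None
    ]

    # Count risky actions per user to find people who fail more than once.
    risky_counts: dict[str, int] = {}
    for r in results:
        if r.risky_action:
            risky_counts[r.user] = risky_counts.get(r.user, 0) + 1

    users = {r.user for r in results}
    # Assumption: "repeat" means two or more risky actions by the same user.
    repeat_users = [u for u, n in risky_counts.items() if n >= 2]

    return {
        "report_rate": len(delays) / len(results),
        "median_report_delay_s": median(delays) if delays else None,
        "risky_action_rate": sum(r.risky_action for r in results) / len(results),
        "repeat_susceptibility": len(repeat_users) / len(users),
    }


# Illustrative data only; these are not figures from the research.
t0 = datetime(2026, 4, 1, 9, 0)
sample = [
    SimulationResult("alice", t0, t0 + timedelta(seconds=120), False),
    SimulationResult("alice", t0, None, True),
    SimulationResult("bob", t0, None, True),
    SimulationResult("bob", t0, None, True),
    SimulationResult("carol", t0, t0 + timedelta(seconds=300), False),
]
metrics = behaviour_metrics(sample)
```

None of these values depends on whether anyone completed a training module, which is the point of the shift: they describe what people did when a realistic message arrived.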

Agentic AI coaching loops in human risk management

The most operationally significant point in Abnormal AI's analysis is that AI can manage the awareness lifecycle more continuously than quarterly simulations and annual refreshers. An agentic coaching loop creates, delivers, and adapts simulations based on observed behaviour, then gives immediate feedback at the moment of failure. In autonomy terms, this is closer to a delegated workflow than a static campaign, because the system can decide which scenario to present next and adjust training based on the response. That matters because it turns awareness from a calendar event into a dynamic control process.

Practical implication: if AI is used to scale awareness, governance must cover scenario selection, feedback timing, and data used to profile employee risk.
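
One way to picture such a loop, and where governance hooks attach to it, is the minimal sketch below. It is not how any vendor's product works: the `Scenario` and `UserHistory` types, the three-level difficulty scale, and the selection rule (prefer the category a user fails most, then the closest difficulty) are all hypothetical. The two governance-relevant details are that the loop can only pick scenarios a human has approved, and that every selection is written to an audit log.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scenario:
    name: str
    category: str     # e.g. "vendor_invoice"
    difficulty: int   # 1 (easy) to 3 (hard), an assumed scale
    approved: bool    # set by a human reviewer, never by the loop itself


@dataclass
class UserHistory:
    user: str
    level: int = 1
    failures: dict[str, int] = field(default_factory=dict)  # category -> count


def next_scenario(history: UserHistory, library: list[Scenario], audit_log: list[dict]) -> Scenario:
    """Pick the next simulation from the approved library and log the decision."""
    approved = [s for s in library if s.approved]
    if not approved:
        raise ValueError("no approved scenarios available")

    # The category this user has failed most often, if any.
    weakest = None
    if history.failures:
        weakest = max(history.failures, key=lambda c: history.failures[c])

    def score(s: Scenario) -> tuple:
        # Prefer the weakest category, then the difficulty closest to the user's level.
        return (s.category == weakest, -abs(s.difficulty - history.level))

    choice = max(approved, key=score)
    audit_log.append({
        "user": history.user,
        "scenario": choice.name,
        "weakest_category": weakest,
        "level": history.level,
    })
    return choice


def record_outcome(history: UserHistory, scenario: Scenario, failed: bool) -> Optional[str]:
    """Update the user's history; return immediate coaching text on failure."""
    if failed:
        history.failures[scenario.category] = history.failures.get(scenario.category, 0) + 1
        history.level = max(1, history.level - 1)
        return f"Coaching for {history.user}: review the '{scenario.category}' request you just acted on."
    history.level = min(3, history.level + 1)
    return None


library = [
    Scenario("generic_password_reset", "credential_reset", 1, approved=True),
    Scenario("vendor_bank_change", "vendor_invoice", 2, approved=True),
    Scenario("ceo_wire_request", "executive_request", 3, approved=False),
]
audit_log: list[dict] = []
history = UserHistory("dana", failures={"vendor_invoice": 2})
choice = next_scenario(history, library, audit_log)
feedback = record_outcome(history, choice, failed=True)
```

Keeping approval and logging outside the selection logic means the system can adapt freely while the scenario library and the decision trail stay reviewable by the accountable owner.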

Threat narrative

Attacker objective: The attacker wants to convert a believable work request into a trusted human action that opens the door to fraud, credential capture, or further account compromise.

Entry occurs when a targeted employee receives a personalised AI-generated message that matches a real role, vendor relationship, or work process.
Escalation follows when the recipient treats the request as routine and takes a risky action such as opening the message, replying, or entering credentials into a fake workflow.
Impact is the successful bypass of human judgement and the creation of a foothold for business email compromise, credential theft, or payment diversion.

Cisco DevHub NHI breach: IntelBroker exploited exposed Cisco credentials, API tokens, and keys in DevHub.
DeepSeek breach: exposed 1M+ log lines and sensitive secret keys.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Behaviour-based awareness is now an identity control problem, not a training problem. Completion and attendance metrics describe programme participation, but they do not prove that a human identity will resist a live social-engineering prompt. The attacker is exploiting a control gap at the moment of decision, which means security teams should treat awareness as part of identity governance rather than as a communications exercise. The practitioner conclusion is that proxy metrics cannot be the basis for trust.

Human trust has become machine-shaped, which makes classic phishing heuristics unreliable. When AI can draft messages that mirror a role, a workflow, or a vendor relationship, the old assumption that users can spot suspicious language collapses. We call this failure mode contextual phishing normalisation: the message looks like ordinary work, so the recipient no longer applies suspicion. The practitioner conclusion is that detection and coaching have to track context, not just content.

Instant coaching is the right response because the failure happens at the moment of action. The strongest operational point in Abnormal AI's analysis is that feedback delivered immediately after a mistake is more likely to change behaviour than annual training cycles. That aligns with how human identity decisions actually occur under pressure. The practitioner conclusion is to move from periodic education to continuous intervention at the point of risk.

Agentic automation changes awareness operations, but it does not eliminate governance responsibility. If AI is managing scenario selection, delivery, and adaptation, then the security team is delegating a human-risk workflow to an autonomous process. That shifts oversight from campaign administration to policy, auditability, and behavioural data stewardship. The practitioner conclusion is that automation should scale the programme, not dilute accountability.

Split-second social engineering exposes the limits of static programme design. A single person handling awareness for thousands of employees can realistically produce only quarterly simulations, which leaves large exposure windows between exercises. The problem is not content quality alone but operational cadence. The practitioner conclusion is that human-risk programmes need continuous coverage if they are meant to influence live behaviour.

From our research:
Only 15% of organisations (1.5 in 10) are highly confident in their ability to secure NHIs, compared with nearly 1 in 4 for human identities, according to The State of Non-Human Identity Security.
Lack of credential rotation is cited as the top cause of NHI-related attacks by 45% of organisations, followed by inadequate monitoring and logging and over-privileged accounts, each at 37%, according to the same research.
For a broader view of the breach patterns behind these gaps, see 52 NHI Breaches Analysis, which maps recurring identity failure modes across real incidents.

What this signals

Human-risk programmes are moving toward continuous decision measurement, and that should change how CISOs think about awareness budgets. The useful question is no longer whether employees attended training, but whether they actually changed their response under pressure.

Contextual phishing normalisation: when messages imitate real work closely enough, the attack stops looking like an exception and starts looking like business as usual. That raises the bar for every control that depends on user suspicion and reinforces the need for role-based coaching, not generic reminders.

As teams connect human-risk telemetry with identity governance, they will need to think in terms of high-value populations and behavioural pockets. A programme that can show where users still fail under realistic lures is more defensible than one built on completion rates alone.
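
A behavioural pocket can be made concrete as a combination of population and lure type whose failure rate stays high. The sketch below assumes simulation events arrive as `(role, lure_type, failed)` tuples. The 50% threshold and the minimum of three samples are arbitrary values chosen for illustration, not recommendations.

```python
from collections import defaultdict


def behavioural_pockets(events, min_samples=3, threshold=0.5):
    """Return (role, lure_type) groups whose failure rate meets the threshold.

    events: iterable of (role, lure_type, failed) tuples (hypothetical format).
    Groups with fewer than min_samples events are skipped as too small to judge.
    """
    counts = defaultdict(lambda: [0, 0])  # (role, lure_type) -> [failures, total]
    for role, lure_type, failed in events:
        counts[(role, lure_type)][1] += 1
        if failed:
            counts[(role, lure_type)][0] += 1

    pockets = []
    for key, (failures, total) in counts.items():
        rate = failures / total
        if total >= min_samples and rate >= threshold:
            pockets.append((key, rate))

    # Highest failure rate first, so the riskiest pockets lead the list.
    return sorted(pockets, key=lambda p: p[1], reverse=True)


# Illustrative events only.
events = (
    [("finance", "vendor_invoice", True)] * 3
    + [("finance", "vendor_invoice", False)]
    + [("finance", "credential_reset", False)] * 2
    + [("finance", "credential_reset", True)]
    + [("engineering", "vendor_invoice", True)] * 2
    + [("executive_support", "executive_request", True)] * 3
)
pockets = behavioural_pockets(events)
```

In this sample the engineering group is excluded despite failing every time, because two events are too few to support a conclusion. A minimum sample size of this kind helps keep the resulting report defensible.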

For practitioners

Measure behaviour, not attendance. Replace completion and click-through reporting with metrics that show whether users report suspicious messages, enter credentials, or escalate risky requests during simulations and live events.
Build role-specific simulation paths. Use scenarios that mirror finance, executive support, procurement, and other high-risk workflows so employees practice the exact patterns attackers use in normal business language.
Deliver feedback at the moment of failure. Route users who make a mistake to immediate coaching that explains the signal they missed and the decision they should have taken before the attack progressed.
Govern automated coaching as a delegated workflow. Define who approves scenario libraries, how adaptive training is tuned, and what behavioural data may be used to personalise risk scoring without turning coaching into employee surveillance.
Prioritise high-value identities first. Focus additional monitoring and coaching on finance, treasury, procurement, and executive-assistant populations where a single compromised action can have outsized business impact.
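
The governance item in the list above can be expressed as a policy object that every automated coaching run is checked against. This is a sketch under stated assumptions: `CoachingPolicy`, its fields, the signal names, and the monthly cadence limit are all hypothetical and would need to reflect an organisation's own approval chain and data-use rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoachingPolicy:
    """Who may approve scenarios and which data the coaching system may use."""
    owner: str                                # accountable human-risk owner
    scenario_approvers: frozenset[str]
    allowed_signals: frozenset[str]           # behavioural data permitted for scoring
    max_simulations_per_user_per_month: int   # assumed tuning limit


def policy_violations(policy, approved_by, signals_used, simulations_per_user):
    """Return a list of human-readable violations; an empty list means compliant."""
    violations = []
    if approved_by not in policy.scenario_approvers:
        violations.append(f"scenario library approved by unauthorised party: {approved_by}")
    extra = sorted(set(signals_used) - policy.allowed_signals)
    if extra:
        violations.append("disallowed behavioural signals: " + ", ".join(extra))
    if simulations_per_user > policy.max_simulations_per_user_per_month:
        violations.append("simulation cadence exceeds policy limit")
    return violations


policy = CoachingPolicy(
    owner="human-risk-lead",
    scenario_approvers=frozenset({"human-risk-lead", "ciso-office"}),
    allowed_signals=frozenset({"simulation_outcome", "report_time", "role"}),
    max_simulations_per_user_per_month=4,
)

compliant = policy_violations(policy, "ciso-office", ["simulation_outcome", "role"], 2)
flagged = policy_violations(
    policy, "coaching-agent", ["simulation_outcome", "browsing_history"], 6
)
```

The second call fails on all three checks. It was approved by the automated agent rather than a named person, it uses a signal outside the allowed set, and it exceeds the cadence limit.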

Key takeaways

AI-generated phishing weakens awareness programmes that still depend on users spotting obvious red flags.
Completion metrics are not security outcomes, because they do not show whether people behave differently under attack.
Continuous, role-specific coaching is a more credible model for reducing human-risk exposure than annual training cycles.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

NIST CSF 2.0, NIST SP 800-63 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
NIST CSF 2.0	PR.AT-01	Awareness content and training effectiveness are directly relevant to this article.
NIST SP 800-63		Human identity decisions and phishing resistance sit inside digital identity risk.
NIST Zero Trust (SP 800-207)	PR.AC-4 (NIST CSF 1.1 subcategory)	Least privilege and decision verification depend on users resisting deceptive access requests.

Measure training against behaviour change, not completion, and verify that lessons influence live decisions.

Key terms

Human Risk Management: Human risk management is the practice of identifying, measuring, and reducing the ways people can be manipulated into unsafe security decisions. It focuses on behaviour under pressure, not just knowledge or training completion, and treats human action as a measurable part of the attack surface.
Business Email Compromise: Business email compromise is a social-engineering attack that tricks a person into approving payments, sharing information, or taking another harmful action through a believable message. The attack succeeds by impersonating trusted relationships and routine work, rather than by using overtly malicious technical indicators.
Behaviour-based training: Behaviour-based training is an awareness approach that measures what people do in realistic scenarios, then adapts coaching to the mistakes they actually make. It is more operationally useful than static content because it ties learning to decisions, response speed, and repeat vulnerability.
Security awareness proxy metric: A security awareness proxy metric is a measurement such as completion rate or click rate that describes participation but not real resilience. These metrics are easy to report, but they can create false confidence if they are not connected to how users behave during an actual attack.

Deepen your knowledge

The NHI Foundation Level course, the industry's only accredited NHI security programme, covers NHI governance, agentic AI identity, machine identity security, IAM, human identity, identity lifecycle, secrets management, and workload identity. If you are responsible for identity security strategy or governance in your organisation, it is worth exploring.

This post draws on content published by Abnormal AI: Key Insights Research on AI phishing coaching and behaviour-based awareness. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-04-29.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org