How should security teams measure human risk in phishing simulations?

They should measure more than clicks. The most useful signal is whether a user entered credentials, because that maps to real account takeover risk. Teams should also track reporting rates, repeat susceptibility, and segment-level patterns so training can be targeted. A dashboard is only valuable when it supports decisions about intervention, escalation, and programme effectiveness.

Why This Matters for Security Teams

Phishing simulations are often treated as a training metric, but human risk is really an exposure metric. A click shows curiosity or distraction; a credential submission shows a path to account takeover. That distinction matters because security teams need signals that map to real attack outcomes, not vanity scores. The NIST Cybersecurity Framework 2.0 emphasises measurable outcomes, which is the right lens here.

Teams that focus only on click rates can miss the users, teams, and scenarios most likely to create loss. The better questions are which behaviours predict compromise, which groups are improving, and which interventions actually change behaviour over time. That is consistent with NHIMG guidance on NHI and identity risk, including the broader patterns described in the Top 10 NHI Issues and the Ultimate Guide to NHIs — Why NHI Security Matters Now. In practice, many security teams discover that “low click rates” coexist with poor reporting and repeated credential entry only after a real phishing campaign has already exposed the gap.

How It Works in Practice

Human-risk measurement should start with a small set of indicators that reflect actual compromise likelihood, then segment them by department, role, geography, and simulation type. The most useful baseline is credential entry rate, because it indicates whether a message could plausibly lead to session theft, mailbox compromise, or downstream fraud. Reporting rate is the next most important signal, because fast reporting can contain an incident even when a user does interact. Repeat susceptibility helps distinguish one-off mistakes from persistent exposure.
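As a minimal sketch of these baseline indicators, the following Python computes credential entry rate and reporting rate per segment. The record shape (`SimResult`), the field names, and the sample data are assumptions made for illustration; they do not come from any particular simulation platform.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimResult:
    """One user's outcome in one simulation (hypothetical record shape)."""
    user: str
    segment: str                         # e.g. department, role, or geography
    submitted_credentials: bool
    minutes_to_report: Optional[float]   # None if the user never reported


def segment_metrics(results):
    """Return credential entry rate and reporting rate for each segment."""
    grouped = defaultdict(list)
    for r in results:
        grouped[r.segment].append(r)

    metrics = {}
    for segment, rows in grouped.items():
        total = len(rows)
        submitted = sum(r.submitted_credentials for r in rows)
        reported = sum(r.minutes_to_report is not None for r in rows)
        metrics[segment] = {
            "recipients": total,
            "credential_entry_rate": submitted / total,
            "reporting_rate": reported / total,
        }
    return metrics


# Illustrative data only.
sample = [
    SimResult("a", "finance", True, None),
    SimResult("b", "finance", False, 4.0),
    SimResult("c", "finance", False, None),
    SimResult("d", "finance", True, 12.0),
    SimResult("e", "engineering", False, 2.0),
    SimResult("f", "engineering", False, None),
]

for segment, m in segment_metrics(sample).items():
    print(segment, m)
```

Keeping the recipient count next to each rate makes it easier to avoid over-reading a rate that comes from a very small segment.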

Good programmes treat simulation results as decision support, not a scorecard. A practical workflow is:

Measure credential submission, report time, and report volume for every exercise.
Separate first-time failures from repeat failures to avoid averaging away persistent risk.
Compare groups against their own prior performance, not only against org-wide averages.
Use findings to target coaching, mailbox hardening, and escalation paths for higher-risk roles.
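Two of the workflow steps above, separating repeat failures from first-time failures and comparing each group against its own history, can be sketched as follows. The function names, the input shapes, and the choice of "mean of earlier exercises" as the baseline are assumptions for illustration, not a prescribed method.

```python
from collections import Counter


def split_failures(failures_by_exercise):
    """Split users into first-time and repeat credential submitters.

    failures_by_exercise: list of sets, one per exercise in chronological
    order, each holding the users who submitted credentials.
    """
    counts = Counter(user for failed in failures_by_exercise for user in failed)
    first_time = {u for u, n in counts.items() if n == 1}
    repeat = {u for u, n in counts.items() if n >= 2}
    return first_time, repeat


def change_vs_own_baseline(rates_by_group):
    """Latest rate minus the mean of the group's own earlier rates.

    A negative value means the group improved against its own history.
    """
    deltas = {}
    for group, rates in rates_by_group.items():
        if len(rates) < 2:
            continue  # no prior exercises to compare against
        *prior, latest = rates
        deltas[group] = latest - sum(prior) / len(prior)
    return deltas


# Illustrative data only.
failures = [{"ana", "raj"}, {"raj"}, {"raj", "lee"}]
first_time, repeat = split_failures(failures)
print("first-time:", sorted(first_time), "repeat:", sorted(repeat))

rates = {"finance": [0.20, 0.10, 0.05], "sales": [0.10, 0.10, 0.25]}
for group, delta in change_vs_own_baseline(rates).items():
    print(group, round(delta, 3))
```

Reporting the two sets separately keeps a small number of persistent repeat submitters from being hidden inside an improving average.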

This aligns with the broader risk-management approach in The 2024 ESG Report: Managing Non-Human Identities, where identity compromise is measured by outcomes rather than assumptions, and with NIST’s emphasis on identify, protect, detect, respond, and recover. It also pairs well with the Ultimate Guide to NHIs — Key Challenges and Risks when teams are trying to translate security events into operational priorities. These controls tend to break down in organisations that reward pass/fail training scores more than response quality, because staff optimise for the metric instead of the underlying risk.

Common Variations and Edge Cases

Tighter measurement often increases administrative overhead and employee anxiety, so organisations need to balance behavioural insight against trust and privacy constraints. Best practice is evolving here, especially around how much granularity to expose to managers and whether to identify individuals or report only at segment level. There is no universal standard for this yet.

Some environments require different weighting. In high-exposure groups such as finance, executive support, or security operations, reporting speed may matter more than raw click rate because a rapid report can stop fraud before funds or access are lost. In mature programmes, repeated failure in a single simulation family may indicate content weakness rather than human weakness, which means the exercise design should be reviewed before the workforce is blamed. Where phishing simulations are tied to disciplinary action, reporting rates often fall because users stop trusting the programme, which weakens the signal.
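One possible heuristic for telling content weakness from human weakness is to check whether a simulation family produces high credential entry rates across most segments at once, which points at the lure design rather than at any one group. The sketch below assumes that interpretation; the function name, the `threshold` and `min_share` values, and the sample data are illustrative assumptions and would need tuning per organisation.

```python
def families_needing_review(rates, threshold=0.30, min_share=0.8):
    """Flag simulation families that fail broadly across segments.

    rates: {family: {segment: credential_entry_rate}}
    A family is flagged when at least `min_share` of its segments have a
    rate at or above `threshold`. Both cut-offs are assumed values.
    """
    flagged = []
    for family, by_segment in rates.items():
        if not by_segment:
            continue  # no data for this family
        high = sum(rate >= threshold for rate in by_segment.values())
        if high / len(by_segment) >= min_share:
            flagged.append(family)
    return sorted(flagged)


# Illustrative data only.
rates = {
    "invoice_lure": {"finance": 0.35, "sales": 0.40, "engineering": 0.32},
    "mfa_reset": {"finance": 0.05, "sales": 0.31, "engineering": 0.04},
}
print(families_needing_review(rates))
```

A flagged family is a prompt to review the exercise design, not a finding about the workforce; a family that is high in only one segment is more likely to reflect that segment's exposure.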

NHIMG’s guidance on identity risk maturity is useful here because the same pattern appears across many control domains: measurement only works when it drives action. Security teams should use the results to refine training, improve controls, and identify where human behaviour intersects with technical exposure, rather than treating the dashboard as a final verdict.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST AI RMF set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
NIST CSF 2.0	GV.OV	Phishing metrics should support governance oversight and measurable outcomes.
OWASP Non-Human Identity Top 10	NHI-01	Human credential capture mirrors real identity compromise pathways.
NIST AI RMF		Risk measurement should translate into actionable monitoring and response decisions.

Use simulation data to prioritise controls that reduce takeover risk, especially credential theft and reuse.

