What do security teams get wrong about phishing simulations?

Why This Matters for Security Teams

Phishing simulations are often treated as a proxy for user vigilance, but that only works if the test matches the attack surface. Many real-world attacks now begin outside the inbox, using browser-native consent screens, fake device enrollment flows, malicious downloads, and other paths that email-only awareness metrics never see. When a programme measures only the easiest channel to simulate, it can create a false sense of maturity rather than reduce exposure.

That gap matters because modern identity compromise is rarely isolated to one click. A successful lure can lead to token theft, OAuth abuse, or credential reuse across systems, which is why broader identity controls matter as much as training. NHI Management Group’s Ultimate Guide to NHIs notes that 80% of identity breaches involved compromised non-human identities such as service accounts and API keys, which suggests how readily a single user interaction can extend into machine-to-machine access. Current guidance from the NIST Cybersecurity Framework 2.0 points teams toward outcome-based risk management, not just awareness activity.

In practice, many security teams discover the weakness through an incident rather than through intentional measurement: by the time it surfaces, a browser-based compromise has already turned into account takeover or lateral movement.

How It Works in Practice

Effective measurement starts by mapping simulations to the ways attackers actually deliver payloads and steal tokens. If the dominant enterprise threat is OAuth consent abuse, device code phishing, or fake software installers, then the exercise should test those paths, not just suspicious email links. The goal is to evaluate whether users, browsers, and control points can interrupt the kill chain at the point of interaction.

That usually means combining awareness testing with identity and endpoint telemetry. For example, a team may simulate a malicious app consent flow, then confirm whether conditional access, admin consent workflows, token monitoring, and user reporting all work together. The Ultimate Guide to NHIs highlights how often organisations lack visibility into service accounts and secrets hygiene, which matters because a phishing simulation that ends in token capture can quickly become a non-human identity problem if the compromised token is reused by automation or an API client. That is where NHI governance and user awareness meet.
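
The consent-flow check described above can be sketched as a small evaluation step. This is a minimal illustration, not a product integration: the control names, the `ConsentSimulationRun` record, and the `evaluate_run` helper are all hypothetical, and it assumes the team can already collect which controls fired during a simulated run.

```python
from dataclasses import dataclass, field

# Hypothetical control names taken from the paragraph above; not a vendor API.
EXPECTED_CONTROLS = (
    "conditional_access",
    "admin_consent_workflow",
    "token_monitoring",
    "user_report",
)


@dataclass
class ConsentSimulationRun:
    """Observed outcome of one simulated malicious-app consent flow."""
    user: str
    controls_fired: set[str] = field(default_factory=set)
    token_issued: bool = False


def evaluate_run(run: ConsentSimulationRun) -> dict:
    """Summarise which expected controls fired and whether the chain was interrupted."""
    # Preserve the order of EXPECTED_CONTROLS so reports are stable.
    missing = [c for c in EXPECTED_CONTROLS if c not in run.controls_fired]
    return {
        "user": run.user,
        # Assumption: the chain counts as interrupted if no token was issued.
        "interrupted": not run.token_issued,
        "missing_controls": missing,
    }


if __name__ == "__main__":
    run = ConsentSimulationRun(
        user="alice",
        controls_fired={"conditional_access", "user_report"},
    )
    print(evaluate_run(run))
```

A run can be "interrupted" and still list missing controls, which is the useful signal here: the lure failed, but some of the layers that should have caught it stayed silent.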

Measure by attack path, not by delivery channel alone.

Include browser-native prompts, download warnings, and authentication consent abuse in the test mix.

Track whether the user reports, rejects, or escalates the event quickly enough to block token issuance.

Correlate simulation results with actual control coverage, such as MFA resilience, OAuth governance, and endpoint detection.
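
One way to put the points above into a report is to group simulation events by attack path and track timely reporting alongside completion. The sketch below is illustrative only: the `SimulationEvent` fields, the outcome labels, and the 300-second reporting window are assumptions, and a real programme would derive the window from how quickly its own environment issues tokens.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimulationEvent:
    attack_path: str  # e.g. "oauth_consent", "device_code", "fake_installer"
    outcome: str      # assumed labels: "reported", "rejected", or "completed"
    seconds_to_report: Optional[float] = None  # None if the user never reported


def summarise_by_attack_path(events, report_window_seconds=300.0):
    """Return per-path completion and timely-report rates.

    report_window_seconds is a hypothetical threshold standing in for
    "quickly enough to block token issuance".
    """
    grouped = defaultdict(list)
    for event in events:
        grouped[event.attack_path].append(event)

    summary = {}
    for path, items in grouped.items():
        total = len(items)
        completed = sum(1 for e in items if e.outcome == "completed")
        timely = sum(
            1
            for e in items
            if e.seconds_to_report is not None
            and e.seconds_to_report <= report_window_seconds
        )
        summary[path] = {
            "events": total,
            "completion_rate": completed / total,
            "timely_report_rate": timely / total,
        }
    return summary
```

Keeping the two rates separate avoids collapsing the result into a single click-through figure: a path with a low completion rate but almost no timely reports still indicates a detection gap.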

For governance, the NIST Cybersecurity Framework 2.0 supports measuring protective and detective outcomes together, which is more useful than a raw click-through rate. This kind of measurement tends to break down when simulations are run as a standalone HR exercise, because the results then do not reflect the organisation’s true identity attack surface.

Common Variations and Edge Cases

Tighter simulation programmes often increase operational overhead, requiring organisations to balance realism against disruption and false positives. That tradeoff is especially visible in regulated environments, where aggressive simulations can interfere with user support, incident response, or business-critical workflows.

There is no universal standard for simulation design yet, but current guidance suggests that simulations should be segmented by risk tier and user population. High-risk groups such as finance, IT admins, developers, and executives should face more relevant scenarios, while general users may only need baseline awareness plus browser and download hygiene. Teams should also be careful not to over-index on “failure rates” as the sole metric. A user who reports a suspicious OAuth prompt may be more valuable than one who simply avoids a click in a contrived test.
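
Segmentation by risk tier can be expressed as a simple lookup. The sketch below is a hypothetical example: the group identifiers, scenario names, and the `scenarios_for` helper are invented for illustration, and the tiers simply mirror the groups named in the paragraph above.

```python
# Hypothetical scenario names; baseline content applies to every user.
BASELINE_SCENARIOS = ["browser_prompt_hygiene", "download_hygiene"]

# Extra scenarios assumed to be relevant for high-risk groups.
HIGH_RISK_SCENARIOS = ["oauth_consent", "device_code", "fake_installer"]

# Groups the text names as high risk, written as assumed identifiers.
HIGH_RISK_GROUPS = {"finance", "it_admin", "developer", "executive"}


def scenarios_for(group: str) -> list[str]:
    """Return the scenario mix for a user group, baseline first."""
    if group.lower() in HIGH_RISK_GROUPS:
        return BASELINE_SCENARIOS + HIGH_RISK_SCENARIOS
    # Return a copy so callers cannot mutate the shared baseline list.
    return list(BASELINE_SCENARIOS)


if __name__ == "__main__":
    for group in ("finance", "general"):
        print(group, scenarios_for(group))
```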

Edge cases matter. If the organisation uses SSO heavily, then device code phishing and consent phishing deserve more attention than spam filters. If software distribution is decentralised, fake download campaigns may be more realistic than email lures. If non-human identities are already exposed, as described in the Ultimate Guide to NHIs, the simulation should also check whether a stolen session or API token could be abused after the initial lure. Best practice is evolving toward control validation across user, browser, and identity layers rather than awareness scoring alone.
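
The edge cases above read naturally as rules over an environment profile. The following is a minimal sketch under stated assumptions: the `EnvironmentProfile` fields and scenario names are hypothetical, and the fallback to an email lure when no rule matches is an illustrative choice rather than something the guidance prescribes.

```python
from dataclasses import dataclass


@dataclass
class EnvironmentProfile:
    """Assumed yes/no facts about the organisation, set by the security team."""
    heavy_sso: bool = False
    decentralised_software: bool = False
    exposed_nhis: bool = False


def prioritised_scenarios(profile: EnvironmentProfile) -> list[str]:
    """Map environment traits to the scenarios that deserve attention first."""
    scenarios: list[str] = []
    if profile.heavy_sso:
        scenarios += ["device_code_phishing", "consent_phishing"]
    if profile.decentralised_software:
        scenarios.append("fake_download")
    if profile.exposed_nhis:
        # Checks whether a stolen session or API token stays usable after the lure.
        scenarios.append("post_lure_token_abuse_check")
    if not scenarios:
        # Illustrative fallback only; not taken from the source guidance.
        scenarios.append("email_lure")
    return scenarios
```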

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST AI RMF set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
NIST CSF 2.0	DE.CM	Simulation results should validate detection of browser-native and identity abuse.
OWASP Non-Human Identity Top 10	NHI-07	Token theft and OAuth abuse can turn a phishing event into NHI compromise.
NIST AI RMF		Risk management should account for evolving attack paths and control blind spots.

Measure whether controls detect and respond to realistic phishing paths, not just inbox clicks.
