By NHI Mgmt Group Editorial Team
Published 2026-04-03
Domain: Breaches & Incidents
Source: Abnormal AI

TL;DR: EvilTokens productizes Microsoft 365 compromise by abusing Device Code OAuth, replaying refresh tokens for up to 90 days, and using LLaMA to turn inboxes into BEC intelligence, according to Abnormal AI. The real lesson is that MFA success does not equal session safety when token replay and conditional access gaps remain open.


At a glance

What this is: An analysis of EvilTokens, a productized phishing-as-a-service platform that bypasses MFA through the Microsoft device code OAuth flow and turns stolen tokens into automated business email compromise.

Why it matters: Identity teams must treat token replay, device-code exposure, and post-authentication session abuse as governance problems, not just phishing-prevention problems, across NHI, autonomous, and human identity programmes.



Context

Device code phishing is a form of authentication abuse that relies on a legitimate Microsoft login flow instead of a fake sign-in page. The user completes authentication on the real domain, which means MFA can succeed while the attacker still captures tokens and session material.

For IAM teams, the issue is not whether the login looked suspicious. The problem is that the post-authentication artefact, the OAuth token, can outlive the interaction that created it and be replayed from attacker infrastructure. That shifts the control discussion from phishing awareness to conditional access, token revocation, and device-code policy boundaries.
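
To make that gap concrete, here is a minimal Python sketch of the two requests in the OAuth 2.0 device authorization grant (RFC 8628) as exposed by the Microsoft identity platform. It only builds request descriptions and sends nothing. The client ID, scope, and device code are placeholders, and the helper names are hypothetical. The point to notice is that the polling request carries only the device_code, so whoever started the flow receives the tokens, regardless of who typed the user code and completed MFA.

```python
from urllib.parse import urlencode

AUTHORITY = "https://login.microsoftonline.com"
DEVICE_GRANT = "urn:ietf:params:oauth:grant-type:device_code"


def device_authorization_request(tenant: str, client_id: str, scope: str) -> dict:
    """Step 1: the initiating party asks for a device_code and a user_code."""
    return {
        "method": "POST",
        "url": f"{AUTHORITY}/{tenant}/oauth2/v2.0/devicecode",
        "body": urlencode({"client_id": client_id, "scope": scope}),
    }


def token_poll_request(tenant: str, client_id: str, device_code: str) -> dict:
    """Step 2: the same party polls the token endpoint with the device_code.

    Nothing here identifies the person who entered the user code, which is
    why the tokens go to the initiator of the flow.
    """
    return {
        "method": "POST",
        "url": f"{AUTHORITY}/{tenant}/oauth2/v2.0/token",
        "body": urlencode({
            "grant_type": DEVICE_GRANT,
            "client_id": client_id,
            "device_code": device_code,
        }),
    }


# Placeholder values for illustration only.
start = device_authorization_request(
    "organizations", "00000000-0000-0000-0000-000000000000", "offline_access")
poll = token_poll_request(
    "organizations", "00000000-0000-0000-0000-000000000000", "EXAMPLE_DEVICE_CODE")
```

In an abuse scenario the attacker performs both steps and only hands the user code to the victim, which is why the sign-in page is genuine while the token recipient is not.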

EvilTokens shows how criminal operators are packaging that weakness into a service, then adding automation to accelerate business email compromise. That makes this a human identity problem at the front door and a session governance problem after authentication.


Key questions

Q: How should security teams stop device code phishing in Microsoft 365 environments?

A: Start by disabling device code authentication everywhere it is not explicitly needed, then allow it only for approved headless devices through tightly scoped Conditional Access. Add strong session controls, rapid token revocation, and alerting on unusual token exchange activity. If the business does not need the flow, removing it is the cleanest control.
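
One way to express that control is a Conditional Access policy that targets the device code flow. The Python sketch below builds a policy body in the shape of the Microsoft Graph conditionalAccessPolicy resource, using the authenticationFlows condition with transferMethods set to deviceCodeFlow. Treat the field names as our reading of the Graph schema, to be verified against current documentation before use. The exclusion group ID is a placeholder, the function name is hypothetical, and the policy starts in report-only mode so its impact can be reviewed before enforcement.

```python
import json


def build_block_device_code_policy(excluded_group_ids: list) -> dict:
    """Build a report-only policy body that blocks the device code flow.

    excluded_group_ids: groups holding approved headless-device identities
    (placeholder values in this sketch).
    """
    return {
        "displayName": "Block device code flow except approved headless devices",
        # Report-only first; switch to "enabled" after reviewing sign-in impact.
        "state": "enabledForReportingButNotDisabled",
        "conditions": {
            "users": {
                "includeUsers": ["All"],
                "excludeGroups": list(excluded_group_ids),
            },
            "applications": {"includeApplications": ["All"]},
            "clientAppTypes": ["all"],
            "authenticationFlows": {"transferMethods": "deviceCodeFlow"},
        },
        "grantControls": {"operator": "OR", "builtInControls": ["block"]},
    }


policy = build_block_device_code_policy(["<approved-headless-devices-group-id>"])
print(json.dumps(policy, indent=2))
```

A body like this would typically be submitted to the Graph Conditional Access policies endpoint by an administrator with the appropriate permissions; that step is omitted here.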

Q: Why do successful MFA prompts not prevent OAuth token abuse?

A: Because MFA proves a user completed authentication, not that the resulting token will stay bound to a trusted device or safe session. Once refresh tokens are captured, an attacker can replay them until revocation or expiry. Identity programmes need to govern the token lifecycle, not just the login event.

Q: What breaks when defenders focus only on phishing pages instead of token replay?

A: They miss the real compromise path. In device code attacks, the user authenticates on a legitimate Microsoft page, so there may be no fake login artefact to detect. The durable risk is the token issued after that interaction, which can be reused from attacker infrastructure for mailbox access and fraud.
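
Because the lure leaves no fake page to detect, triage has to start from the sign-in record itself. The following is a hedged Python sketch: it filters exported sign-in events for the device code protocol and flags those whose source IP falls outside the networks where headless devices are expected. The field names (authenticationProtocol, ipAddress, userPrincipalName) mirror names used in Microsoft Entra sign-in logs but should be checked against your own export. The sample records and the allowed range are invented, and the function name is hypothetical.

```python
from ipaddress import ip_address, ip_network


def flag_device_code_signins(events, allowed_networks):
    """Return device-code sign-ins that originate outside the allowed networks."""
    networks = [ip_network(n) for n in allowed_networks]
    flagged = []
    for event in events:
        if event.get("authenticationProtocol") != "deviceCode":
            continue  # only the device code flow is of interest here
        source = ip_address(event["ipAddress"])
        if not any(source in net for net in networks):
            flagged.append(event)
    return flagged


# Invented sample data using documentation-reserved IP ranges.
sample_events = [
    {"userPrincipalName": "kiosk@example.com", "ipAddress": "192.0.2.10",
     "authenticationProtocol": "deviceCode"},
    {"userPrincipalName": "cfo@example.com", "ipAddress": "203.0.113.7",
     "authenticationProtocol": "deviceCode"},
    {"userPrincipalName": "dev@example.com", "ipAddress": "198.51.100.4",
     "authenticationProtocol": "none"},
]
suspicious = flag_device_code_signins(sample_events, ["192.0.2.0/24"])
```

Here only the second record is flagged: it used the device code flow from an address outside the approved range.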

Q: Who is accountable when compromised mailbox tokens are reused for business email compromise?

A: Accountability sits across identity operations, email security, and incident response because the failure spans authentication policy, token governance, and message abuse. Teams should define ownership for Conditional Access policy, token revocation, and mailbox monitoring before an incident, so containment does not stall when access abuse begins.




NHI Mgmt Group analysis

Device-code authentication assumes the user interaction and the token recipient are aligned. That assumption fails when the code is collected by an attacker-controlled workflow and the resulting OAuth grant is replayed elsewhere. The programme implication is that phishing controls built around fake-login detection do not address the actual trust boundary being abused.

Access review cadences are designed for privilege that persists long enough to be observed. EvilTokens shortens the attacker’s path from authentication to mailbox exploitation, then extends access through refresh-token replay. That means traditional review logic can miss the decisive abuse window because the meaningful risk sits in session continuity, not in standing account assignment.

Persistent mailbox access is now a token-lifecycle problem, not just a password problem. The platform’s repeated token exchange makes Conditional Access and revocation speed part of identity governance, because compromise can survive the user’s successful login and MFA completion. Practitioners should treat token replay as the control failure mode, not the lure itself.
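
As a sketch of what one revocation playbook step can look like, the Python helper below describes the Microsoft Graph revokeSignInSessions call for a list of affected users. It only prepares the requests. Acquiring an admin token, sending the calls, and confirming that sessions actually ended are left out. The function name and the example user are hypothetical.

```python
from urllib.parse import quote

GRAPH = "https://graph.microsoft.com/v1.0"


def revocation_requests(user_ids):
    """Return one POST description per user for revokeSignInSessions.

    user_ids may be object IDs or user principal names; each is URL-encoded
    so characters such as '@' or '#' are safe in the path.
    """
    return [
        {
            "method": "POST",
            "url": f"{GRAPH}/users/{quote(uid, safe='')}/revokeSignInSessions",
        }
        for uid in user_ids
    ]


requests_to_send = revocation_requests(["alice@example.com"])
```

Revoking sign-in sessions invalidates refresh tokens, but access tokens that were already issued can remain usable until they expire unless the resource supports Continuous Access Evaluation, so the playbook should also account for that residual window.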

Mailboxes have become structured intelligence stores for attackers, not just communication channels. By extracting wire instructions, routing numbers, and executive relationships, EvilTokens converts business email content into operational fraud data. The implication for the field is that IAM, email security, and insider-risk monitoring now overlap at the point where authenticated access becomes actionable intelligence.

Device-code abuse exposes a named governance gap: token-bound trust without device-bound assurance. The flow was designed for constrained devices, but the attacker uses it as an authentication bridge that decouples user approval from endpoint trust. Practitioners need to rethink any control model that treats successful MFA as proof of safe session origin.

From our research:

  • 64% of valid secrets leaked in 2022 are still valid and exploitable today, proving that detection alone is not enough without automated revocation, according to The State of Secrets Sprawl 2026.
  • AI-related credential leaks surged 81.5% year-over-year in 2025, with the surrounding AI infrastructure leaking 5x faster than core LLM providers.
  • That same pattern reinforces why identity teams should study the Ultimate Guide to NHIs for lifecycle and revocation controls that outlast a single attack path.

What this signals

Mailbox compromise is increasingly a session-governance problem, not a password-reset problem. Teams that still optimise only for phishing detection will miss the attack class where authentication is legitimate but the resulting token is not. The programme shift is toward policy control, token revocation, and mailbox telemetry that can detect automated abuse after login.

Token replay creates an identity blast radius that spans email, finance, and executive workflow. Once an attacker can search inboxes and send as the target, the mailbox becomes a fraud platform. Security teams should map where a single Microsoft 365 session can cross into payment approval, vendor change, or executive impersonation workflows.

Device-code governance needs to be paired with lifecycle discipline from the Ultimate Guide to NHIs. When a session can be reused for days, the control question becomes how quickly access can be revoked and how reliably privileged sessions are surfaced. That is a lifecycle and oversight issue as much as an authentication issue.


For practitioners

  • Disable device code authentication where it is not required: Use Conditional Access to block the device code flow for standard users, and allow it only for approved headless-device scenarios with explicit business justification. That removes the attack path EvilTokens depends on and forces authentication through less abusable methods.
  • Tighten token revocation and session evaluation: Pair Continuous Access Evaluation with rapid revocation playbooks so refresh tokens cannot keep a compromised mailbox alive for days. Revalidate high-risk sessions after privilege changes, geo anomalies, or suspicious token exchange patterns.
  • Constrain named locations and cloud relay sources: Block or challenge authentication from known relay infrastructure and suspicious PaaS ranges, then review whether legitimate workloads still need those paths. The goal is to make token replay materially harder after the initial grant.
  • Monitor mailboxes for machine-like reconnaissance: Alert on abnormal mailbox search patterns, bulk summarisation behaviour, unusual keyword sweeps, and rapid sequence access to financial terms. Those signals often appear before send-as abuse and can surface the attacker while access is still active.
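
The mailbox-monitoring item above can be sketched as a simple sliding-window detector in Python. Everything about the input is an assumption: events are modelled as (timestamp in seconds, mailbox, search query) tuples rather than any real audit-log schema, and the keyword list, window, and threshold are illustrative values to tune against your own baseline.

```python
from collections import defaultdict, deque

# Illustrative keyword list; extend it to match your own fraud scenarios.
FINANCIAL_TERMS = ("wire", "invoice", "routing", "iban", "payment")


def detect_keyword_sweeps(events, window_seconds=60, threshold=5):
    """Return mailboxes with at least `threshold` financial-term searches
    inside any span of `window_seconds`."""
    recent = defaultdict(deque)  # mailbox -> timestamps of matching searches
    alerted = set()
    for ts, mailbox, query in sorted(events):
        if not any(term in query.lower() for term in FINANCIAL_TERMS):
            continue
        window = recent[mailbox]
        window.append(ts)
        # Drop searches that have fallen out of the sliding window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            alerted.add(mailbox)
    return alerted


# Invented sample: one mailbox is swept in seconds, the other over minutes.
queries = ["wire transfer", "invoice overdue", "routing number", "IBAN", "payment approval"]
burst = [(i * 5, "cfo@example.com", q) for i, q in enumerate(queries)]
slow = [(i * 100, "user@example.com", q) for i, q in enumerate(queries)]
alerts = detect_keyword_sweeps(burst + slow)
```

A human searching for five financial terms over several minutes stays under the threshold, while the same five searches inside twenty seconds trip the alert, which is the machine-like pattern the bullet describes.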

Key takeaways

  • EvilTokens shows that legitimate MFA can still be the start of a compromise when attackers abuse device code OAuth and refresh-token replay.
  • The core risk is persistence, not just initial access, because repeated token exchange can keep mailbox access alive long after the user authenticated.
  • Disabling device code flow where it is unnecessary, accelerating revocation, and monitoring mailbox behaviour are the controls that matter most here.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Non-Human Identity Top 10 | NHI-03 | Device code abuse and token replay map to NHI credential lifecycle failure.
NIST CSF 2.0 | PR.AC-3 | Authentication success does not guarantee safe session continuation.
NIST Zero Trust (SP 800-207) | PR.AC-4 | This attack bypasses trust based on authentication alone.

Restrict device code use and enforce rapid token revocation for high-risk sessions.


Key terms

  • Device Code OAuth Flow: An OAuth authentication method for devices that cannot easily present a browser login. The user enters a short code on a separate trusted page, and the application later exchanges that code for tokens. In abuse cases, the authentication is real but the token recipient is not the intended device.
  • Refresh Token Replay: The repeated use of a long-lived OAuth refresh token to obtain new access tokens without asking the user to authenticate again. It turns a single captured credential into durable session access, which is why revocation speed and token telemetry matter so much in identity governance.
  • Business Email Compromise: A fraud pattern in which an attacker uses trusted email accounts or impersonation to manipulate payments, supplier details, or executive workflows. In identity terms, the mailbox becomes an authenticated fraud platform once the attacker can read, search, and send messages as the victim.
  • Conditional Access: A policy layer that decides whether to allow authentication or session continuation based on context such as location, device state, or user risk. For device-code abuse, its value lies in preventing or constraining flows that are valid in principle but dangerous in practice.


This post draws on content published by Abnormal AI: EvilTokens and Microsoft device code phishing analysis. Read the original.
