Auth migration at 200k users exposes the real cutover risks

By NHI Mgmt Group Editorial Team | Published 2026-05-28 | Domain: Governance & Risk | Source: WorkOS

TL;DR: Auth migrations above 200,000 users shift from simple cutovers to resumable imports, proxy-based SSO routing, and disciplined webhook sequencing, according to WorkOS. At that scale, the hard problem is not moving accounts, but preserving continuity while identity, event flow, and connection mapping change under load.

At a glance

What this is: A practical guide to auth migration at scale, showing why 200K-user cutovers require resumable imports, transparent SSO proxying, and careful webhook sequencing.

Why it matters: Identity teams often underestimate how migration mechanics affect human IAM, NHI-adjacent service flows, and access continuity once enterprise scale turns routine cutover steps into outage risks.

Context

When authentication systems grow past roughly 200,000 users and dozens of enterprise SSO connections, migration stops behaving like a standard admin task and starts behaving like a continuity problem. The core issue is not just data movement, but preserving access, event integrity, and rollback options while identity dependencies are still live.

For IAM teams, this is where user lifecycle, federation, and event handling collide. The WorkOS guide's central point is that small-migration shortcuts do not survive scale, so the migration design has to account for resumability, observability, and staged trust changes from the outset.

Key questions

Q: How should teams manage auth migrations when user counts exceed 200,000?

A: Teams should treat the migration as a staged identity change, not a single cutover. Use resumable imports, checkpointed batch processing, full diff validation, and feature-flagged routing so failures can be isolated. At that scale, rollback ability and event sequencing matter as much as the user import itself.

Q: Why do large SSO migrations fail more often than small ones?

A: They fail because the coordination cost of reconfiguring many enterprise connections overtakes the technical work. Each extra IdP relationship adds failure points, rollback complexity, and communication overhead, so manual per-customer changes stop being operationally viable once the connection count climbs.

Q: What do security teams get wrong about webhook handling during auth migration?

A: They often treat webhooks as a secondary concern and only think about them at the end of the cutover. In practice, queued events from the old provider can flood downstream systems when re-enabled, so webhook disablement and backlog control must be planned before the import begins.

Q: How do organisations decide between manual SSO reconfiguration and a transparent proxy?

A: Use manual reconfiguration for a small number of connections where customer admin coordination is still practical. Switch to a transparent proxy once the number of enterprise SSO connections makes per-customer setup the main source of delay and failure risk.

Technical breakdown

Resumable bulk import is the only safe shape for large auth migrations

A large auth migration cannot rely on a single import transaction, because across hundreds of thousands of records an interruption becomes likely before the import finishes. The practical pattern is export, checkpoint, batch import, and diff, so progress survives interruption and duplicate records are avoided. Persisting local state gives the migration operator a restart point, while batched writes keep retry logic manageable. The CLI approach described in the WorkOS guide also standardises the workflow across source providers, which matters when teams are migrating from multiple legacy systems or handling repeated tenant moves.
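The checkpoint-and-batch pattern can be sketched in a few lines of Python. This is a minimal illustration, not the WorkOS CLI: the function names, the JSON checkpoint file, and the `import_batch` callback are all hypothetical, and the sketch assumes the target's import call is idempotent (for example, an upsert keyed on an external ID) so that a retried batch cannot create duplicates.

```python
import json
import os
from typing import Callable, Dict, List, Sequence


def load_checkpoint(path: str) -> int:
    """Return the index of the next unprocessed user, or 0 if no checkpoint exists."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_index"]


def save_checkpoint(path: str, next_index: int) -> None:
    """Write the checkpoint via a temp file so a crash cannot leave it half-written."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def resumable_import(
    users: Sequence[Dict],
    import_batch: Callable[[List[Dict]], None],  # hypothetical, assumed idempotent
    checkpoint_path: str,
    batch_size: int = 500,
    max_retries: int = 3,
) -> int:
    """Import users in batches, starting from the last checkpoint.

    Returns the number of users imported during this run.
    """
    start = load_checkpoint(checkpoint_path)
    imported = 0
    for i in range(start, len(users), batch_size):
        batch = list(users[i:i + batch_size])
        for attempt in range(1, max_retries + 1):
            try:
                import_batch(batch)
                break
            except Exception:
                if attempt == max_retries:
                    # Give up; the checkpoint still points at this batch,
                    # so the next run resumes here.
                    raise
        # Advance the checkpoint only after the batch has succeeded.
        save_checkpoint(checkpoint_path, i + len(batch))
        imported += len(batch)
    return imported
```

Because the checkpoint moves only after a batch succeeds, rerunning the same command after a failure picks up at the failed batch rather than at the start of the export.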

Practical implication: use a resumable import workflow with local checkpoints, batch retries, and post-import diffing before you switch traffic.
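As a rough illustration of the post-import diff step, the sketch below compares a source export against the records read back from the target. The record shape (dicts with an `id` key) and the compared field names are assumptions made for the example; a real migration would compare whichever attributes matter for its own user schema.

```python
from typing import Dict, Iterable, List, NamedTuple, Sequence


class DiffReport(NamedTuple):
    missing: List[str]     # in the source, absent from the target
    unexpected: List[str]  # in the target, absent from the source
    mismatched: List[str]  # present in both, but a compared field differs


def diff_users(
    source: Iterable[Dict],
    target: Iterable[Dict],
    fields: Sequence[str] = ("email", "email_verified"),  # hypothetical field names
) -> DiffReport:
    """Compare source and target user sets by ID and by selected fields."""
    src = {u["id"]: u for u in source}
    dst = {u["id"]: u for u in target}
    missing = sorted(src.keys() - dst.keys())
    unexpected = sorted(dst.keys() - src.keys())
    mismatched = sorted(
        uid
        for uid in src.keys() & dst.keys()
        if any(src[uid].get(f) != dst[uid].get(f) for f in fields)
    )
    return DiffReport(missing, unexpected, mismatched)
```

An empty report on all three lists is a reasonable gate for switching traffic; any non-empty list names the exact records to re-import or investigate.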

Transparent proxy routing reduces SSO reconfiguration overhead

At higher connection counts, per-customer IdP reconfiguration becomes the bottleneck, not the protocol work itself. A transparent proxy keeps the existing callback or ACS endpoint in place and forwards traffic to the new auth system only for connections that have been migrated. That lets teams opt connections in one at a time while preserving service continuity. The tradeoff is that the migration team absorbs more architectural complexity, but it avoids coordinating dozens of customer admins at once, which is often the true scaling limit.
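The per-connection routing decision at the heart of such a proxy can be sketched as follows. This shows only the decision logic, not a working HTTP proxy, and every name in it is hypothetical: the upstream URLs, the `RoutingTable` class, and the kill-switch flag are illustrative assumptions rather than anything the WorkOS guide prescribes.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical upstream addresses, for illustration only.
LEGACY_UPSTREAM = "https://legacy-auth.internal.example"
NEW_UPSTREAM = "https://new-auth.internal.example"


@dataclass
class RoutingTable:
    """Tracks which SSO connections have been opted in to the new auth system."""

    migrated: Set[str] = field(default_factory=set)
    global_kill_switch: bool = False  # when True, all traffic goes to legacy

    def opt_in(self, connection_id: str) -> None:
        """Route one connection to the new system."""
        self.migrated.add(connection_id)

    def roll_back(self, connection_id: str) -> None:
        """Return one connection to the legacy system without touching the rest."""
        self.migrated.discard(connection_id)

    def upstream_for(self, connection_id: str) -> str:
        """Pick the upstream that should handle a callback for this connection."""
        if self.global_kill_switch:
            return LEGACY_UPSTREAM
        if connection_id in self.migrated:
            return NEW_UPSTREAM
        return LEGACY_UPSTREAM
```

Keeping the decision in one small function means a single connection can be reverted, or the whole migration paused, by changing state rather than redeploying the proxy.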

Practical implication: choose proxy-based routing when SSO connection count makes manual reconfiguration the limiting factor.

Webhook sequencing prevents event backlogs from becoming an outage

Auth migrations often create a hidden queueing problem. If old-provider webhooks are paused too late, or re-enabled without draining the backlog, downstream systems receive a burst of delayed events that can overwhelm consumers. The safe pattern is to disable old-provider webhooks before import, verify the new event stream, and deliberately drain or discard queued events rather than letting them replay unpredictably. This is an orchestration problem as much as an authentication problem, because the event plane can fail independently of the login plane.
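One way to make the drain-or-discard decision explicit is sketched below. The policy is an assumption made for illustration: events timestamped at or before the export snapshot are treated as already reflected in the imported data and discarded, while later events are replayed in order and in bounded batches. The event shape (dicts with a numeric `ts` key) and both function names are hypothetical.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def plan_backlog(
    events: Sequence[Dict], snapshot_ts: float
) -> Tuple[List[Dict], List[Dict]]:
    """Split queued events into (replay, discard) lists, oldest first.

    Assumed policy: anything at or before the export snapshot is already
    captured by the import, so replaying it would only add load.
    """
    ordered = sorted(events, key=lambda e: e["ts"])
    discard = [e for e in ordered if e["ts"] <= snapshot_ts]
    replay = [e for e in ordered if e["ts"] > snapshot_ts]
    return replay, discard


def throttled_batches(
    events: Sequence[Dict], max_per_batch: int
) -> Iterator[List[Dict]]:
    """Yield the replay list in bounded batches so consumers are not flooded."""
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be at least 1")
    for i in range(0, len(events), max_per_batch):
        yield list(events[i:i + max_per_batch])
```

The caller decides how long to wait between batches; the point of the sketch is that replay order and volume are chosen deliberately instead of being left to whatever the old provider has queued.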

Practical implication: sequence webhook disablement before import and control backlog replay explicitly, or your event consumers may be the first systems to fail.

NHI Mgmt Group analysis

A 200K-user auth migration creates a continuity gap, not just a project plan. Below that threshold, teams can absorb some manual rework and short-lived inconsistency. Above it, identity state, federation configuration, and downstream events all change at once, so the migration itself becomes the operational risk surface. The implication is that auth cutover must be treated as service continuity engineering, not just account transfer.

Identity migration at scale exposes blast radius as the controlling variable. The article shows that the real question is not whether a migration is possible, but how much state can diverge before the system becomes unrecoverable. That is a governance problem as much as a technical one, because staged rollout, rollback triggers, and observable checkpoints now define whether identity change remains controllable.

Auth lifecycle controls fail when they are designed for single-system moves instead of federated estate shifts. Import, verification, event sequencing, and connection reconfiguration are all lifecycle tasks, but they behave differently when dozens of enterprise connections and hundreds of thousands of users are involved. The implication is that IAM lifecycle design has to account for scale-dependent failure modes, not just nominal process completion.

Feature-flagged migration turns trust into a reversible decision. Routing a single connection at a time is not just a release tactic, it is a governance pattern that limits how far an error can spread. That matters because identity cutovers are rarely all-or-nothing in practice, and the safer programme is the one that can revert a subset of traffic without collapsing the whole migration.

Auth migration tooling is becoming part of the identity control plane. The CLI, proxy mapping, and event validation described here are not side utilities anymore. They are the mechanisms that determine whether access continuity, rollback, and completeness checks are actually enforceable during change. Practitioners should therefore evaluate migration tooling with the same seriousness they apply to federation and lifecycle controls.

From our research:
97% of NHIs carry excessive privileges, increasing unauthorised access and broadening the attack surface, according to the Ultimate Guide to NHIs.
Only 5.7% of organisations have full visibility into their service accounts, which is why migration and cutover planning often reveals more identity sprawl than teams expected.
That visibility gap makes lifecycle control the next priority, which is why Ultimate Guide to NHIs, Why NHI Security Matters Now is the right next reference point.

What this signals

Identity migration is becoming a governance discipline, not a deployment task. The practical lesson here is that enterprises need migration plans that can survive interruption, not just complete in the happy path. Teams that still rely on manual cutovers will find that rollback windows, event replay, and connection mapping now define programme risk.

Connection count is now a control variable. Once an estate reaches dozens of enterprise SSO links, the choice between direct reconfiguration and proxy-based transition becomes a governance decision about reversibility and blast radius. That is a useful pattern for IAM leaders because it turns migration design into something that can be standardised and reviewed.

Auth change programmes should be measured by recoverability, not just speed. If your process cannot pause, resume, diff, and validate across the whole user population, it is not mature enough for high-scale identity operations. That is true whether the affected identities are employees, service accounts, or federated connections.

For practitioners

Define a scale-based cutover threshold: Set an internal migration threshold where resumable imports, staged routing, and formal rollback controls become mandatory. Do not reuse the same playbook for a 5,000-user tenant and a 200,000-user estate.
Use batch import checkpoints and diff validation: Persist local progress during export and import so interrupted runs can restart without duplicating users. Run a full diff against the source before traffic cutover to detect drift.
Sequence webhook disablement before import: Disable old-provider webhooks before the migration starts, then control queued event replay deliberately after cutover. Treat the event backlog as a first-class risk.
Adopt proxy-based routing for large SSO fleets: Use a transparent proxy when manual IdP reconfiguration would dominate the migration effort. Opt connections in one at a time behind a feature flag and keep a per-connection kill switch.
Validate password hash portability early: Check which user stores can preserve password hashes and which cannot, then plan silent resets or next-login resets for the unsupported set before the import begins.
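The hash portability check in the last item can be run against the export before any import begins. The sketch below is a minimal example under stated assumptions: the set of portable algorithms is a placeholder that must be replaced with whatever the target provider actually documents, and the `hash_algorithm` and `password_hash` field names are hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Placeholder set: replace with the algorithms the target provider documents.
PORTABLE_ALGORITHMS: Set[str] = {"bcrypt", "scrypt", "argon2id"}


def plan_password_handling(
    users: Iterable[Dict], portable: Set[str] = PORTABLE_ALGORITHMS
) -> Dict[str, List[str]]:
    """Bucket user IDs by how their credential can be carried across.

    import_hash:    the hash can be imported as-is
    reset_required: a hash exists but its algorithm is not portable
    no_password:    no local password (for example, SSO-only users)
    """
    plan: Dict[str, List[str]] = {
        "import_hash": [],
        "reset_required": [],
        "no_password": [],
    }
    for user in users:
        if not user.get("password_hash"):
            plan["no_password"].append(user["id"])
            continue
        algorithm = (user.get("hash_algorithm") or "").lower()
        bucket = "import_hash" if algorithm in portable else "reset_required"
        plan[bucket].append(user["id"])
    return plan
```

The size of the `reset_required` bucket shows how many users will need a silent or next-login reset, which is worth knowing before the import is scheduled.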

Key takeaways

Large auth migrations fail differently from small ones because scale turns identity change into a continuity problem.
Resumable imports, proxy-based routing, and webhook sequencing are the control points that keep cutovers from becoming outages.
Identity teams should measure migration success by recoverability, completeness, and rollback control, not by how quickly the first cutover finishes.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

NIST CSF 2.0, NIST Zero Trust (SP 800-207) and NIST SP 800-63 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
NIST CSF 2.0	PR.AA-05 (PR.AC-4 in CSF 1.1)	Large auth migrations affect access control continuity and least-privilege governance.
NIST Zero Trust (SP 800-207)	SC-7 (Boundary Protection, defined in NIST SP 800-53)	Proxy-based SSO routing aligns with controlled trust boundaries during transition.
NIST SP 800-63		Federated authentication changes affect identity lifecycle and assurance handling.

Map migration cutovers to PR.AA-05 (PR.AC-4 in CSF 1.1) and verify access continuity before switching traffic.

Key terms

Transparent proxy migration: A migration pattern where the existing authentication endpoint stays in place and forwards selected traffic to a new identity platform. It reduces customer-facing reconfiguration in large federated estates, but it requires careful routing logic, validation, and rollback planning to avoid partial cutover failures.
Resumable import: An import process that stores progress so a failed or interrupted migration can restart without duplicating records. For identity systems, this matters because large user populations make one-shot transfers fragile, and checkpointing becomes essential to preserve completeness and reduce operational risk.
Webhook backlog: Queued event traffic that accumulates while delivery is paused or disrupted. In auth migrations, backlog replay can overwhelm downstream systems after cutover if the old event stream is re-enabled without deliberate draining or discard logic, making event sequencing a core stability control.
Cutover blast radius: The amount of identity impact that can occur if a migration step fails or is reversed. In high-scale auth programmes, blast radius is controlled through staged rollout, feature flags, and per-connection fallback paths, rather than by relying on a single all-or-nothing switch.

This post draws on content published by WorkOS: Migrating auth at scale, with guidance for 200K-user auth migrations. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-05-28.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org