By NHI Mgmt Group Editorial Team | Published 2026-04-09 | Domain: Governance & Risk | Source: Semperis

TL;DR: When identity systems fail, users cannot authenticate, applications cannot start, VPN access breaks, and broader business recovery stalls, according to Semperis. The message is clear: identity recovery is not an IT back-end task, but the first operational prerequisite for restoring the minimum viable company.


At a glance

What this is: A recovery-focused analysis of why identity must be restored first after a destructive cyber incident and how a minimum viable company depends on a clean identity rebuild.

Why it matters: IAM, PAM, and NHI programmes all depend on identity services that can be recovered safely, independently, and in the right sequence after disruption.



Context

The minimum viable company problem starts with a simple identity truth: if authentication is unavailable, the rest of the enterprise cannot meaningfully come back online. In a hybrid environment, that means recovery planning has to treat Active Directory, federation, and privileged access as business continuity dependencies rather than supporting services.

Semperis frames recovery through a fictional industrial manufacturer, but the underlying governance issue is real for any enterprise with regional sites, SaaS dependency, and critical operational systems. The question is not whether a cyber crisis will affect identity. It is whether the organisation can restore identity cleanly enough to resume operations without reintroducing compromise.


Key questions

Q: How should organisations design identity recovery for cyber incident response?

A: Start with the identity layer, not the application layer. Define the minimum set of services needed to authenticate users, support administration, and preserve forensic integrity, then prove you can restore those services independently of the compromised production environment. That sequence gives the rest of the business a trustworthy base to return online.

Q: Why does recovery fail when identity is not restored first?

A: Because every downstream service depends on trust in authentication and admin access. If the identity plane is unavailable or contaminated, users cannot log in, applications cannot start reliably, and recovery teams cannot safely manage the environment. Identity is the control point that makes the rest of the restoration sequence possible.

Q: What breaks when teams rely on system state restore for identity servers?

A: System state restore can reintroduce the operating system state that attackers or malware already influenced. If persistence survives the restore, the recovered identity service may look healthy while still being compromised. That is why clean rebuilds are safer when the compromise window or persistence location is uncertain.

Q: Who should own identity recovery during a major outage?

A: The recovery owner should be the team that can coordinate authentication, privilege, containment, and validation in the right order, with clear escalation paths for business and technical decisions. In practice, that means shared ownership across identity, infrastructure, and incident response, with one accountable lead for each stage.


Technical breakdown

Identity recovery as the first recovery dependency

Identity recovery is the sequencing problem that determines whether everything else can return safely. Authentication, authorisation, and auditing all depend on a working identity plane, so if domain controllers, federation services, or admin access are unavailable, application recovery becomes largely theoretical. In practice, the recovery target is not the whole environment first, but the smallest identity core that can support trusted access and forensic containment. That is why minimum viable company planning starts with identity services rather than endpoint rebuilds or application restoration. The architecture question is not how to bring everything back at once, but how to restore the access layer without restoring compromise.

Practical implication: Define the minimum identity services required for recovery before you write the wider business continuity sequence.
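One way to make that baseline concrete is to write the dependencies down and let a topological sort produce the restoration order. The sketch below is a minimal illustration, assuming a hypothetical dependency map; the service names and their relationships are placeholders, not taken from the Semperis scenario.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists what must be restored before it.
DEPENDS_ON = {
    "domain_controllers": set(),
    "federation": {"domain_controllers"},
    "privileged_access": {"domain_controllers"},
    "vpn": {"federation"},
    "erp_application": {"federation", "privileged_access"},
}

def recovery_order(depends_on):
    """Return services in an order where every dependency comes first."""
    return list(TopologicalSorter(depends_on).static_order())

def minimum_identity_core(depends_on, must_work):
    """Collect the given capabilities plus everything they transitively depend on."""
    core, stack = set(), list(must_work)
    while stack:
        service = stack.pop()
        if service not in core:
            core.add(service)
            stack.extend(depends_on[service])
    return core

order = recovery_order(DEPENDS_ON)
core = minimum_identity_core(DEPENDS_ON, ["federation", "privileged_access"])
print(order)
print(sorted(core))
```

The first function gives a sequence the wider continuity plan can follow. The second shows how small the identity core is once the required capabilities are named.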

Clean rebuilds versus system state restore

System state and bare metal recovery can look efficient, but they can also preserve the very compromise the organisation is trying to escape. If malware, persistence mechanisms, or tampered operating system state survive the restore path, the recovered domain is not trustworthy. Clean operating system rebuilds reduce that risk by separating identity recovery from infected infrastructure. This is especially important where backup age, compromise timing, and persistence location are uncertain. Recovery engineering therefore has to distinguish between restoring availability and restoring trust. Those are not the same outcome, and treating them as equivalent creates hidden reinfection risk.

Practical implication: Use clean rebuild paths for domain controllers and identity infrastructure when compromise may have reached the operating system layer.
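The distinction between restoring availability and restoring trust can be expressed as a default-deny decision: any uncertainty about compromise timing or persistence pushes the host to a clean rebuild. This is a minimal sketch with assumed field names and criteria, not a vendor decision model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HostAssessment:
    # All fields are illustrative assumptions, not a vendor schema.
    name: str
    compromise_window_known: bool
    backup_predates_compromise: Optional[bool]  # None means "cannot tell"
    os_level_persistence_ruled_out: bool

def choose_restore_path(host: HostAssessment) -> str:
    """Return 'clean_rebuild' unless every trust condition is positively met."""
    trusted = (
        host.compromise_window_known
        and host.backup_predates_compromise is True
        and host.os_level_persistence_ruled_out
    )
    return "system_state_restore" if trusted else "clean_rebuild"

uncertain_dc = HostAssessment("dc01", False, None, False)
verified_host = HostAssessment("app07", True, True, True)
print(choose_restore_path(uncertain_dc))
print(choose_restore_path(verified_host))
```

Treating "unknown" as "untrusted" keeps the faster restore path available only where the evidence positively supports it.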

Fault-tolerant and staged identity restoration

A resilient recovery model avoids a single point of failure during the recovery itself. Fault tolerance means the process can continue if one node, platform, or location is unavailable. Staging means the enterprise restores the identity core first, then reintroduces regional services, then broader applications as risk and capacity allow. This mirrors operational reality in large organisations where WAN links, hardware availability, and recovery constraints differ by region. The point is not just speed. It is controlled re-entry into a trusted state, with identity services stabilised before broader workloads resume. That sequencing reduces both outage duration and the chance of a second compromise.

Practical implication: Build recovery phases that restore identity core, then regional authentication, then the wider estate in controlled waves.
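The staging and fault-tolerance ideas can be sketched as a loop over recovery phases that tolerates individual failures but refuses to start the next wave on an unstable base. The phase names follow the sequence described above; the target names, the `restore` callback, and the `min_success` threshold are hypothetical.

```python
# Hypothetical phases and targets, ordered identity core first.
PHASES = [
    ("identity_core", ["dc-primary", "dc-secondary"]),
    ("regional_authentication", ["emea-dc", "apac-dc", "amer-dc"]),
    ("wider_estate", ["erp", "file-services"]),
]

def run_staged_recovery(phases, restore, min_success=1):
    """Restore phase by phase; stop if a phase restores fewer than min_success targets."""
    report = {}
    for phase, targets in phases:
        restored = [t for t in targets if restore(t)]
        report[phase] = restored
        if len(restored) < min_success:
            break  # do not start later phases on an unstable base
    return report

# Simulated restore callback: one regional site is unreachable.
unavailable = {"apac-dc"}
report = run_staged_recovery(PHASES, lambda target: target not in unavailable)
print(report)
```

In this simulation the unreachable regional site is skipped and the remaining waves proceed. If no identity core node can be restored, the loop stops before touching regional or application targets.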


Threat narrative

Attacker objective: Disrupt operations broadly by knocking out the identity infrastructure that supported authentication and recovery.

  1. Entry occurred through a destructive ransomware incident that crippled large parts of the environment and interrupted authentication-dependent operations.
  2. Credential and access services were impacted as employees could not log in, shop-floor automation stopped, and administrative access became unavailable.
  3. Impact spread across business operations because systems, applications, and manufacturing workflows depended on the identity layer for continuation.



NHI Mgmt Group analysis

Identity recovery is a business continuity control, not a technical afterthought. When authentication fails, the enterprise loses the ability to bring users, systems, and administrative functions back in a trusted order. That makes identity the first recoverable business dependency, not a downstream support service. Practitioners should treat identity recovery as part of the minimum viable company design, not as a separate infrastructure exercise.

Clean rebuilds define trust more reliably than state-based restore paths. The article’s recovery model shows why operating-system-level persistence turns ordinary restore methods into reinfection risks. That is a governance problem as much as a technical one, because the organisation must prove it can recover without inheriting the original compromise. Practitioners should see clean rebuild capability as a trust boundary, not just a restoration method.

Staged identity recovery is the only practical way to reintroduce operational complexity safely. Global enterprises cannot assume every region, platform, or workload will return at the same pace. The recovery model therefore shows that identity restoration has to be fault tolerant, region-aware, and sequenced ahead of application resumption. Practitioners should align recovery architecture with the order in which access must return.

Identity resilience exposes the hidden coupling between IAM and business survival. This scenario makes the broader point that identity governance, admin tiering, and recovery readiness are inseparable from enterprise resilience planning. If the identity layer cannot come back cleanly, the rest of the programme remains aspirational. Practitioners should test whether their recovery design can support business continuity before they trust it in a real incident.

Minimum viable company planning should define the minimum viable identity first. The concept is straightforward: if the smallest working business cannot authenticate, nothing else is viable. That means organisations need a deliberate identity recovery baseline that supports forensics, containment, and basic operations before wider restoration begins. Practitioners should document that baseline as a resilience requirement, not an assumption.

From our research:

  • 79% of organisations have experienced secrets leaks, with 77% of these incidents resulting in tangible damage, according to the Ultimate Guide to NHIs.
  • Only 20% have formal processes for offboarding and revoking API keys, and even fewer have procedures for rotating them, according to the Ultimate Guide to NHIs.
  • For recovery planning, see 52 NHI Breaches Analysis for patterns that show how identity weaknesses turn into operational outages.

What this signals

Identity recovery is becoming a board-level resilience issue, not a narrow infrastructure task. The practical takeaway for programmes is that authentication services, admin access paths, and recovery isolation must be designed together, because the business cannot reopen if the identity layer remains ambiguous or contaminated. A useful reference point is the Ultimate Guide to NHIs, especially where identity recovery depends on clean revocation and lifecycle control.

Recovery trust boundary: the point at which restored identity services are no longer treated as potentially compromised and can safely support broader business return. If that boundary is not explicit, teams will restart workloads faster than they can prove trust. For practitioners, the implication is simple: recovery playbooks should name the trust check, the owner, and the pass criteria before the first system comes back online.
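A playbook entry that names the trust check, the owner, and the pass criteria can be as small as the structure below. It is a minimal sketch: the three checks, the owners, and the evidence keys are hypothetical examples, and missing evidence is deliberately treated as a failure.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TrustCheck:
    name: str
    owner: str
    passes: Callable[[dict], bool]  # pass criterion evaluated on collected evidence

def trust_boundary_crossed(checks, evidence):
    """Return (ok, failures); ok is True only if every named check passes."""
    failures = [(c.name, c.owner) for c in checks if not c.passes(evidence)]
    return (not failures, failures)

# Hypothetical checks; absent evidence fails by default.
CHECKS = [
    TrustCheck("rebuilt_from_clean_os", "infrastructure lead",
               lambda e: e.get("clean_os_build") is True),
    TrustCheck("privileged_credentials_reset", "identity lead",
               lambda e: e.get("privileged_resets_pending", 1) == 0),
    TrustCheck("no_unexpected_admins", "incident response lead",
               lambda e: not e.get("unexpected_admin_accounts", ["unknown"])),
]

evidence = {"clean_os_build": True, "privileged_resets_pending": 0,
            "unexpected_admin_accounts": []}
ok, failures = trust_boundary_crossed(CHECKS, evidence)
print(ok, failures)
```

Because each failure is returned with its owner, the output doubles as an escalation list when the boundary is not yet crossed.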

Boards and incident leaders should expect recovery exercises to expose hidden dependencies in identity, admin tiering, and regional access design. Those dependencies matter most when the enterprise is under pressure, because a technically restored environment that cannot authenticate locally is not operationally recovered. For additional structure, align recovery assumptions with the NIST Cybersecurity Framework 2.0 and validate them in tabletop exercises.


For practitioners

  • Define a minimum viable identity baseline: Document the smallest set of identity services needed to support authentication, forensics, containment, and admin access after a destructive incident. Include domain controllers, federation dependencies, and the recovery sequence needed to restore them before broader application work begins.
  • Separate recovery from reinfection risk: Prefer clean operating system rebuilds for identity infrastructure when there is any chance that persistence survived in the original host image. Make the rebuild path independent of the compromised production environment so the restoration process does not inherit the original malware state.
  • Test staged restoration by region: Run exercises that restore identity services in phases across locations, including scenarios where one site or network path is unavailable. Validate that regional users can authenticate locally or over the WAN as designed while the broader environment remains offline.
  • Map recovery accountability before the crisis: Assign clear responsibility for each recovery step, including containment, clean rebuild approval, identity service validation, and staged reintroduction of dependent systems. Use RACI-style ownership so the recovery path does not stall when timing and escalation decisions are required.
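RACI-style ownership is easy to check mechanically before an exercise. The sketch below assumes a hypothetical assignment table keyed by the recovery steps named above, and flags any step that lacks exactly one Accountable team or has no Responsible team. The team names and assignments are illustrative only.

```python
# Hypothetical RACI table: step -> {team: role code}.
RACI = {
    "containment": {"incident_response": "A", "infrastructure": "R", "identity": "C"},
    "clean_rebuild_approval": {"identity": "A", "infrastructure": "R", "incident_response": "C"},
    "identity_service_validation": {"identity": "A", "incident_response": "R"},
    "staged_reintroduction": {"infrastructure": "A", "identity": "R", "business_owner": "I"},
}

def raci_gaps(raci):
    """List steps that lack exactly one Accountable or have no Responsible team."""
    gaps = []
    for step, roles in raci.items():
        codes = list(roles.values())
        if codes.count("A") != 1:
            gaps.append((step, "needs exactly one Accountable"))
        if "R" not in codes:
            gaps.append((step, "needs at least one Responsible"))
    return gaps

print(raci_gaps(RACI))

# A table with an unowned step is reported rather than discovered mid-incident.
broken = dict(RACI, containment={"infrastructure": "R"})
print(raci_gaps(broken))
```

Running a check like this as part of tabletop preparation surfaces ownership gaps while there is still time to resolve them.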

Key takeaways

  • Identity recovery is the first prerequisite for restoring a minimum viable company because authentication underpins every downstream business function.
  • Clean rebuilds matter because system state restore paths can reintroduce malware or persistence into the recovered identity layer.
  • Staged, fault-tolerant restoration gives enterprises a safer way to bring identity services back before they reintroduce broader applications and workloads.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Non-Human Identity Top 10 | NHI-03 | Identity recovery depends on secure handling of NHI secrets and service access.
NIST CSF 2.0 | RC.RP | The article is centered on recovery planning and restoration sequencing after a cyber event.
NIST Zero Trust (SP 800-207) | PR.AC | Recovery only works if access is re-established through verified identity services.

Document recovery procedures that restore identity services first and validate them before wider system return.


Key terms

  • Minimum Viable Company: The smallest working version of an organisation that can still serve customers during a major disruption. In identity-heavy environments, this depends on restoring authentication, authorisation, and admin access in a controlled order so the business can function before the full estate returns.
  • Identity Recovery: The process of restoring authentication and access services to a trusted, usable state after compromise or outage. For enterprise environments, it means rebuilding identity services cleanly, validating them independently, and sequencing their return ahead of applications and endpoints.
  • Isolated Recovery Environment: A separate recovery workspace used to rebuild critical services away from the compromised production environment. It reduces reinfection risk by keeping restoration activities and trust validation outside the attack surface that was affected during the incident.
  • Staged Restoration: A phased approach to bringing systems back online after a crisis. Identity services return first, then regional access, then broader workloads, so the organisation can control risk, verify trust, and avoid reconnecting dependent services too early.

Deepen your knowledge

Identity recovery and staged restoration are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If your resilience planning depends on clean identity rebuilds, this course is a practical place to start.

This post draws on content published by Semperis: identity recovery and the minimum viable company. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-04-09.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org