NHI Forum
Read full article here: https://www.oasis.security/blog/dont-look-back-in-anger-how-cloudflares-outage-highlights-the-need-for-safer-rotations/?soucre=nhimg
Service availability is the lifeblood of modern enterprises. Yet on March 21, 2025, Cloudflare, a company trusted to keep the internet running, suffered a global outage triggered by a seemingly routine security task: rotating credentials.
The incident took down Cloudflare’s R2 Object Storage for more than an hour, disrupting writes worldwide and partially degrading reads. For customers, it was a reminder that even the best-resourced providers can stumble. For identity and security teams, it was proof that credential rotation isn’t simple housekeeping; it’s mission-critical infrastructure hygiene.
What Happened: The Cloudflare Rotation Outage
- New keys generated - Cloudflare created an updated key pair for its R2 Gateway component.
- Mis-deployment - The new credential was mistakenly pushed to the default (dev) environment, instead of production, due to an omitted “--env production” parameter.
- Old keys deleted too soon - Assuming production had migrated, engineers removed the old credentials.
- Credential mismatch - Production continued trying to use the invalidated keys, causing every write to fail and ~35% of reads to break.
The missing safeguard? Verification. Cloudflare’s team had no real-time visibility into which credentials were actually active. Deletion occurred before validation, leading directly to the outage.
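One guardrail against the mis-deployment step described above is to make the target environment mandatory, so that a forgotten flag fails loudly instead of silently falling back to a default. The following is a minimal sketch, not Cloudflare's tooling: the `deploy-credential` program name, the `--key-id` flag, and the `parse_target` helper are hypothetical, and only the `--env production` parameter comes from the incident description.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical deploy CLI: the environment has no default, so omitting it is an error.
    parser = argparse.ArgumentParser(prog="deploy-credential")
    parser.add_argument("--env", required=True, choices=["dev", "production"])
    parser.add_argument("--key-id", required=True)
    return parser


def parse_target(argv: list[str]) -> tuple[str, str]:
    """Return (environment, key ID); argparse exits with status 2 if either is missing."""
    args = build_parser().parse_args(argv)
    return args.env, args.key_id


if __name__ == "__main__":
    env, key_id = parse_target(["--env", "production", "--key-id", "new-key"])
    print(f"would deploy {key_id} to {env}")
```

With this shape, running the command without `--env` stops before anything is deployed, rather than pushing the credential to a default environment.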
Why It Matters
Cloudflare’s misstep wasn’t about poor security. It was about operational complexity. Manual steps, fragmented visibility, and dependency blind spots turned a best practice into an outage. Without structured automation and robust guardrails, even well-intentioned rotations can disrupt production at scale.
Not the First Time: Rotation Gone Wrong
Credential mismanagement has caused major incidents before:
- Dropbox (2024) - A mismanaged service account with unrotated credentials was exploited, forcing a hasty global token reset.
- Microsoft Exchange (2023) - Attackers forged tokens with a stolen signing key after rotation procedures were delayed out of fear of downtime.
- Microsoft Limiting Secret Expiration (2021) - Microsoft eliminated “never expire” secrets after outages tied to unrotated credentials, forcing enterprises to adopt stricter rotation cycles.
The lesson across all of these: rotation without automation and visibility equals risk.
Best Practices for Safer Credential Rotations
- Maintain a full inventory of NHIs - Every API key, service principal, and token must be mapped to an owner, system, and use case.
- Automate verification before deletion - Confirm new keys are active, and old ones inactive, before decommissioning.
- Adopt rolling, phased rotations - Replace keys in stages, validate, then retire, to minimize production risk.
- Continuously assess posture - Flag stale secrets, over-privileged identities, and orphaned accounts regularly.
- Log with context and ownership - Every credential should have usage logs tied to a responsible owner, reducing audit gaps.
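The "verify before deletion" and "phased rotation" practices above can be sketched as a small routine. This is an illustrative outline under stated assumptions, not any vendor's implementation: `rotate_with_verification` and its `deploy`, `observed_key`, and `revoke` callbacks are hypothetical names standing in for whatever deployment, telemetry, and key-store interfaces an organization actually has.

```python
from typing import Callable


def rotate_with_verification(
    instances: list[str],
    new_key: str,
    old_key: str,
    deploy: Callable[[str, str], None],   # hypothetical: push a key to one instance
    observed_key: Callable[[str], str],   # hypothetical: key the instance is really using
    revoke: Callable[[str], None],        # hypothetical: delete a key from the key store
    batch_size: int = 2,
) -> bool:
    """Roll new_key out in batches; revoke old_key only after every instance is verified."""
    for start in range(0, len(instances), batch_size):
        batch = instances[start:start + batch_size]
        for instance in batch:
            deploy(instance, new_key)
        # Validate the batch before moving on; on failure the old key stays valid.
        if any(observed_key(instance) != new_key for instance in batch):
            return False
    # Final gate: every instance must report the new key before the old one is removed.
    if not all(observed_key(instance) == new_key for instance in instances):
        return False
    revoke(old_key)
    return True


if __name__ == "__main__":
    production = {"gw-1": "old", "gw-2": "old", "gw-3": "old"}
    dev = {}
    revoked = []
    # Simulate a deploy that lands in the wrong environment: production never changes.
    ok = rotate_with_verification(
        list(production), "new", "old",
        deploy=lambda inst, key: dev.__setitem__(inst, key),
        observed_key=production.__getitem__,
        revoke=revoked.append,
    )
    print(ok, revoked)
```

The design choice is that revocation is the last step and depends on what instances report they are using, not on the assumption that the deploy succeeded. In the simulated mis-deployment, the routine stops at the first batch and the old key is never revoked.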
How Oasis Makes Rotation Predictable
Oasis Security approaches Non-Human Identity (NHI) lifecycle management with automation and context at its core:
- Discovery - Automatically detect every credential across cloud, SaaS, vaults, and CI/CD pipelines.
- Context Mapping - Link secrets to their owners, privileges, and dependencies to avoid surprises.
- Policy-Driven Rotation - Enforce consistent 30-day or event-driven cycles, with automated checks before revocation.
- Closed-Loop Monitoring - Detect orphaned identities and expired secrets continuously, not after they break production.
Instead of “rotate and pray,” Oasis delivers “rotate and verify,” ensuring continuity and resilience.
Final Takeaway
Cloudflare’s March 2025 outage is a cautionary tale: credential rotation can no longer be manual, blind, or reactive. With NHIs now outnumbering humans by 50:1 in enterprise environments, the margin for error is too small.
The solution is clear: automated, identity-centric rotation built on visibility, validation, and lifecycle governance. With platforms like Oasis, enterprises can secure their NHIs without sacrificing uptime, proving that security and resilience don’t have to be at odds.