TL;DR: DNS reliability is not just uptime. It is a mix of latency, redundancy, security and visibility. DigiCert notes that UltraDNS processed nearly 42 trillion queries in 2023, and one study found that DNS-based attacks led to application outages in 82% of businesses. The real test is whether smaller teams can govern this dependency without creating new failure points.
At a glance
What this is: A DNS reliability and provider-selection analysis showing that resilience depends on infrastructure, performance, security, and support, not uptime claims alone.
Why it matters: DNS sits underneath authentication, certificate validation, workload connectivity, and user access paths, so for IAM practitioners a DNS failure can disrupt both human and non-human identity services.
By the numbers:
- UltraDNS processed nearly 42 trillion DNS queries in 2023 alone.
- DNS-based attacks led to application outages in 82% of businesses.
- One study found that one in every 174 DNS requests is malicious.
- 368.4 million.
Context
DNS reliability is the ability to resolve names quickly, consistently, and securely under load, failure, or attack. For identity programmes, it is a dependency layer that affects federation endpoints, certificate lookups, service connectivity, and the availability of systems that authenticate both people and machines.
The governance gap is that many teams treat DNS as a network utility rather than as part of the identity control plane. Once DNS becomes slow, unavailable, or tampered with, authentication flows, workload reachability, and trust validation all degrade together, which is why resilience and security have to be assessed together.
SMBs face a second problem: enterprise DNS controls often require expertise, visibility, and budget that smaller teams do not have. That makes provider selection a governance decision, not just a procurement choice.
Key questions
Q: How should security teams evaluate DNS reliability for identity-dependent systems?
A: Start by mapping which identity services depend on DNS, including SSO, federation, certificate validation, and workload discovery. Then test redundancy, latency, and trust controls under failure conditions. A DNS provider is only reliable if outages, tampering, and traffic floods do not break authentication or application reachability.
Q: Why does DNS reliability matter for IAM and workload identity programmes?
A: Because DNS sits underneath the services that issue, validate, and resolve trust. If DNS fails or is attacked, login redirects, federated identity endpoints, and machine connectivity can fail together. That turns a network issue into an access and availability problem across human and non-human identities.
Q: What breaks when DNS controls are treated as a commodity service?
A: Teams often discover that low-cost DNS lacks visibility, failover transparency, and security depth. Without those controls, they cannot diagnose outages quickly or limit blast radius when malicious traffic, misconfiguration, or provider failure occurs. The result is reactive operations and wider service disruption.
Q: Who should own DNS governance in an identity-heavy environment?
A: Ownership should sit with a shared governance model that includes infrastructure, security, and identity stakeholders. DNS affects access paths, certificate trust, and workload availability, so leaving it outside identity oversight creates blind spots. Accountability should be explicit before an outage exposes the gap.
Technical breakdown
Anycast DNS and global redundancy
Anycast DNS announces the same IP address from multiple points of presence, and network routing delivers each query to the nearest available server in routing terms. That design reduces latency and avoids a single point of failure at the site level, because traffic can shift automatically if one site, region, or data centre fails. Redundancy also needs to exist within each provider site, including duplicated servers, power, and network paths. Reliability is therefore an architectural property, not a promise attached to an SLA.
Practical implication: require proof of distributed Anycast coverage and failover behaviour before treating a DNS service as production-grade.
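The failover behaviour described above can be pictured with a deliberately simplified model. This is a sketch, not how a provider implements Anycast: real selection happens in BGP routing, and the `PoP` class, the site names, and the `routing_cost` values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PoP:
    """Hypothetical point of presence announcing the shared Anycast address."""
    name: str
    routing_cost: int  # lower means "nearer" in routing terms (illustrative value)
    healthy: bool


def select_pop(pops: List[PoP]) -> Optional[PoP]:
    """Return the lowest-cost healthy PoP, or None if no site is healthy."""
    candidates = [p for p in pops if p.healthy]
    return min(candidates, key=lambda p: p.routing_cost, default=None)


# Three sites announce the same address; the client never changes the IP it queries.
pops = [PoP("fra", 10, True), PoP("ams", 20, True), PoP("iad", 90, True)]
before = select_pop(pops)

# Simulate the nearest site failing: traffic should move to the next-nearest site.
degraded = [PoP(p.name, p.routing_cost, p.healthy and p.name != "fra") for p in pops]
after = select_pop(degraded)
```

In a real failure drill the equivalent evidence would come from the provider: which site answered before and after the simulated failure, and how long the shift took.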
Query latency, caching, and user experience
DNS latency is the time between a request for a domain name and the response with the correct IP address. Because DNS is often the first step in any connection, even small delays can slow authentication redirects, application login pages, and API calls. Good providers improve latency through geographic proximity and intelligent caching of frequently used records. Consistency matters as much as speed, because unpredictable lookup times create operational noise that is hard to diagnose.
Practical implication: measure lookup times from the regions where users, workloads, and identity services actually operate.
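As a minimal sketch of that measurement, the Python below times repeated lookups and summarises them. It assumes the operating system's resolver via `socket.getaddrinfo`, so local caching can hide upstream latency; the function names and the sample count are illustrative choices, not part of the original article.

```python
import math
import socket
import statistics
import time
from typing import Callable, Dict, List, Tuple


def default_lookup(hostname: str) -> None:
    # Uses the OS resolver; cached answers will look faster than a cold lookup.
    socket.getaddrinfo(hostname, 443)


def measure_lookup_ms(
    hostname: str,
    samples: int = 10,
    lookup: Callable[[str], None] = default_lookup,
) -> Tuple[List[float], int]:
    """Time `samples` lookups; return (successful timings in ms, failure count)."""
    timings: List[float] = []
    failures = 0
    for _ in range(samples):
        start = time.perf_counter()
        try:
            lookup(hostname)
        except OSError:  # socket.gaierror is a subclass of OSError
            failures += 1
            continue
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings, failures


def summarise(timings_ms: List[float]) -> Dict[str, float]:
    """Report median, p95 (nearest-rank) and max, since consistency matters as much as speed."""
    if not timings_ms:
        return {"count": 0}
    ordered = sorted(timings_ms)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "count": len(ordered),
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
        "max_ms": ordered[-1],
    }

# Example use from a probe host in each operating region (hostname is a placeholder):
#   timings, failures = measure_lookup_ms("login.example.com", samples=20)
#   print(summarise(timings), "failures:", failures)
```

Running the same probe from each region where users and workloads sit, and comparing median against p95, gives a rough view of both speed and consistency.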
DNS security features as reliability controls
A DNS service is only reliable if it resists abuse. DDoS mitigation helps absorb traffic floods, DNSSEC adds cryptographic integrity to responses, and encrypted transports such as DNS over TLS (DoT) and DNS over HTTPS (DoH) reduce exposure to interception or tampering in transit. These are not separate security extras. They directly protect availability and trust by preventing attackers or misrouting from turning DNS into an outage mechanism.
Practical implication: treat DDoS protection, DNSSEC, and encrypted resolution as mandatory reliability requirements, not optional add-ons.
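To make the encrypted-transport point concrete, here is a small sketch of how a DoH request is formed. It builds a DNS query in wire format (RFC 1035) and encodes it as the `dns` parameter of a DoH GET request (RFC 8484). It only constructs the request; it does not send it and does not validate DNSSEC. The function names are illustrative, and the resolver URL in the closing comment is a placeholder.

```python
import base64
import struct


def build_dns_query(name: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query in wire format. qtype 1 requests an A record."""
    # Header fields: ID=0 (RFC 8484 suggests 0 so HTTP caches can reuse answers),
    # flags=0x0100 (recursion desired), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    question = b""
    for label in name.rstrip(".").split("."):
        encoded = label.encode("ascii")
        if not 0 < len(encoded) <= 63:
            raise ValueError(f"invalid DNS label: {label!r}")
        question += bytes([len(encoded)]) + encoded
    # Root label terminator, then QTYPE and QCLASS (1 = IN).
    question += b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def doh_get_parameter(query: bytes) -> str:
    """Encode the query as base64url without padding, as DoH GET requires."""
    return base64.urlsafe_b64encode(query).rstrip(b"=").decode("ascii")


param = doh_get_parameter(build_dns_query("www.example.com"))
# A client would then send, over HTTPS, something like:
#   GET https://<resolver-host>/dns-query?dns=<param>
#   Accept: application/dns-message
```

Because the query travels inside HTTPS, an on-path observer sees only a TLS session to the resolver, which is the exposure reduction the paragraph above refers to.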
Threat narrative
Attacker objective: The attacker aims to interrupt availability or redirect trust in a way that breaks user access to services and degrades confidence in the organisation's online presence.
- Entry occurs when adversaries target DNS through traffic floods, tampering attempts, or malicious requests aimed at overwhelming resolvers or corrupting trust in responses.
- Escalation follows when weak resilience, poor visibility, or missing integrity controls allow a local DNS issue to spread into authentication failures, application outages, or misleading name resolution.
- Impact is broad service disruption, loss of user trust, and broken access to applications that depend on name resolution for login, federation, and connectivity.
Breaches seen in the wild
- Sisense breach — unauthorized GitLab access led to exfiltration of access tokens, API keys and certificates.
- Cisco DevHub NHI breach — IntelBroker exploited exposed Cisco credentials, API tokens and keys in DevHub.
Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.
NHI Mgmt Group analysis
DNS reliability is now an identity dependency, not just a network property. Modern IAM, federation, certificate validation, and workload connectivity all rely on DNS to resolve the services they trust. When teams separate DNS governance from identity governance, they miss how quickly lookup failure becomes access failure. Practitioners should treat DNS resilience as part of the control plane that keeps identity services reachable.
Enterprise-grade DNS exposes the same governance tension seen in NHI programmes: capability without operating maturity creates new failure modes. Advanced failover, traffic steering, and analytics only help if a team can configure and monitor them correctly. For smaller organisations, the real issue is not feature shortage but the gap between resilience ambition and day-two operational capacity. Practitioners should choose controls they can actually run.
Reliable DNS is a blast-radius problem. The named concept here is identity blast radius: how far one DNS failure, poisoning attempt, or provider outage can propagate into access disruption across applications, certificates, and workforce workflows. Once that radius is wide, one control failure becomes an enterprise availability event. Practitioners should scope DNS as a shared dependency with downstream identity impact.
SMB buyers need to evaluate DNS like an identity service level, not a commodity record store. The article shows that cost, support, visibility, and redundancy are inseparable when the service sits beneath every login and lookup. That means procurement, security, and infrastructure teams should review DNS together instead of splitting the decision across silos. Practitioners should make the service accountable to identity outcomes.
From our research:
- The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
- Only 44% of developers are reported to follow security best practices for secrets management, exposing a significant developer behaviour gap, according to The State of Secrets in AppSec.
- For teams hardening resolution and trust paths, the NHI Lifecycle Management Guide is the next step for aligning governance, visibility, and operational ownership.
What this signals
Identity blast radius: DNS issues become identity issues when lookup failures interrupt federated access, certificate validation, or workload discovery. That means practitioners should include DNS in the same resilience review as authentication endpoints and machine identity services, not leave it in infrastructure-only oversight. For control alignment, pair operational monitoring with the NIST Cybersecurity Framework 2.0 and identity-path validation.
The budget question is not whether SMBs can afford enterprise DNS, but whether they can afford identity outages caused by under-governed DNS. The article shows that resilience needs visibility, security, and failover competence, which are the same programme traits identity teams look for in the Ultimate Guide to NHIs, Lifecycle Processes for Managing NHIs. Practitioners should treat DNS procurement as part of identity resilience planning.
The signal for practitioners is that DNS is now part of the shared dependency stack for people, machines, and services. With The State of Secrets in AppSec showing a 27-day average secret remediation window, governance gaps rarely stay isolated. Teams should watch for hidden coupling between DNS, secret handling, and access availability before it becomes an outage cascade.
For practitioners
- Map DNS dependencies across identity flows. Inventory where DNS resolution supports SSO, federation endpoints, certificate validation, workload discovery, and API connectivity so outages are measured as identity-impacting events, not just network incidents.
- Test provider redundancy with failure drills. Verify how the service behaves when a PoP, region, or resolver path fails, and confirm that traffic reroutes without breaking authentication or application reachability.
- Require security controls that preserve trust. Make DNSSEC, DDoS mitigation, and encrypted resolution part of the baseline evaluation so availability and integrity are assessed together during vendor review.
- Measure latency from real operating regions. Benchmark query performance from the locations where users, service accounts, and cloud workloads actually operate, then compare the results against your service thresholds.
- Align DNS ownership with identity governance. Assign clear accountability for DNS decisions inside the same governance model that oversees access, certificates, and external dependencies, rather than leaving it as an infrastructure afterthought.
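The dependency-mapping step in the list above can be sketched as a small inventory that answers one question: if a given DNS provider or zone fails, which identity flows break? Every hostname, flow name, and provider label below is hypothetical and would be replaced by an organisation's own inventory.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical inventory: identity flow -> hostnames it must resolve.
FLOW_DEPENDENCIES: Dict[str, List[str]] = {
    "sso_login": ["login.example.com", "idp.example.com"],
    "federation_metadata": ["idp.example.com"],
    "certificate_validation": ["ocsp.example-ca.test"],
    "workload_discovery": ["svc.internal.example.com"],
}

# Hypothetical mapping: hostname -> DNS provider (or zone) that serves it.
HOST_PROVIDER: Dict[str, str] = {
    "login.example.com": "provider-a",
    "idp.example.com": "provider-a",
    "ocsp.example-ca.test": "ca-managed",
    "svc.internal.example.com": "internal-dns",
}


def blast_radius(
    flow_deps: Dict[str, List[str]], host_provider: Dict[str, str]
) -> Dict[str, List[str]]:
    """Map each provider to the sorted identity flows that depend on it."""
    impacted = defaultdict(set)
    for flow, hosts in flow_deps.items():
        for host in hosts:
            # Hosts missing from the inventory are grouped under "unknown".
            impacted[host_provider.get(host, "unknown")].add(flow)
    return {provider: sorted(flows) for provider, flows in impacted.items()}


radius = blast_radius(FLOW_DEPENDENCIES, HOST_PROVIDER)
for provider, flows in sorted(radius.items()):
    print(f"{provider}: {len(flows)} flow(s) affected -> {', '.join(flows)}")
```

Any host that lands under "unknown" is itself a finding: a dependency with no named owner.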
Key takeaways
- DNS reliability is a governance issue because name resolution underpins identity access, certificate trust, and workload connectivity.
- The scale of DNS exposure is large, with trillions of queries a year at a single provider and frequent malicious traffic making availability and integrity controls essential.
- SMBs should evaluate DNS on redundancy, latency, security, and operational support, not on price or uptime claims alone.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| NIST CSF 2.0 | PR.AA-01 (PR.AC-1 in CSF 1.1) | DNS underpins access to identity services and federation endpoints. |
| NIST CSF 2.0 | DE.CM-01 | Monitoring DNS health is essential for detecting tampering and outage conditions. |
| NIST Zero Trust (SP 800-207) | Not specified | Zero trust depends on reliable resolution of services and policy endpoints. |
Add DNS telemetry to continuous monitoring and alert on integrity or reachability drift.
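One way to picture such an alert is a comparison between a known-good baseline and what a probe currently observes. This is a sketch under assumptions: the baseline format, the alert wording, and the idea that a probe passes `None` for a failed lookup are all illustrative. Records served through CDNs or geo-steering legitimately vary, so a real check would need an allow-list rather than exact sets.

```python
from typing import Dict, List, Optional, Set


def detect_drift(
    baseline: Dict[str, Set[str]],
    observed: Dict[str, Optional[Set[str]]],
) -> List[str]:
    """Compare observed answers with a baseline and return alert messages.

    `observed[name]` is None (or the name is absent) when the lookup failed.
    """
    alerts: List[str] = []
    for name in sorted(baseline):
        answers = observed.get(name)
        if not answers:
            alerts.append(f"reachability: {name} did not resolve")
            continue
        unexpected = answers - baseline[name]
        if unexpected:
            alerts.append(
                f"integrity: {name} returned unexpected answers {sorted(unexpected)}"
            )
    return alerts


# Documentation-range addresses used as placeholder data.
baseline = {
    "idp.example.com": {"192.0.2.10", "192.0.2.11"},
    "login.example.com": {"198.51.100.5"},
}
observed = {
    "idp.example.com": {"192.0.2.10", "203.0.113.99"},  # one unknown address
    "login.example.com": None,                          # lookup failed
}
alerts = detect_drift(baseline, observed)
```

Feeding these alerts into the same channel as authentication-endpoint monitoring keeps DNS drift visible to the identity team, not only to infrastructure.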
Key terms
- Anycast DNS: Anycast DNS is a routing model where the same IP address is announced from multiple locations, and traffic is sent to the nearest healthy server. It improves resilience and latency by allowing requests to move away from failed or overloaded sites without changing the service address.
- DNSSEC: DNSSEC is a set of extensions that adds cryptographic signing to DNS data so clients can verify responses have not been altered. It protects integrity, not confidentiality, and is most valuable when organisations need stronger trust in the resolution path.
- Query latency: Query latency is the time it takes for a DNS request to be answered with the correct address. In practice, it affects first-page load times, login redirects, and service responsiveness, so even small delays can cascade into visible performance problems.
- Identity blast radius: Identity blast radius is the extent to which one control failure spreads across authentication, trust validation, and service access. In DNS contexts, it measures how far a resolver outage or tampering event can disrupt users, workloads, and downstream identity-dependent systems.
Deepen your knowledge
NHI governance, identity lifecycle management, and workload identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or operational governance, it is worth exploring.
This post draws on content published by DigiCert: Scaling Smart: How SMBs Can Achieve Enterprise-Grade Reliable DNS on a Budget. Read the original.
Published by the NHIMG editorial team on 2026-06-17.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org