TL;DR: Dark data makes unmanaged, unclassified information the weak point in enterprise security, with Splunk's State of Dark Data survey finding that 55% of enterprise data is dark on average. The issue is not just storage sprawl, but the fact that access governance, classification, retention, and audit controls cannot defend what they never inventory.
At a glance
What this is: Dark data is forgotten, unmanaged data that sits outside normal classification, access, and retention controls, and the article argues that it creates a direct security and compliance blind spot.
Why it matters: IAM, NHI, and data governance teams need to treat unknown data stores as governance failures because access reviews, least privilege, and audit evidence do not work when the asset itself is invisible.
By the numbers:
- 55% of enterprise data is dark on average, according to Splunk's State of Dark Data survey.
- 35% of data breaches involved shadow data, and those breaches cost 16% more on average, according to IBM.
Context
Dark data is the unmanaged portion of an organisation's information estate, meaning data that exists in storage but is not inventoried, classified, or actively governed. In practice, that creates a data governance problem because security teams cannot protect, retain, or review access to stores they do not know exist.
The article's core claim is that most security programmes still assume known, governed data is the default. Once dark data accumulates in forgotten buckets, legacy shares, SaaS exports, logs, and backups, those assumptions fail across data security, IAM, and audit readiness.
That is why visibility has to come before policy. Without a current inventory, any retention rule, access review, or classification standard risks becoming documentation without operational reach, which is exactly the gap dark data exploits.
Key questions
Q: What breaks when dark data is not inventoried?
A: When dark data is not inventoried, classification, retention, encryption, and access review all lose their operational target. Teams cannot prove who can reach the data, whether it still needs to exist, or whether it contains regulated information. The result is governance on paper with no control over the actual repository.
Q: Why does dark data increase compliance risk for regulated industries?
A: Dark data increases compliance risk because privacy and sector rules depend on knowing where regulated information lives and why it is retained. If personal data, payment data, or credentials sit in unknown stores, organisations struggle with deletion requests, retention enforcement, audit evidence, and breach scoping.
Q: How can security teams tell whether dark data governance is working?
A: Teams should look for a shrinking number of unknown stores, a current inventory, documented owners, and remediation records tied to each high-risk location. If new repositories keep appearing outside the control process, governance is still reactive rather than operational.
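The signals in this answer can be tracked as simple metrics. The following is a minimal sketch, assuming a hypothetical `Store` record with `inventoried`, `owner`, `high_risk`, and `remediation_ticket` fields; none of these names come from the original article, and the sample data is illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Store:
    """Hypothetical record for one discovered data store."""
    name: str
    inventoried: bool                         # present in the official inventory?
    owner: Optional[str] = None               # named business owner, if any
    high_risk: bool = False
    remediation_ticket: Optional[str] = None  # remediation record, if any


def governance_metrics(stores: list[Store]) -> dict[str, float]:
    """Summarise unknown stores, owner coverage, and high-risk remediation coverage."""
    total = len(stores)
    high_risk = [s for s in stores if s.high_risk]
    remediated = sum(1 for s in high_risk if s.remediation_ticket)
    return {
        "unknown_stores": sum(1 for s in stores if not s.inventoried),
        "owner_coverage": sum(1 for s in stores if s.owner) / total if total else 1.0,
        "high_risk_remediation_coverage": (
            remediated / len(high_risk) if high_risk else 1.0
        ),
    }


# Illustrative sample data, not real findings.
stores = [
    Store("s3://prod-exports", inventoried=True, owner="finance",
          high_risk=True, remediation_ticket="GOV-101"),
    Store("//legacy/share", inventoried=False, high_risk=True),
    Store("sharepoint/hr-archive", inventoried=True, owner="hr"),
    Store("s3://tmp-ml-cache", inventoried=False),
]
metrics = governance_metrics(stores)
print(metrics)
```

Comparing these numbers across review cycles is what shows the trend: a falling `unknown_stores` count with rising coverage suggests governance is becoming operational rather than reactive.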
Q: Who should own dark data remediation in an organisation?
A: Dark data remediation usually needs shared accountability across security, data governance, IAM, and business owners. Security can discover and prioritise the stores, but only data owners can decide retention, deletion, and access purpose. Without assigned ownership, the same ungoverned data will persist across review cycles.
Technical breakdown
Why dark data escapes classification and access governance
Dark data becomes invisible when data is created by one system or team and then copied into places that are outside the normal control plane. Examples include legacy file shares, cloud buckets created for temporary work, SaaS exports, logs, and backup archives. Classification tools only work where they are pointed, and access governance only works for assets that exist in inventory. Once a store is orphaned, both the identity layer and the data layer lose context. That means permissions can remain broad, retention can remain unbounded, and sensitive content can persist without an owner.
Practical implication: inventory data stores first, then attach classification and permission review to the discovered estate.
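One way to picture this step is as a set comparison between what a scan finds and what the inventory claims. The sketch below assumes both sides can be reduced to plain location strings; the `reconcile` function and every location shown are hypothetical and used only for illustration.

```python
def reconcile(discovered: set[str], inventory: set[str]) -> dict[str, set[str]]:
    """Compare what a scan found against what the inventory says exists."""
    return {
        # Found by the scan but absent from the inventory: candidate dark data.
        "unknown": discovered - inventory,
        # Listed in the inventory but not found: the inventory itself may be stale.
        "stale": inventory - discovered,
        # Present on both sides: classification and access review can target these.
        "governed": discovered & inventory,
    }


# Illustrative locations only.
discovered = {
    "s3://prod-data",
    "s3://tmp-migration-2021",
    "//fs01/finance",
    "//fs01/old-hr",
}
inventory = {"s3://prod-data", "//fs01/finance", "s3://decommissioned"}

result = reconcile(discovered, inventory)
for location in sorted(result["unknown"]):
    print(f"unknown store, needs classification and permission review: {location}")
```

The `unknown` set is the queue that classification and permission review should be attached to first, since those are the stores no existing control is pointed at.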
How lifecycle failures create persistent dark data
Dark data is often the by-product of missing lifecycle governance rather than a one-time mistake. Data enters through ingestion pipelines, exports, and integrations, but there is no deletion trigger, reclassification event, or ownership reassignment when the original use case ends. That is especially common in cloud storage, collaboration platforms, archives, and AI or automation workflows that retain logs and cache layers. The result is that the data lifecycle never closes. Retention becomes accidental permanence, and unmanaged permanence becomes a standing compliance and security liability.
Practical implication: define ownership, retention boundaries, and deletion triggers for each data class before the first export is created.
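As a rough illustration of what a deletion trigger per data class could look like, the sketch below evaluates a store against a rule table. The class names, owner roles, retention periods, and the `lifecycle_action` function are all assumptions made for this example, not values taken from the article or from any regulation.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class LifecycleRule:
    owner_role: str       # who decides retention for this class (hypothetical roles)
    retention: timedelta  # how long data of this class may be kept


# Hypothetical rule table; real periods depend on legal and business requirements.
RULES = {
    "regulated_personal": LifecycleRule("data-protection-officer", timedelta(days=365)),
    "operational_logs": LifecycleRule("platform-team", timedelta(days=90)),
    "saas_export": LifecycleRule("requesting-business-owner", timedelta(days=30)),
}


def lifecycle_action(
    data_class: str,
    created: date,
    today: date,
    exception_until: Optional[date] = None,
) -> str:
    """Return 'retain', 'delete', or 'escalate' for one data store."""
    rule = RULES.get(data_class)
    if rule is None:
        return "escalate"  # no rule for this class: route to a governance queue
    if exception_until is not None and today <= exception_until:
        return "retain"    # a time-bound exception is still active
    if today - created > rule.retention:
        return "delete"    # retention boundary passed: deletion trigger fires
    return "retain"


today = date(2026, 3, 1)
export_action = lifecycle_action("saas_export", date(2026, 1, 1), today)
logs_action = lifecycle_action("operational_logs", date(2026, 1, 1), today)
unclassified_action = lifecycle_action("unclassified", date(2026, 1, 1), today)
print(export_action, logs_action, unclassified_action)
```

Passing `today` in as a parameter keeps the check deterministic and easy to test. The `escalate` branch reflects the point that data without a defined class should not silently default to permanent retention.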
Why discovery has to precede policy
Security teams often write retention and classification policy before they have a reliable map of where sensitive data lives. That reverses the order of operations. Discovery is the control that establishes scope, shows effective permissions, and identifies high-risk stores before enforcement begins. Once the inventory exists, organisations can prioritise regulated data, apply lifecycle rules, and route exceptions into formal remediation. Without discovery, controls look complete on paper but remain partial in practice because they only govern known repositories.
Practical implication: run automated discovery in read-only mode, validate the inventory, and use that baseline to drive policy enforcement.
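Once a read-only scan has produced a baseline, the findings need an order of attack. The sketch below ranks findings with a simple additive score. The `Finding` fields, the weights, and the threshold of 50 identities are arbitrary assumptions chosen for illustration; a real programme would calibrate them against its own risk model.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical output record from a read-only discovery scan."""
    store: str
    has_regulated_data: bool
    identities_with_access: int
    has_owner: bool


def risk_score(finding: Finding) -> int:
    """Additive score; weights and threshold are illustrative assumptions."""
    score = 0
    if finding.has_regulated_data:
        score += 5
    if finding.identities_with_access > 50:  # treated here as "broad" access
        score += 3
    if not finding.has_owner:
        score += 2
    return score


def prioritise(findings: list[Finding]) -> list[Finding]:
    """Order findings from highest to lowest risk score."""
    return sorted(findings, key=risk_score, reverse=True)


# Illustrative findings only.
findings = [
    Finding("sharepoint/marketing", False, 300, True),
    Finding("//legacy/hr", True, 12, True),
    Finding("s3://tmp-export", True, 200, False),
]
ranked = prioritise(findings)
for f in ranked:
    print(risk_score(f), f.store)
```

The scoring is deliberately side-effect free: it reads the baseline and produces an ordering, leaving enforcement to a later, separately approved step.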
NHI Mgmt Group analysis
Dark data is really an identity problem disguised as a storage problem. The article shows that if a store is never inventoried, then no access review, classification rule, or retention control can meaningfully govern it. That means the security issue is not just where data sits, but which identities can reach unknown repositories without oversight. Practitioners should treat undiscovered stores as unmanaged access surfaces, not just unused storage.
Visibility debt is the right name for this risk. Dark data accumulates because organisations delay discovery, ownership assignment, and lifecycle closure until after the data has already spread across platforms. That delay widens the gap between what the business thinks exists and what security can actually govern. The practical conclusion is that every new integration, export, and archive adds visibility debt unless it is mapped at creation time.
Access governance collapses when the asset inventory is incomplete. IAM and data security controls assume there is a known object to certify, review, or revoke. Dark data breaks that assumption because the repository itself is outside the control boundary. The implication is that governance programmes need to measure unknown stores as a control failure condition, not as an edge case.
Lifecycle controls are the missing prevention layer. The article makes clear that retention, deletion, and ownership assignment are not cleanup activities after the fact. They are the mechanisms that stop unmanaged data from becoming permanent. Practitioners should understand dark data as the symptom of lifecycle processes that are either absent or never enforced at the point of creation.
Auditors will increasingly treat unknown stores as evidence gaps, not technical surprises. The article correctly frames structured discovery, documented methodology, and remediation logs as part of governance maturity. That matters because assurance now depends on proving that regulated data is discoverable and reviewable, not merely asserting that policies exist. Security teams should expect unknown repositories to become audit findings unless discovery is continuous.
From our research:
- 85% of organisations lack full visibility into third-party vendors connected via OAuth apps, according to The State of Non-Human Identity Security.
- Only 1 in 4 organisations are already investing in dedicated NHI security capabilities, which means most teams are still trying to govern machine access without dedicated controls.
- For a broader control baseline, see NIST Cybersecurity Framework 2.0 and map discovery, protection, and governance to your data estate.
What this signals
Visibility debt is the operational pattern security teams should watch for next. As cloud buckets, SaaS exports, and AI-generated caches spread faster than governance can inventory them, dark data becomes a repeating failure mode rather than a one-off cleanup issue.
The practical response is to treat discovery as a continuous control, not a project. When a programme can show a current inventory, named owners, and repeatable remediation evidence, it is finally governing the data estate instead of documenting its unknowns.
For practitioners
- Inventory high-risk data stores first: Start with cloud object storage tied to production, legacy file shares, Microsoft 365 repositories, SaaS exports, backups, and archive tiers. Build the baseline from discovered locations rather than from assumptions about where regulated data should be.
- Map effective permissions to every discovered store: Export the effective-permissions view alongside classification results so security, IAM, and data owners can see which identities can actually reach each repository. Prioritise stores where access is broad and ownership is unclear.
- Assign ownership and retention decisions: Require a named business owner for each store, then force an explicit decision on whether the data is still needed and whether access reflects least privilege. Unknown owners should escalate into a defined governance queue instead of remaining open-ended.
- Automate lifecycle rules by data class: Tie retention, deletion, and exception handling to the sensitivity classification of the repository. Require a time-bound exception for any data held without a retention end date, so that a manual backlog does not become permanent dark data.
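The permission-mapping step above depends on resolving group membership and nesting into the identities that can actually reach a store. The following is a minimal sketch of that resolution, assuming access is granted through simple allow lists and nested groups. The `groups` and `acl` structures and all names in them are hypothetical, and real platforms add deny rules, inheritance, and delegated rights that this example ignores.

```python
from typing import Optional


def expand_members(
    principal: str,
    groups: dict[str, list[str]],
    seen: Optional[set[str]] = None,
) -> set[str]:
    """Resolve a principal to the individual identities it contains."""
    seen = set() if seen is None else seen
    if principal not in groups:
        return {principal}  # a user or service account, not a group
    if principal in seen:
        return set()        # already expanded: guards against cyclic nesting
    seen.add(principal)
    members: set[str] = set()
    for member in groups[principal]:
        members |= expand_members(member, groups, seen)
    return members


def effective_access(
    acl: dict[str, list[str]],
    groups: dict[str, list[str]],
) -> dict[str, set[str]]:
    """Map each store to the identities that can reach it after group expansion."""
    return {
        store: set().union(*(expand_members(p, groups) for p in principals))
        for store, principals in acl.items()
    }


# Hypothetical directory and access lists.
groups = {
    "all-staff": ["eng", "finance"],
    "eng": ["alice", "svc-ci"],
    "finance": ["bob"],
}
acl = {
    "//legacy/share": ["all-staff"],
    "s3://payroll-export": ["finance", "svc-backup"],
}

access = effective_access(acl, groups)
for store, identities in access.items():
    print(store, sorted(identities))
```

Note that the result includes service accounts such as the hypothetical `svc-ci` and `svc-backup`, which is the point of the exercise: non-human identities often hold access to forgotten stores that no human review covers.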
Key takeaways
- Dark data is a governance failure because unknown repositories sit outside classification, access review, and retention control.
- The scale of the problem is material, with Splunk estimating that 55% of enterprise data is dark on average and IBM linking shadow data to longer and more expensive breaches.
- Security teams need continuous discovery, named ownership, and lifecycle enforcement to stop forgotten data stores from becoming permanent exposure points.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
NIST CSF 2.0 sets the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| NIST CSF 2.0 | ID.AM-07 | Asset inventory is the prerequisite for finding dark data stores. |
| NIST CSF 2.0 | PR.AA-05 | Dark data often retains over-broad access outside review cycles. |
| NIST CSF 2.0 | PR.DS-01 | Classification and retention determine how data is protected and governed. |
Apply data-class driven retention and deletion rules to reduce unmanaged storage.
Key terms
- Dark Data: Data that an organisation has stored but no longer actively inventories, classifies, or governs. It may contain regulated information, credentials, or operational records, but the organisation lacks reliable visibility into where it lives, who can access it, and how long it should remain retained.
- Visibility Debt: The accumulated gap between the data an organisation believes it controls and the data that actually exists across its environments. It grows when discovery, ownership assignment, and lifecycle enforcement are delayed, making security, compliance, and audit work harder over time.
- Effective Permissions: The real access an identity has after group membership, inheritance, and delegated rights are taken into account. In dark data contexts, effective permissions matter more than intended policy because the question is not what should be allowed, but what can actually reach the data store.
- Data Lifecycle Governance: The control discipline that sets ownership, retention, deletion, and review rules for data from creation through disposal. When lifecycle governance is weak, data persists beyond its purpose and becomes a compliance and security liability, especially in backups, exports, and archive systems.
This post draws on content published by Netwrix: Dark data explained, why invisible data is a security problem. Read the original.
Published by the NHIMG editorial team on 2026-06-03.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org