TL;DR: Cyera argues that Microsoft Purview labels leave roughly 40% of enterprise files outside label-based DLP, including CSVs, ZIPs, code, screenshots, and image-based PDFs that Copilot can still read if permissions allow. The governance problem is not discovery alone but enforcing content-based policy across files that labeling cannot reach.
NHIMG editorial — what this means for NHI practitioners
By the numbers:
- Per Cyera, roughly 40% of files in M365 fall outside label-based Purview DLP policies.
Questions worth separating out
Q: How should security teams govern sensitive data in file types that cannot be labeled?
A: Security teams should classify content directly and enforce policy on the file itself, not on the label.
Q: Why do unlabeled files matter so much in Copilot environments?
A: Unlabeled files matter because Copilot inherits the permissions already granted to users and can read whatever those users can access.
Q: What breaks when DLP depends only on sensitivity labels?
A: The control breaks wherever the file format cannot carry a label or where the label was never applied.
Practitioner guidance
- Map all unlabelable file types: Inventory CSV, ZIP, code, screenshot, and image-based PDF repositories, then classify which locations contain PII, secrets, financial records, or source code.
- Apply content-based policies to sensitive data: Define enforcement on content patterns such as PII, API keys, and regulated records, then apply the same policy logic across labeled and unlabeled files.
- Review Copilot-accessible file exposure: Check which sensitive files are readable by users whose permissions also power Copilot and other AI assistants.
Teams should expect more scrutiny on unlabelable formats, especially where sensitive exports flow into collaboration tools and automated assistants.
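The content-based approach in the guidance above can be illustrated with a minimal sketch. It assumes file text has already been extracted (OCR for screenshots and image-based PDFs is out of scope here). The pattern set, the `Finding` type, and the `policy_action` rules are hypothetical and are not Purview or Cyera APIs. The point is only that the same decision logic runs on the content itself, whether or not the file can carry a label.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Illustrative patterns only (assumption): production detectors need
# validation, context checks, and tuning to control false positives.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


@dataclass
class Finding:
    path: str
    category: str
    count: int


def classify_text(path: str, text: str) -> list[Finding]:
    """Return one Finding per pattern category that matches the text."""
    findings = []
    for category, pattern in PATTERNS.items():
        hits = pattern.findall(text)
        if hits:
            findings.append(Finding(path, category, len(hits)))
    return findings


def policy_action(findings: list[Finding]) -> str:
    """Hypothetical policy: block on secrets, restrict on PII, else allow."""
    categories = {f.category for f in findings}
    if "aws_access_key_id" in categories:
        return "block"
    if categories & {"email", "us_ssn"}:
        return "restrict"
    return "allow"


# Example: a CSV export, a format the source lists as unlabelable.
SAMPLE_CSV = "name,email,key\nAda,ada@example.com,AKIAIOSFODNN7EXAMPLE\n"
SAMPLE_FINDINGS = classify_text("export.csv", SAMPLE_CSV)
```

Because the decision depends only on what the text contains, the same function could be applied to a labeled document and an unlabeled export and produce a consistent result.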
Explore further
Content coverage, not label coverage, is now the real governance boundary. Labels remain useful, but they do not define the full perimeter of sensitive data in modern M365 estates. CSVs, archives, screenshots, and source files routinely carry the highest-value content, and security teams that rely on document labels alone will undercount exposure. The correct control question is whether sensitive content is enforceable wherever it appears, not whether it fits the label schema. Practitioners should move to coverage models that measure the share of sensitive data under enforceable policy, not the share of files that can be labeled.
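To make the distinction between the two coverage measures concrete, here is a small sketch. The `FileRecord` fields and the sample inventory are invented for illustration and do not come from Cyera's data. It compares the share of files that can be labeled with the share of sensitive files that sit under some enforceable policy.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    path: str
    labelable: bool  # the format can carry a sensitivity label
    sensitive: bool  # content classification found sensitive data
    enforced: bool   # some enforceable policy applies to this file


def label_coverage(files):
    """Share of all files whose format can be labeled."""
    if not files:
        return 0.0
    return sum(f.labelable for f in files) / len(files)


def sensitive_coverage(files):
    """Share of sensitive files that are under enforceable policy."""
    sensitive = [f for f in files if f.sensitive]
    if not sensitive:
        return 1.0  # nothing sensitive to cover (a convention chosen here)
    return sum(f.enforced for f in sensitive) / len(sensitive)


# Hypothetical inventory: labels reach the Office files, but most of the
# sensitive content lives in formats that labels cannot reach.
INVENTORY = [
    FileRecord("report.docx", labelable=True, sensitive=False, enforced=True),
    FileRecord("contract.docx", labelable=True, sensitive=True, enforced=True),
    FileRecord("deck.pptx", labelable=True, sensitive=False, enforced=True),
    FileRecord("export.csv", labelable=False, sensitive=True, enforced=False),
    FileRecord("backup.zip", labelable=False, sensitive=True, enforced=False),
]

print(f"label coverage:     {label_coverage(INVENTORY):.0%}")
print(f"sensitive coverage: {sensitive_coverage(INVENTORY):.0%}")
```

In this made-up inventory, three of five files are labelable, yet only one of the three sensitive files is under enforceable policy, which is the kind of undercounting the paragraph above describes.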
A few things that frame the scale:
- The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
- Only 44% of developers are reported to follow security best practices for secrets management, exposing a significant developer behaviour gap.
A question worth separating out:
Q: How can organisations reduce AI-driven data exposure in M365?
A: Organisations should combine label-based controls for supported files with content-based controls for everything else, then review which sensitive files are readable by AI assistants. That approach reduces accidental disclosure, narrows over-permissioned access paths, and gives security teams a single view of what is actually covered. It is especially important where Copilot can surface shared content at scale.
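The review step can be sketched as a simple join between access data and classification results. This is a rough illustration under stated assumptions: the `ACL` mapping and the `SENSITIVE` set are hypothetical inputs that would in practice come from permission reports and a content scanner, neither of which is shown here. Since an assistant can read whatever its user can, the per-user list approximates what that user's assistant could surface.

```python
from __future__ import annotations


def readable_sensitive_files(
    acl: dict[str, set[str]], sensitive: set[str]
) -> dict[str, list[str]]:
    """Map each user to the sensitive files their permissions allow them to read."""
    exposure: dict[str, list[str]] = {}
    for path in sorted(sensitive):
        for user in acl.get(path, set()):
            exposure.setdefault(user, []).append(path)
    return exposure


# Hypothetical inputs: file path -> users with read access, plus the set
# of paths a content scanner flagged as sensitive.
ACL = {
    "finance/export.csv": {"alice", "bob"},
    "eng/keys.zip": {"bob"},
    "public/readme.md": {"carol"},
}
SENSITIVE = {"finance/export.csv", "eng/keys.zip"}

EXPOSURE = readable_sensitive_files(ACL, SENSITIVE)
for user in sorted(EXPOSURE):
    print(user, EXPOSURE[user])
```

Users with long lists are natural candidates for a permissions review before assistants are rolled out more widely.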