M365 label gaps leave sensitive files outside Copilot controls

By NHI Mgmt Group Editorial Team | Published 2026-05-29 | Domain: Announcements | Source: Cyera

TL;DR: Cyera argues that Microsoft Purview labels leave roughly 40% of enterprise files outside label-based data loss prevention (DLP), including CSVs, ZIPs, code, screenshots, and image-based PDFs that Copilot can still read if permissions allow. The governance problem is not discovery alone but enforcing content-based policy across files that labeling cannot reach.

At a glance

What this is: Cyera’s analysis says M365 sensitivity labels do not cover a large share of enterprise file types, leaving content-based risks outside label-driven DLP and Copilot controls.

Why it matters: IAM and non-human identity (NHI) practitioners need file-level governance that follows content, because unlabeled exports and embedded secrets can expose sensitive data without any change to access permissions.

By the numbers:

Roughly 40% of files in M365 fall outside Purview label-based policies, according to Cyera.


Context

Microsoft 365 data controls often start with labels, but labels only work when the file type supports them. That leaves a large operational gap for CSV exports, ZIP archives, screenshots, code, and image-based PDFs, even when those files contain PII, credentials, or regulated data. For NHI governance, that matters because secrets and sensitive outputs often live in these unlabeled file types first.

Copilot increases the practical impact of that gap because it inherits existing permissions and can read files that policy never classified. The issue is not that every unlabeled file is dangerous, but that the enterprise cannot rely on label-centric controls as a complete access boundary. In NHI terms, this is a content visibility problem that becomes an authorization problem when machine access scales.

Key questions

Q: How should security teams govern sensitive data in file types that cannot be labeled?

A: Security teams should classify content directly and enforce policy on the file itself, not on the label. That means scanning unsupported formats such as CSVs, ZIPs, screenshots, and code files for sensitive patterns, then applying the same DLP and remediation rules used for labeled documents. The goal is one control model across every file type that can carry regulated data or secrets.

Q: Why do unlabeled files matter so much in Copilot environments?

A: Unlabeled files matter because Copilot inherits the permissions already granted to users and can read whatever those users can access. If a sensitive file has no label, label-driven DLP never fires, so the file may remain accessible to both humans and AI assistants. That turns a classification gap into a machine-readable exposure path.

Q: What breaks when DLP depends only on sensitivity labels?

A: The control breaks wherever the file format cannot carry a label or where the label was never applied. In practice, that leaves exports, archives, screenshots, and code files outside the policy engine even when they contain PII, credentials, or intellectual property. The result is partial coverage that looks complete in reports but leaves real exposure behind.

Q: How can organisations reduce AI-driven data exposure in M365?

A: Organisations should combine label-based controls for supported files with content-based controls for everything else, then review which sensitive files are readable by AI assistants. That approach reduces accidental disclosure, narrows over-permissioned access paths, and gives security teams a single view of what is actually covered. It is especially important where Copilot can surface shared content at scale.

How it works in practice

How content-based classification changes the control model

Content-based classification looks inside the file rather than trusting the filename, metadata, or container format. That matters because a CSV may contain regulated customer records, a code file may contain embedded API keys, and an image-based PDF may carry sensitive text that OCR can surface. Cyera’s framing is that policy should operate on the content itself, so the same sensitivity logic can apply across labeled documents and unlabeled exports. This is closer to a data-centric control model than a document-centric one. For NHI governance, the architectural shift is significant because access decisions increasingly depend on what autonomous systems can interpret and reuse from raw content. Classification then becomes the way to normalize enforcement across every file type that can carry secrets or regulated data.

Practical implication: Anchor DLP policy to data patterns and file content, then verify that unlabelable formats are included in the same policy logic.
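A minimal sketch of what "looking inside the file" can mean, assuming a scanner that receives raw file bytes. The pattern set, the function names (`scan_text`, `scan_bytes`), and the `archive!member` naming are illustrative assumptions, not Cyera's or Purview's implementation. Image-based PDFs and screenshots would additionally need OCR, which this sketch does not attempt.

```python
import io
import re
import zipfile

# Illustrative patterns only (assumption); production detectors need
# validation, context checks, and tuning to control false positives.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_text(text):
    """Return the sorted names of every pattern that occurs in text."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))


def scan_bytes(name, data, depth=0, max_depth=2):
    """Classify a file by its content, ignoring its extension.

    ZIP archives are opened and each member is scanned, so a secret inside
    an archive is reported as 'archive!member'. max_depth bounds recursion
    into nested archives.
    """
    findings = {}
    buf = io.BytesIO(data)
    if depth < max_depth and zipfile.is_zipfile(buf):
        with zipfile.ZipFile(buf) as zf:
            for member in zf.namelist():
                if member.endswith("/"):  # skip directory entries
                    continue
                findings.update(
                    scan_bytes(f"{name}!{member}", zf.read(member),
                               depth + 1, max_depth)
                )
        return findings
    hits = scan_text(data.decode("utf-8", errors="ignore"))
    if hits:
        findings[name] = hits
    return findings
```

Because the function keys on content rather than file type, a CSV export, a source file, and a member of a ZIP archive all pass through the same pattern logic.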

What unified coverage means for Copilot-era data governance

A unified coverage model combines label-based protection for supported files with content-based enforcement for everything else. That gives security teams one view of which files are covered, which policies apply, and where gaps remain. The operational value is not just visibility. It is the ability to route unlabeled, sensitive files into the same remediation workflow as labeled ones, so the organisation does not maintain parallel governance tracks. This matters for Copilot because the assistant inherits the file permissions that already exist. If a sensitive export is accessible and ungoverned, machine-scale consumption can turn a routine file into a broad exposure point. Governance should therefore be designed around complete coverage, not around the subset of files labels can reach.

Practical implication: Build one inventory and one remediation queue for labeled and unlabeled sensitive files so Copilot exposure stays governable.
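One way to picture a single inventory and remediation queue is sketched below. The `FileRecord` type, the `SENSITIVE_LABELS` set, and the ordering rule (unlabeled files first) are assumptions made for illustration. They are not part of any Microsoft or Cyera API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical label names that the organisation treats as sensitive.
SENSITIVE_LABELS = {"Confidential", "Highly Confidential"}


@dataclass
class FileRecord:
    path: str
    label: Optional[str] = None  # sensitivity label, if the format carries one
    content_findings: List[str] = field(default_factory=list)  # content-scan hits

    @property
    def is_sensitive(self) -> bool:
        # Sensitive if EITHER track says so: the label or the content scan.
        return self.label in SENSITIVE_LABELS or bool(self.content_findings)


def remediation_queue(records):
    """Return one queue for labeled and unlabeled sensitive files.

    Unlabeled files sort first because no label-driven policy protects them.
    """
    sensitive = [r for r in records if r.is_sensitive]
    return sorted(sensitive, key=lambda r: (r.label is not None, r.path))
```

The design choice is that sensitivity is decided once, from either signal, so a labeled document and an unlabeled CSV land in the same workflow rather than in two separate tools.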

NHI Mgmt Group analysis

Content coverage, not label coverage, is now the real governance boundary. Labels remain useful, but they do not define the full perimeter of sensitive data in modern M365 estates. CSVs, archives, screenshots, and source files routinely carry the highest-value content, and security teams that rely on document labels alone will undercount exposure. The correct control question is whether sensitive content is enforceable wherever it appears, not whether it fits the label schema. Practitioners should move to coverage models that measure the share of sensitive data under enforceable policy, not the share of files that can be labeled.
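The difference between the two coverage measures can be made concrete with a small calculation. This is a sketch under assumed inputs: each file is a dictionary with hypothetical boolean fields `labelable`, `sensitive`, and `enforced` (meaning some policy, label-based or content-based, actually applies).

```python
def coverage_metrics(files):
    """Compare label reach with enforceable coverage of sensitive files.

    files: list of dicts with boolean keys 'labelable', 'sensitive',
    'enforced' (hypothetical schema for illustration).
    """
    total = len(files)
    labelable = sum(1 for f in files if f["labelable"])
    sensitive = [f for f in files if f["sensitive"]]
    enforced = sum(1 for f in sensitive if f["enforced"])
    return {
        # Share of all files that can carry a label at all.
        "labelable_share": labelable / total if total else None,
        # Share of sensitive files that some policy actually governs.
        "sensitive_enforced_share": enforced / len(sensitive) if sensitive else None,
    }
```

An estate can score well on the first number while the second stays low, which is the undercounting the paragraph above describes.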

Copilot turns incomplete data classification into an identity problem. When an AI assistant inherits user permissions, every ungoverned file becomes a potential machine-read exposure path. That changes the risk discussion from passive leakage to effective access, because the assistant can surface content at scale once the underlying permission is in place. The field should stop treating GenAI data security as a separate program and start treating it as a consequence of incomplete content governance. Practitioners should assume that any file visible to a user may become visible to an AI agent acting under that user’s authority.

Unlabelable files create an identity blast radius that traditional DLP metrics miss. The issue is not merely whether sensitive data exists outside labels, but how far that data can travel once shared, synced, or indexed. A single CSV export can propagate through email, file sync, Copilot queries, and downstream workflows without ever touching the label engine. That is why practitioners need policy tied to content recognition and destination, not just attachment state. Teams should measure blast radius across file formats, destinations, and machine-read surfaces.
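As a rough illustration of measuring spread across destinations, the sketch below counts the distinct surfaces each sensitive file has reached. The event tuple shape and the destination names are assumptions; real telemetry would come from whatever audit sources the organisation already collects.

```python
from collections import defaultdict


def blast_radius(events):
    """Count distinct (format, destination) surfaces per sensitive file.

    events: iterable of (file_id, file_format, destination) tuples, a
    hypothetical shape for audit records of sensitive files.
    """
    surfaces = defaultdict(set)
    for file_id, file_format, destination in events:
        surfaces[file_id].add((file_format, destination))
    return {file_id: len(seen) for file_id, seen in surfaces.items()}
```

Repeated events to the same destination are counted once, so the number reflects how widely a file has travelled rather than how often it was touched.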

A content-first control model is becoming the practical baseline for NHI governance in M365. Non-human systems consume the same files as humans, but at a speed and scale that exposes weak assumptions quickly. If the control plane only protects office documents, it leaves the rest of the estate available to automated discovery, indexing, and reuse. The governance standard should be consistent policy over every file that can hold secrets, regulated data, or intellectual property. Practitioners should plan for content-first enforcement as the default, not an exception.

From our research:
The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
Only 44% of developers are reported to follow security best practices for secrets management, exposing a significant developer behaviour gap.
The natural next step is to pair file-level coverage with lifecycle controls, as outlined in Ultimate Guide to NHIs, Lifecycle Processes for Managing NHIs.

What this signals

Content coverage gaps will become Copilot governance gaps. As organisations expand AI access to shared files, the question is no longer whether a document has a label but whether the underlying content is governable at all. Teams should expect more scrutiny on unlabelable formats, especially where sensitive exports flow into collaboration tools and automated assistants.

Organisations run six distinct secrets manager instances on average, according to The State of Secrets in AppSec, so fragmentation already weakens central control. That fragmentation now extends to file governance when one layer protects office documents and another layer is needed for everything else. Practitioners should plan for a unified inventory across identities, secrets, and sensitive files so the control model does not split along product boundaries.

Identity blast radius should become the operating metric for M365 data governance. If a user or agent can reach an unlabeled file, then the practical exposure is determined by permissions plus machine-readability, not by the presence of a sensitivity label. Teams should align their monitoring to that combined risk and use the Ultimate Guide to NHIs, Regulatory and Audit Perspectives to translate coverage gaps into audit-ready findings.

For practitioners

Map all unlabelable file types. Inventory CSV, ZIP, code, screenshot, and image-based PDF repositories, then classify which locations contain PII, secrets, financial records, or source code. Use that map to identify where label-based DLP cannot reach and where compensating controls are required.
Apply content-based policies to sensitive data. Define enforcement on content patterns such as PII, API keys, and regulated records, then apply the same policy logic across labeled and unlabeled files. Validate that the rule set follows the content when files move through email, share links, and sync paths.
Review Copilot-accessible file exposure. Check which sensitive files are readable by users whose permissions also power Copilot and other AI assistants. Prioritise files that are unlabeled but broadly accessible, because those create the largest machine-scale exposure paths.
Unify remediation queues for all sensitive files. Put labeled documents and unlabelable files into one inventory, one alert flow, and one remediation workflow. That prevents parallel governance tracks and makes coverage gaps visible to the teams responsible for data, identity, and AI risk.
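The mapping and review steps above can be sketched as two small functions. The extension set, the path layout, and the record fields (`findings`, `label`, `reader_count`) are assumptions for illustration. Image-based PDFs cannot be identified by extension alone and would need a content check.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical set of extensions treated as unlabelable (assumption).
UNLABELABLE_EXTS = {".csv", ".zip", ".py", ".png", ".jpg"}


def unlabelable_by_location(paths):
    """Count unlabelable files per parent folder to show where labels cannot reach."""
    counts = Counter()
    for raw in paths:
        path = PurePosixPath(raw)
        if path.suffix.lower() in UNLABELABLE_EXTS:
            counts[str(path.parent)] += 1
    return counts


def copilot_review_order(files):
    """Order sensitive, unlabeled files by how many users can read them.

    files: list of dicts with keys 'path', 'findings', 'label',
    'reader_count' (hypothetical schema). Broadest access comes first.
    """
    candidates = [f for f in files if f["findings"] and f["label"] is None]
    return sorted(candidates, key=lambda f: f["reader_count"], reverse=True)
```

The first function supports the inventory step, and the second gives a review order in which the most widely readable unlabeled files surface first.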

Key takeaways

Sensitivity labels alone do not provide complete M365 data coverage when common file formats fall outside the label model.
Copilot makes file classification gaps more consequential because it can read anything the underlying permission model already exposes.
Security teams should shift from label-centric reporting to content-first coverage, remediation, and audit measurement.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST AI RMF set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI-03	Unlabeled exports and embedded secrets create unmanaged NHI-adjacent exposure paths.
NIST CSF 2.0	PR.DS-01	Data protection controls must cover sensitive content wherever it resides, not just labeled files.
NIST AI RMF		Copilot access to sensitive files creates governance requirements for AI-assisted data use.

Extend controls to content in CSVs, archives, and code files, then verify remediation for exposed secrets.

Key terms

Sensitivity label: A sensitivity label is a classification marker applied to a file or document to drive policy enforcement such as DLP, encryption, and sharing restrictions. In practice, the label only helps if the file type supports it and the organisation applies it consistently across the estate.
Content-based classification: Content-based classification inspects the information inside a file rather than relying on its name, metadata, or format. It is essential when common business files carry sensitive data but cannot use the same label system as Office documents or text-based PDFs.
Copilot exposure path: A Copilot exposure path is a route by which an AI assistant can access and surface sensitive content through the permissions already granted to a user. The risk grows when file governance is incomplete, because machine-scale access can turn a routine sharing decision into broader disclosure.
Identity blast radius: Identity blast radius is the practical extent of damage a user or non-human identity can cause once it can reach sensitive content. It includes how many files, systems, and downstream workflows become exposed when access is broader than the control model assumed.

Deepen your knowledge

M365 label coverage and content-based DLP are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building governance for Copilot-era file exposure, it is worth exploring.

This post draws on content published by Cyera: Secure M365 Files that Sensitivity Labels Can't Reach with Cyera Omni DLP.
