Bulk data rule compliance now depends on identity security

By NHI Mgmt Group Editorial Team | Published 2025-09-24 | Domain: Governance & Risk | Source: Delinea

TL;DR: The DOJ’s new Data Security Program broadens bulk data risk beyond direct sales to include who can reach sensitive U.S. data through privileged accounts, cloud entitlements, vendors, and AI agents, according to Delinea. Compliance now hinges on identity-to-data access control, because access paths can create violations even without a breach.

At a glance

What this is: Delinea’s analysis of the DOJ’s Bulk Data Rule and the identity security controls that determine whether sensitive data access becomes a compliance failure.

Why it matters: IAM, PAM, and NHI governance now sit inside the compliance boundary for bulk data, cloud access, third parties, and AI workloads.

By the numbers:

Studies show that the ratio of non-human to human identities can be as high as 46:1, creating a vast, often unmanaged attack surface.


Context

The Bulk Data Rule turns identity-to-data access into a compliance issue, not just a security one. If privileged users, service accounts, vendors, or AI agents can reach sensitive datasets without tight governance, the organisation can create regulated exposure even when no breach has occurred.

For IAM and PAM teams, the practical problem is that data protection controls and identity controls are no longer separable. Bulk data compliance now depends on knowing who or what can decrypt, query, export, or administer sensitive records across cloud and on-premises environments.

Key questions

Q: How should security teams govern access to bulk data under the DOJ rule?

A: Security teams should govern every identity that can reach bulk data, not just the users who approve or view it. That means mapping privileged accounts, vendors, service accounts, API keys, and AI workloads to the datasets they can touch, then enforcing least privilege, time-bounded access, and evidence-backed review. The access path is the compliance boundary.

Q: Why do non-human identities increase bulk data compliance risk?

A: Non-human identities increase risk because they often hold broad, persistent, and poorly reviewed access to data systems. They operate at machine speed, are frequently excluded from human access review processes, and can move large volumes of data without triggering the same scrutiny as a person. That creates a much larger compliance and exposure surface.

Q: What breaks when cloud entitlements to sensitive data are not tightly governed?

A: When cloud entitlements are loose, organisations lose control over who can reach regulated datasets through inherited roles, shared services, and third-party paths. The result is exposure that can exist entirely inside legitimate cloud permissions, which makes it hard to prove compliance or contain misuse quickly. The problem is entitlement design, not just detection.

Q: Who is accountable when a vendor or AI workload causes bulk data exposure?

A: Accountability sits with the organisation that allowed the access path to exist, even when the immediate actor is a vendor or workload identity. That means security, IAM, legal, and compliance teams must prove that the access was necessary, time-bound, and reviewed. If they cannot, the governance failure is internal, not external.

Technical breakdown

How bulk data exposure becomes an identity problem

The rule focuses on transactions and access paths, not only on data sales. That means a dataset can be in scope when a cloud operator, offshore developer, contractor, or service account can reach it through privileged access, federated access, or misconfigured entitlements. Identity is the control plane that determines whether access is direct, indirect, or inherited across systems. If those identities can interact with sensitive data without strong governance, the organisation has already created a regulated exposure path.

Practical implication: Map every identity that can reach sensitive data, including indirect and third-party paths, before treating the dataset as compliant.
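The mapping step can be sketched as a small graph walk. This is a minimal illustration, not a description of any vendor's tooling: the identity names, role names, and datasets below are hypothetical, and a real inventory would be fed from cloud IAM, directory, and PAM exports rather than hard-coded dictionaries.

```python
# Hypothetical inventory: who inherits from which role, and which roles
# hold grants on which datasets. All names are illustrative assumptions.
MEMBER_OF = {
    "alice": ["dba-admins"],
    "svc-etl": ["pipeline-role"],
    "vendor-x": ["support-role"],
    "pipeline-role": ["data-readers"],
    "support-role": ["data-readers"],
}
GRANTS = {
    "dba-admins": {"customer_records"},
    "data-readers": {"customer_records", "geo_logs"},
}


def reachable_datasets(identity):
    """Return every dataset reachable directly or through inherited roles."""
    seen, stack, datasets = set(), [identity], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guard against cycles in role nesting
        seen.add(node)
        datasets |= GRANTS.get(node, set())
        stack.extend(MEMBER_OF.get(node, []))
    return datasets


def identities_for(dataset, identities):
    """List the identities that can reach a dataset by any path."""
    return sorted(i for i in identities if dataset in reachable_datasets(i))


print(identities_for("geo_logs", ["alice", "svc-etl", "vendor-x"]))
```

In this sketch the vendor and the service account reach geo_logs only through a nested role, which is the kind of inherited path a direct-grant review would miss.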

Why NHI governance is central to bulk data compliance

Non-human identities sit at the centre of most bulk data workflows because applications, integrations, scripts, and AI workloads often access the data more frequently than people do. Secrets, service accounts, and API tokens can bypass human-centric approval flows while still carrying the power to query, move, or decrypt bulk records. Traditional perimeter controls do not see those interactions clearly, and encryption does not help if the identity is already authorised to use the decrypted data.

Practical implication: Bring service accounts, API keys, and workload identities into the same governance model as human privileged users.

How AI agents change the access equation

AI agents introduce a new identity class that can combine data retrieval, tool use, and action execution at runtime. In bulk data environments, that matters because an agent with broad entitlements can amplify access faster than a human operator can review it. Even when the model is not fully autonomous, the identity attached to the workload still needs scope, lifecycle, and monitoring controls tied to the data it can touch. That shifts bulk data governance from static permissioning to continuous identity oversight.

Practical implication: Treat AI workload identities as access-bearing subjects and restrict their data scope to the minimum operational task.
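One way to picture a minimum-scope agent identity is a deny-by-default allowlist of dataset and action pairs. The sketch below assumes a hypothetical AgentScope type and agent name; it is not an API from any agent framework, only an illustration of the principle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentScope:
    """Hypothetical scope record for one AI workload identity."""
    agent_id: str
    allowed: frozenset  # set of (dataset, action) pairs


def is_allowed(scope, dataset, action):
    """Deny by default: only explicitly listed pairs are permitted."""
    return (dataset, action) in scope.allowed


# Illustrative agent that only needs to read support tickets.
scope = AgentScope("support-summariser", frozenset({("tickets", "read")}))

print(is_allowed(scope, "tickets", "read"))
print(is_allowed(scope, "customer_records", "export"))
```

Keeping the scope immutable and tied to a single task means any widening of access has to happen as a reviewed change to the scope record, not as a side effect at runtime.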

Threat narrative

Attacker objective: Obtain or misuse access to sensitive bulk data in a way that creates legal, national security, or reporting exposure.

Entry occurs through privileged accounts, cloud entitlements, third-party access, or AI workload identities that can reach sensitive datasets without adequate segregation.
Escalation happens when those identities carry excessive permissions, time-boxing is absent, or monitoring does not flag unusual exports, broad queries, or cross-system access.
Impact is regulatory exposure, where authorised but poorly governed access to bulk data can violate DOJ restrictions even before a breach is detected.
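The escalation step depends on monitoring that notices unusual exports. A minimal sketch of such a check follows, assuming each identity has a history of past export sizes; the multiplier and row floor are arbitrary illustrative values, not thresholds taken from the rule or from Delinea.

```python
from statistics import median


def flag_unusual_exports(history, events, multiplier=10, min_rows=10_000):
    """Flag exports far above an identity's own median export size.

    history: identity -> list of past export row counts (assumed input)
    events:  list of (identity, rows_exported) tuples to evaluate
    """
    flagged = []
    for identity, rows in events:
        # Identities with no history get a baseline of zero, so any
        # export above the floor is surfaced for review.
        baseline = median(history.get(identity) or [0])
        if rows >= min_rows and rows > multiplier * baseline:
            flagged.append((identity, rows))
    return flagged


history = {"svc-report": [500, 800, 650], "svc-backup": [1_000_000, 1_200_000]}
events = [("svc-report", 250_000), ("svc-backup", 1_100_000), ("svc-new", 50_000)]
print(flag_unusual_exports(history, events))
```

Comparing each identity against its own baseline keeps a backup account's routine large transfers from drowning out a reporting account that suddenly exports hundreds of thousands of rows.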

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
DeepSeek breach — DeepSeek breach exposed 1M+ log lines and sensitive secret keys.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Identity security is now part of the DOJ compliance boundary. The Bulk Data Rule is concerned not only with whether data is encrypted or stored in the right place but also with who can reach it and under what conditions. That makes privileged access, third-party access, and NHI governance compliance controls, not just security hygiene. Practitioners should treat identity-to-data paths as regulated surfaces.

Bulk data programmes still fail when governance stops at the human user layer. The article correctly points out that applications, scripts, service accounts, and AI workloads are often the real operators of sensitive datasets. Organisations that do not govern those identities will miss the access paths most likely to create bulk data violations. The implication is that identity inventories must extend far beyond employee accounts.

Standing privilege in bulk data environments is a governance liability, not merely an operational convenience. When accounts can continuously reach sensitive records, the organisation cannot show narrow purpose, limited duration, or bounded scope. That breaks the assumption that access can be safely granted and reviewed later. Practitioners should read this as a failure of entitlement design, not simply of enforcement tooling.

Bulk data compliance now depends on the lifecycle of machine access. Secrets, service accounts, and vendor identities do not retire themselves when a project changes or a partner relationship ends. If offboarding, recertification, and rotation do not apply to those identities, access can outlive the business reason for it. Identity lifecycle governance is therefore a compliance control, not an administrative afterthought.
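A lifecycle check of this kind can be sketched as follows. The MachineIdentity fields and the 90-day rotation window are assumptions made for illustration; the appropriate window depends on the organisation's own policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class MachineIdentity:
    """Hypothetical inventory record for a secret or service account."""
    name: str
    owner: Optional[str]
    last_rotated: date
    justification_ends: Optional[date]  # end of project or vendor contract


def lifecycle_findings(identity, today, max_age_days=90):
    """Return the lifecycle gaps that apply to one machine identity."""
    findings = []
    if identity.owner is None:
        findings.append("no accountable owner")
    if today - identity.last_rotated > timedelta(days=max_age_days):
        findings.append("rotation overdue")
    if identity.justification_ends is not None and identity.justification_ends < today:
        findings.append("business justification expired")
    return findings


stale = MachineIdentity("vendor-x-api-key", None, date(2025, 1, 1), date(2025, 6, 30))
print(lifecycle_findings(stale, today=date(2025, 9, 24)))
```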

Identity blast radius is the right concept for bulk data risk. A single overprivileged credential can expose millions of records, multiple clouds, and third-party systems at once. That is why this rule should push teams toward explicit blast-radius reduction across PAM, CIEM, IGA, and ITDR. The practical conclusion is that broad access paths must be collapsed before audit pressure forces the issue.
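Blast radius can be approximated as the number of records a credential can reach. The sketch below uses invented dataset sizes and credential names purely for illustration; in practice the access sets would come from an entitlement inventory.

```python
# Illustrative record counts per dataset (assumed values).
DATASET_RECORDS = {
    "customer_records": 4_000_000,
    "geo_logs": 900_000,
    "tickets": 20_000,
}
# Illustrative effective access per credential (assumed values).
CREDENTIAL_ACCESS = {
    "ci-deploy-key": {"customer_records", "geo_logs", "tickets"},
    "svc-support": {"tickets"},
}


def blast_radius(credential):
    """Total records reachable if this one credential is misused."""
    return sum(DATASET_RECORDS[d] for d in CREDENTIAL_ACCESS.get(credential, ()))


def rank_by_blast_radius():
    """Credentials ordered from largest to smallest reach."""
    return sorted(CREDENTIAL_ACCESS, key=blast_radius, reverse=True)


for cred in rank_by_blast_radius():
    print(cred, blast_radius(cred))
```

Ranking credentials this way gives a simple priority order for entitlement reduction: the credentials at the top of the list are the ones whose scope is worth collapsing first.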

From our research:
96% of organisations store secrets outside of secrets managers in vulnerable locations including code, config files, and CI/CD tools, according to the Ultimate Guide to NHIs.
71% of NHIs are not rotated within recommended time frames, increasing the risk of compromise over time, according to the Ultimate Guide to NHIs.
For a broader view of the identity surface, see Top 10 NHI Issues and how overlooked machine identities create compliance blind spots.

What this signals

Identity blast radius: bulk data programmes will increasingly be judged by how much sensitive data a single credential can reach, not by whether the data is technically encrypted. That shifts priority toward entitlement reduction, secret hygiene, and time-bounded access across clouds and vendors.

Teams should expect bulk data compliance to merge PAM, IGA, CIEM, and ITDR into one operational control loop. The organisations that can show who had access, for how long, and under what approval trail will be better positioned when enforcement or audit questions arrive.

A useful benchmark is whether sensitive datasets can still be reached by credentials stored in code or CI/CD pipelines. If they can, then the organisation has not reduced identity risk enough to support the regulatory expectations described in the rule.
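A rough way to test that benchmark is to scan code and pipeline configuration for credential-shaped strings. The sketch below checks only two common shapes, the AWS access key ID prefix format and PEM private key headers; it is an assumption-laden illustration and not a substitute for a dedicated secret scanner.

```python
import re

# Two widely recognised credential shapes; a real scanner covers many more.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def scan_text(text):
    """Return (line_number, pattern_name) for each suspected secret."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


# AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
sample = 'region = "us-east-1"\naws_key = "AKIAIOSFODNN7EXAMPLE"\n'
print(scan_text(sample))
```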

For practitioners

Inventory every identity that can touch bulk data: Build a current map of human, service, vendor, and workload identities that can query, export, decrypt, or administer sensitive datasets across cloud and on-premises systems.
Reduce standing privilege on data-bearing systems: Replace always-on access with just-in-time access, task-scoped entitlements, and session logging for database, file, and backup platforms that hold regulated data.
Bring NHI lifecycle controls into compliance workflows: Apply recertification, offboarding, and rotation to API keys, service accounts, and vendor credentials so access cannot outlive the business justification for the data relationship.
Monitor for high-risk identity behaviour against bulk datasets: Alert on large exports, broad search patterns, rapid cross-system access, and unusual hours, then tie those signals to automated revocation or containment playbooks.
Align identity evidence to the DOJ rule set: Document who approved access, when it was granted, what data it can reach, and how it is reviewed so compliance teams can show due diligence during enforcement and audit cycles.
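The just-in-time and evidence steps above can be combined in a single grant record. This is a minimal sketch under stated assumptions: the AccessGrant fields, the grant_jit helper, and the 60-minute default are hypothetical choices for illustration, not requirements taken from the rule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AccessGrant:
    """Hypothetical evidence record: who approved what, when, and until when."""
    identity: str
    dataset: str
    approved_by: str
    reason: str
    granted_at: datetime
    expires_at: datetime

    def is_active(self, now):
        """Access is valid only inside the approved window."""
        return self.granted_at <= now < self.expires_at


def grant_jit(identity, dataset, approved_by, reason, now, minutes=60):
    """Create a time-bounded grant that expires without manual cleanup."""
    return AccessGrant(
        identity, dataset, approved_by, reason, now, now + timedelta(minutes=minutes)
    )


now = datetime(2025, 9, 24, 12, 0, tzinfo=timezone.utc)
grant = grant_jit("svc-etl", "customer_records", "data-owner", "monthly reconciliation", now)
print(grant.is_active(now + timedelta(minutes=30)))
print(grant.is_active(now + timedelta(minutes=61)))
```

Because the approval, purpose, and expiry live in the same immutable record, the grant doubles as the audit evidence described in the last item of the list.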

Key takeaways

The Bulk Data Rule makes identity governance a compliance requirement for sensitive data access, not a secondary control.
Identity risk is already large in practice: 96% of organisations store secrets outside secrets managers, which leaves paths to bulk data exposed.
Organisations should map and constrain every privileged, third-party, and workload identity that can reach bulk datasets before enforcement pressure grows.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI-01	Secrets and service account exposure are central to bulk data access paths.
NIST CSF 2.0	PR.AA-05	Least privilege and access management map directly to regulated data access control.
NIST Zero Trust (SP 800-207)	Section 2.1 (Tenets of Zero Trust)	Zero Trust access control supports conditional access to sensitive data across identities.

Restrict bulk data access to minimum-necessary identities and review entitlements continuously.

Key terms

Bulk data exposure: Bulk data exposure is the condition where large volumes of sensitive records can be reached by identities that are not tightly governed. It matters because the risk is not only theft, but also regulatory violation when access paths are broader than the business need.
Non-human identity: A non-human identity is any machine-, workload-, or automation-issued credential used by software, services, or AI systems. In practice, it includes service accounts, API keys, tokens, certificates, and agent identities that can read, write, or move data without a person in the loop.
Identity blast radius: Identity blast radius is the amount of data, systems, and business process impact a single credential can create if misused or compromised. It is a useful governance concept because bulk data environments can turn one overprivileged identity into organisation-wide exposure quickly.
Just-in-time access: Just-in-time access is a temporary privilege pattern where access is granted only for a specific task and then removed. For bulk data governance, it helps reduce standing exposure, shorten misuse windows, and provide clearer evidence that access was necessary and controlled.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity lifecycle are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or NHI governance in your organisation, it is worth exploring.

This post draws on content published by Delinea: Why identity security is key to the Department of Justice’s new Bulk Data Rule. Read the original.
