AI security starts with data access, not model control

By NHI Mgmt Group Editorial Team | Published 2026-05-21 | Domain: Agentic AI & NHIs | Source: Cyera

TL;DR: AI tools are already moving 84 percent of enterprise data, and nearly 72 percent of those tools are classified as high or critical risk, according to Cyera's AI Security for Dummies special edition. The governance gap is not the model itself but the data and access paths AI can reach, which makes data-first controls the practical starting point.

At a glance

What this is: An analysis of why AI security failures often begin with data access rather than model behaviour, and why discovery, access control, and evidence matter more than model-centric assumptions.

Why it matters: IAM teams need to understand that AI expands existing access and governance problems across NHI, autonomous, and human identity programmes, so data visibility and entitlement control become foundational.

By the numbers:

84 percent of enterprise data is already flowing through AI tools.
Nearly 72 percent of those tools are classified as high or critical risk.

👉 Read Cyera's analysis of why AI security starts with data access

Context

AI security is increasingly a data access problem, not just a model risk problem. When an internal assistant can retrieve documents, databases, and email content in one response, the security boundary is defined by what the system can reach, not only by what it can generate.

That is why current IAM, data governance, and AI security models often fail together. Discovery, access control, auditability, and accountability have to be aligned before organisations can safely scale internal chatbots, embedded AI, or homegrown AI systems.

Cyera's article frames this as a readiness issue rather than a purely technical AI issue, which is the right starting point. Most enterprises are already running multiple AI forms at once, and that makes cross-domain governance the real control surface.

Key questions

Q: What breaks when AI systems can reach too many data sources?

A: The main failure is not that the model becomes inaccurate, but that authorised access turns into unintended disclosure. When one prompt can aggregate documents, APIs, databases, and email, the effective security boundary is the combined path. Teams need to govern composition risk, because isolated entitlements can still create an unsafe whole.

Q: Why do AI tools complicate IAM governance?

A: They complicate IAM because the access subject may be a human user, an embedded service, or an AI system pulling data on behalf of a workflow. Each one can trigger different entitlement and audit requirements. If teams do not classify the identity and its data reach correctly, they will review the wrong thing.

Q: How do security teams know whether AI access is actually working safely?

A: Look for three signals: complete discovery of the AI estate, clear mapping of source data to each system, and logs that prove what was accessed and why. If any of those are missing, the control environment is incomplete. Safe AI access is evidenced, not assumed.

Q: Should organisations treat embedded AI and homegrown AI the same way?

A: No. Embedded AI often inherits risk from SaaS defaults, while homegrown AI introduces custom retrieval paths and local accountability gaps. Both need governance, but the control points differ. Teams should standardise the review model while still tracking the specific identity and data path for each class.

Technical breakdown

Why AI data reach changes the access model

Traditional applications usually request one bounded action at a time. AI systems can aggregate content from documents, APIs, databases, and email in a single interaction, which turns the access problem into a composition problem. The security question is no longer only whether a source is permitted, but whether combining many permitted sources creates an unsafe disclosure path. That is why access scope, query pathways, and data classification have to be evaluated together.

Practical implication: map which data sources can be combined by each AI system before you let it operate on real content.
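One way to picture the composition problem is a check over the sources a single AI system can reach. The sketch below is illustrative only: the classification labels, the RESTRICTED_COMBINATIONS policy, and the source names are hypothetical and do not come from Cyera's material. It flags pairs of individually permitted sources whose combination a policy would treat as an unsafe disclosure path.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class DataSource:
    name: str
    classification: str  # hypothetical label, e.g. "internal", "hr", "finance"


# Hypothetical policy: classification pairs that must not be combined in one response.
RESTRICTED_COMBINATIONS = {
    frozenset({"hr", "finance"}),
    frozenset({"customer_pii", "support_email"}),
}


def risky_combinations(sources):
    """Return pairs of reachable sources whose classifications form a restricted pair."""
    findings = []
    for a, b in combinations(sources, 2):
        if frozenset({a.classification, b.classification}) in RESTRICTED_COMBINATIONS:
            findings.append((a.name, b.name))
    return findings


# Every source here is permitted on its own; the pairing is what gets flagged.
assistant_sources = [
    DataSource("payroll_db", "finance"),
    DataSource("performance_reviews", "hr"),
    DataSource("wiki", "internal"),
]
print(risky_combinations(assistant_sources))
```

With these sample sources the check reports the payroll_db and performance_reviews pairing. A real review would likely need to consider larger combinations and query pathways, not just pairs.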

Discovery and visibility for AI tools and agents

AI discovery is difficult because the environment usually contains public AI, embedded AI, and homegrown AI at the same time. Some tools are obvious, but others are switched on by default in SaaS platforms or built by teams outside central review. Without discovery, there is no reliable inventory of which systems are exposing sensitive data or which identities they use. That makes every later control, from logging to access policy, incomplete.

Practical implication: build an inventory of AI tools, their data sources, and the identities they run under before expanding use.
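A minimal sketch of what such an inventory record might hold, and how gaps in it could be surfaced, is shown below. The field names, the three class labels, and the example systems are assumptions made for illustration; they are not a schema defined by Cyera or any standard.

```python
from dataclasses import dataclass, field
from typing import Optional

# The three AI forms named in the article; the labels themselves are an assumption.
AI_CLASSES = {"public", "embedded", "homegrown"}


@dataclass
class AISystem:
    name: str
    ai_class: str
    identity: Optional[str] = None  # hypothetical: service account or NHI it runs under
    owner: Optional[str] = None
    data_sources: list = field(default_factory=list)


def inventory_gaps(systems):
    """Map each system name to the inventory fields that are missing or invalid."""
    gaps = {}
    for system in systems:
        problems = []
        if system.ai_class not in AI_CLASSES:
            problems.append("unknown AI class")
        if not system.identity:
            problems.append("no identity recorded")
        if not system.owner:
            problems.append("no owner recorded")
        if not system.data_sources:
            problems.append("no data sources mapped")
        if problems:
            gaps[system.name] = problems
    return gaps


inventory = [
    AISystem("support-assistant", "homegrown", "svc-support-ai", "it-ops", ["tickets_db"]),
    AISystem("crm-copilot", "embedded"),  # switched on by default, never reviewed
]
print(inventory_gaps(inventory))
```

Systems that return no gaps are not necessarily safe; they are simply documented well enough for later controls such as logging and access policy to be applied to them.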

Evidence, logging, and accountability for AI access

Cyera points to a growing need for proof, not just internal confidence. In practice, that means audit logs, data lineage, and documented accountability that show what AI systems accessed, why they could access it, and who approved the setup. Without those artefacts, organisations cannot demonstrate that the AI behaved within policy, especially when regulators or customers ask for evidence after deployment.

Practical implication: require traceable logs and lineage records for AI access paths before treating the system as production-ready.
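As an illustration of what a traceable record could contain, the sketch below builds a structured log entry for one AI retrieval and checks it for missing evidence. The field list is an assumption derived from the questions in this section (what was accessed, why, and who approved the setup); it is not a format prescribed by Cyera or a regulator.

```python
import json
from datetime import datetime, timezone

# Hypothetical minimum evidence set for a single AI retrieval event.
REQUIRED_FIELDS = ("timestamp", "ai_system", "identity", "source", "purpose", "approved_by")


def build_access_record(ai_system, identity, source, purpose, approved_by):
    """Build one structured audit record for a single AI retrieval."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "ai_system": ai_system,
        "identity": identity,
        "source": source,
        "purpose": purpose,
        "approved_by": approved_by,
    }


def missing_evidence(record):
    """List the required fields that are absent or empty in a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


record = build_access_record(
    ai_system="support-assistant",
    identity="svc-support-ai",
    source="tickets_db",
    purpose="answer customer ticket query",
    approved_by="data-owner@example.com",
)
print(json.dumps(record, indent=2))
print(missing_evidence(record))
```

A production-readiness gate could refuse to promote any system whose records return a non-empty list from missing_evidence.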

NHI Mgmt Group analysis

AI security is an access-governance problem before it is a model-governance problem. The article is right to centre the data the system can reach, because that is where most enterprise exposure actually lives. When an AI assistant can aggregate from multiple repositories, the risk comes from authorised breadth, not just malicious input. Practitioners should treat AI access as a governance boundary that must be defined before the model is trusted with real work.

Data access reviews designed for human workflows do not map cleanly onto AI-driven retrieval. A human user usually makes one request at a time and can be reviewed through conventional entitlement logic. AI systems can traverse many sources in one interaction, which means access review has to examine composition paths, not only individual permissions. The implication is that existing review cadences can miss the actual disclosure path.

Named concept: AI data reach exposure. This is the condition where an AI system is allowed to touch enough connected sources that a normal request becomes an information synthesis event. The organisation may believe each source is safely governed, yet the combined access path produces a new confidentiality problem. Practitioners need to treat that synthesis path as a separate control boundary, not as a by-product of application behaviour.

Discovery is the prerequisite control for AI governance across human, NHI, and autonomous programmes. The article shows why organisations that do not know what AI they are running cannot credibly set controls for it. That is a cross-domain identity lesson, not just an AI lesson, because embedded AI, homegrown assistants, and agentic systems all depend on underlying identities and permissions. The practitioner conclusion is simple: if you cannot inventory the actors and the data they can reach, you cannot govern the outcome.

Proof matters because accountability is becoming a first-class control requirement. Organisations will increasingly be asked to show what data an AI system accessed, how the access was authorised, and what normal behaviour looked like. That makes lineage, logs, and decision records part of the security programme rather than forensic extras. The practitioner takeaway is that undocumented AI access is already a governance defect.

From our research:
The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to the State of Secrets in AppSec.
Only 44% of developers are reported to follow security best practices for secrets management, exposing a significant behaviour gap in day-to-day application control.
For the lifecycle angle, read the Ultimate Guide to NHIs for the control model that connects discovery, rotation, and offboarding.

What this signals

AI data reach exposure: the next governance battleground is not just who can authenticate, but which connected sources an AI system can synthesise in one response. Organisations that treat AI as a simple feature toggle will miss the compound-access risk that emerges when multiple repositories are reachable through one interface.

With 43% of security professionals concerned about AI systems learning and reproducing sensitive information patterns from codebases, the operational challenge is already broader than one application layer. That concern should push programme owners to integrate AI discovery with data classification and NHI governance before uncontrolled exposure becomes normal.

Enterprises should prepare for evidence-based AI governance to become routine, because regulators and customers increasingly expect auditability around access decisions and data lineage. The organisation that can show traceability will be in a materially better position than the one that can only claim it had controls in place.

For practitioners

Inventory all AI systems and their access paths: Classify public AI, embedded AI, and homegrown AI separately, then map the documents, APIs, databases, and email sources each one can reach. Treat the inventory as a control baseline, not a one-time discovery exercise.
Review compound access paths before production use: Test whether an AI response can combine data from multiple sources into a disclosure that would not be obvious from any single entitlement review. Focus on the aggregate path, not only on individual permissions.
Require audit logs and data lineage for AI retrieval: Make evidence of access mandatory for systems that touch sensitive data, including who approved the setup, what source was queried, and how the response was assembled. Preserve those records for incident review and compliance.
Align AI governance with identity governance: Tie AI access to the same lifecycle, recertification, and accountability processes used for other non-human identities. That keeps owners visible when access changes and prevents shadow deployments from bypassing review.
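The recertification point in the last item can be sketched as a simple overdue check. The 90-day interval, the identity names, and the dates below are hypothetical values chosen for illustration; an organisation would substitute its own review cadence and identity store.

```python
from datetime import date, timedelta

# Hypothetical review cadence; not a figure taken from the article.
RECERT_INTERVAL = timedelta(days=90)


def overdue_for_recertification(last_reviewed, today):
    """Return AI identities never reviewed or reviewed longer ago than the interval."""
    return sorted(
        name
        for name, reviewed_on in last_reviewed.items()
        if reviewed_on is None or today - reviewed_on > RECERT_INTERVAL
    )


# Identity name -> date of last access review (None means never reviewed).
ai_identities = {
    "support-assistant-svc": date(2026, 4, 1),
    "hr-chatbot-svc": date(2026, 1, 10),
    "rag-pipeline-svc": None,
}
print(overdue_for_recertification(ai_identities, today=date(2026, 5, 21)))
```

Treating a never-reviewed identity as overdue is a deliberate choice here, because it makes shadow deployments show up in the same report as stale ones.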

Key takeaways

AI security failures often start with overbroad data access, not with model malfunction.
Discovery, lineage, and auditability are now core control requirements for safe AI adoption.
Identity governance has to extend to AI retrieval paths before sensitive data becomes routine output.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI-01	AI systems depend on non-human identities and their data reach.
NIST CSF 2.0	PR.AA-05	AI access control depends on least-privilege assignment and review.
NIST Zero Trust (SP 800-207)	AC-4	Zero Trust requires continuous verification of access to sensitive data.

Map AI retrieval paths to least-privilege access reviews and remove unnecessary data sources.

Key terms

AI Data Reach: The set of documents, databases, APIs, and other sources an AI system can access and combine in one response. It matters because the security risk is not only what each source contains, but what becomes exposed when those sources are synthesised together.
Compound Access Path: A chained access route where one AI interaction can draw from several systems at once. This creates a new confidentiality boundary because individually acceptable permissions can still produce an unsafe disclosure when the outputs are assembled together.
Data Lineage: The record of where data came from, how it moved, and which systems touched it. For AI governance, lineage is evidence that helps prove whether a response was assembled from approved sources and whether the access path was legitimate.
AI Discovery: The process of identifying every AI system in use, including embedded features, internal chatbots, and agentic workflows. It is the prerequisite for governance because an organisation cannot control or audit what it has not first found and classified.

Deepen your knowledge

AI access paths, discovery, and governance alignment are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If your team is already facing embedded AI and homegrown assistants, the course gives that problem a practical control model.

This post draws on content published by Cyera: The AI Worked Perfectly. That Was the Problem. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-05-21.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org