AI factories expose identity gaps that cloud AI hid

By NHI Mgmt Group Editorial Team
Published 2025-07-08
Domain: Best Practices
Source: Delinea

TL;DR: AI factories move AI development on-premises or into hybrid data centres, giving organisations more control but also shifting responsibility for access management, auditing, and privileged control onto internal teams, according to Delinea. The critical gap is that high-performance AI environments can amplify unmanaged service identities, shadow AI, and over-privileged access faster than standard IAM processes can keep up.

At a glance

What this is: Delinea’s analysis of securing AI factories with NIST HPC guidance, NVIDIA architecture, and identity-centric controls. Its core finding is that AI factories create new governance pressure on accounts, privileges, and auditability.

Why it matters: AI factory security sits at the intersection of NHI, privileged access, and AI governance, and the controls that work in ordinary IT often break down under high-performance, highly automated infrastructure.

By the numbers:

Only 5.7% of organisations have full visibility into their service accounts.
97% of NHIs carry excessive privileges, increasing unauthorised access and broadening the attack surface.

Context

AI factories are purpose-built on-premises or hybrid environments for training and deploying AI at scale, and they bring the identity security problem back inside the enterprise boundary. The issue is not just compute or storage. It is that every scheduler, service account, admin path, and data pipeline becomes part of the trust model, which makes AI factory governance an identity problem as much as an infrastructure one.

Delinea’s article argues that the move from managed cloud AI to in-house AI factories shifts security accountability to the organisation running the environment. That shift exposes a familiar pattern in a more extreme form: unmanaged identities, excessive privilege, and weak audit boundaries become easier to create and harder to detect when performance pressure encourages exceptions.

Key questions

Q: How should security teams govern service accounts in AI factories?

A: Security teams should treat service accounts in AI factories as high-value non-human identities with clear owners, scope, and expiry dates. They should be provisioned through a central directory or identity platform, rotated on schedule, and removed when the workload ends. If an AI pipeline can run without anyone knowing who owns the account, governance has already failed.

Q: Why do AI factories increase the risk of privilege creep?

A: AI factories increase privilege creep because performance-sensitive clusters encourage broad operational access, temporary exceptions, and account reuse across jobs and tools. Over time, those exceptions harden into standing privilege across compute, storage, and management zones. The risk is not theoretical: once a machine identity can move laterally, it becomes a durable access path.

Q: What breaks when AI workloads run outside zone-based controls?

A: When AI workloads run outside zone-based controls, access becomes too broad to audit and too easy to reuse. Users, schedulers, and service identities can reach storage or management functions they were never meant to touch. That creates an environment where data access, model changes, and administrative actions blur together, which weakens both containment and accountability.

Q: Who is accountable when an AI factory identity is misused?

A: Accountability sits with the organisation running the AI factory, because it owns the identities, the access paths, and the logging controls. NIST-style zoning and strong authentication help define responsibility, but they do not create it. If a service account, admin account, or control-plane credential is misused, the missing control is usually lifecycle ownership and auditable privilege boundaries.

Technical breakdown

Zone-based access in AI factories

NIST’s HPC security overlay treats the AI factory as a zoned environment, separating access, management, compute, and storage to reduce blast radius. This matters because AI pipelines are not a single workload path. Data ingest, model training, orchestration, and storage access each introduce different trust requirements. Role-based access control limits which identities can cross zone boundaries, while least privilege keeps operational accounts from inheriting unnecessary permissions. The security value is not only segmentation, but also making privilege auditable by zone rather than by vague job function.

Practical implication: Map every AI factory identity to a specific zone and deny cross-zone access by default.
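
A deny-by-default zone check can be sketched in a few lines of Python. This is a minimal illustration, not Delinea's or NIST's implementation: the identity names, the `APPROVED_CROSS_ZONE` table, and the `is_allowed` helper are all hypothetical, and only the four zone names come from the text above.

```python
from dataclasses import dataclass

# The four zones named in the text. Everything else here is a hypothetical example.
ZONES = {"access", "management", "compute", "storage"}


@dataclass(frozen=True)
class Identity:
    name: str
    home_zone: str  # the single zone this identity is mapped to


# Explicitly approved cross-zone grants: (identity name, target zone).
# Anything not listed here is denied by default.
APPROVED_CROSS_ZONE = {
    ("svc-data-mover", "storage"),
}


def is_allowed(identity: Identity, target_zone: str) -> bool:
    """Allow same-zone access; allow cross-zone access only with an explicit grant."""
    if identity.home_zone not in ZONES or target_zone not in ZONES:
        return False  # unknown zones are never trusted
    if identity.home_zone == target_zone:
        return True
    return (identity.name, target_zone) in APPROVED_CROSS_ZONE


data_mover = Identity("svc-data-mover", "compute")
trainer = Identity("svc-trainer", "compute")

print(is_allowed(data_mover, "storage"))  # True: explicit grant exists
print(is_allowed(trainer, "storage"))     # False: no grant, denied by default
print(is_allowed(trainer, "compute"))     # True: same zone
```

Keeping the grants in one explicit table means every cross-zone path is something a reviewer can list and question, which is the auditability the zone model is meant to provide.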

Service accounts and AI model control planes

AI factories often depend on service accounts for schedulers, data movers, microservices, and model control plane components. These identities are persistent machine credentials, not human user accounts, and they can quietly accumulate privilege if provisioning and offboarding are not centralised. When the article references MCP servers, the real governance concern is the same: machine identities that can reach sensitive systems without a clear lifecycle owner. Rotation, key distribution, and ownership tracking are what stop these accounts from becoming invisible long-term access paths.

Practical implication: Treat every service account in the AI stack as a lifecycle-managed identity with an owner, expiry, and rotation path.
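
As a rough sketch of what "lifecycle-managed" could mean in practice, the following Python checks a service account record for a missing owner, a missing or lapsed expiry, and overdue rotation. The record fields, the account names, and the 90-day rotation window are assumptions chosen for illustration; they are not taken from the Delinea article or from NIST guidance.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Assumed policy value for illustration only.
MAX_ROTATION_AGE = timedelta(days=90)


@dataclass
class ServiceAccount:
    name: str
    owner: Optional[str]     # team or person accountable for the account
    expires: Optional[date]  # date after which the account should be removed
    last_rotated: date       # date the credential was last rotated


def audit(account: ServiceAccount, today: date) -> List[str]:
    """Return the lifecycle findings for one account (empty list means healthy)."""
    findings = []
    if not account.owner:
        findings.append("no owner")
    if account.expires is None:
        findings.append("no expiry")
    elif account.expires < today:
        findings.append("expired")
    if today - account.last_rotated > MAX_ROTATION_AGE:
        findings.append("rotation overdue")
    return findings


today = date(2025, 7, 8)
healthy = ServiceAccount("svc-scheduler", "hpc-platform", date(2025, 12, 31), date(2025, 6, 1))
orphan = ServiceAccount("svc-old-pipeline", None, None, date(2024, 11, 1))

print(audit(healthy, today))  # []
print(audit(orphan, today))   # ['no owner', 'no expiry', 'rotation overdue']
```

Run against a full register, a check like this turns "invisible long-term access paths" into a concrete list of accounts that need an owner or a decommission date.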

Audit logging for performance-constrained environments

High-performance clusters often discourage deep logging because teams fear overhead, but AI factory governance depends on traceability. NIST’s guidance therefore prioritises audit logs that capture who accessed training data, who changed configuration, and which process initiated unusual transfers, while keeping logging overhead low enough that it does not overwhelm the environment. The technical challenge is selective fidelity: enough session, command, and authentication evidence to reconstruct activity, but tuned to the operational realities of HPC. Without that, anomaly detection and forensic review both become guesswork.

Practical implication: Preserve administrative and access logs at the control points that matter most, even if less critical telemetry is sampled.
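
One way to picture selective fidelity is a filter that always records control-point events and samples everything else. The sketch below is an assumption-laden illustration: the event type names, the `should_record` helper, and the 1% default sample rate are hypothetical and would need tuning for a real cluster.

```python
import random

# Hypothetical event types treated as control points: always recorded in full.
CONTROL_POINT_EVENTS = {"auth", "admin_command", "config_change", "data_transfer"}


def should_record(event_type: str, sample_rate: float = 0.01, rng=random.random) -> bool:
    """Always keep control-point events; keep other telemetry with probability sample_rate."""
    if event_type in CONTROL_POINT_EVENTS:
        return True
    # rng() returns a float in [0.0, 1.0), so a rate of 0.0 keeps nothing
    # and a rate of 1.0 keeps everything.
    return rng() < sample_rate


print(should_record("admin_command"))                # True: never sampled away
print(should_record("gpu_metric", sample_rate=0.0))  # False: low-value telemetry dropped
```

The design choice is that sampling is applied only to the low-value stream, so the evidence needed to reconstruct administrative activity is never subject to chance.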

Threat narrative

Attacker objective: The attacker wants to use trusted AI factory identities to reach sensitive data, alter workloads, or operate inside the environment without triggering normal oversight.

Entry occurs through a legitimate AI factory identity path, such as a service account, administrative account, or unmanaged scheduler credential that can reach clustered resources.
Escalation follows when that identity has excess privilege, allowing access to model pipelines, storage, or management zones that should have remained separate.
Impact is achieved through unauthorised data access, shadow model deployment, or abuse of high-performance resources that creates compliance and insider-risk exposure.

NHI Mgmt Group analysis

AI factories expose identity sprawl as an infrastructure design flaw, not just an access issue. When AI moves in-house, the number of accounts, service principals, and elevated operational paths grows faster than traditional governance processes can inventory. That makes the identity layer the control plane for the whole factory. Practitioners should treat unmanaged machine identities as a design defect, not a cleanup task.

Service accounts in AI factories are the new long-tail privilege risk. These identities are created for orchestration, data movement, and control-plane tasks, but they often outlive the project that introduced them. The result is standing access that no one actively reviews because no one believes the account is still important. The practitioner conclusion is simple: lifecycle ownership must be explicit or privilege becomes permanent by default.

Identity-centric control in high-performance AI is the difference between acceleration and blind trust. NIST’s zone model, strong authentication, and audit guidance work because they force AI activity into observable boundaries. That is the governance baseline for AI factories, and it should be viewed through OWASP-NHI and NIST CSF rather than as a generic infrastructure hardening exercise. Teams should align AI governance with identity governance from the start.

Shadow AI is really shadow identity. Unvetted models and ad hoc training jobs become a security problem because they bring their own accounts, permissions, and data paths. The AI factory environment can therefore look compliant at the platform level while still accumulating untracked access at the workload level. Practitioners should assume that any unmanaged AI deployment is also an unmanaged identity deployment.

Zone-based privilege is a more precise concept for AI factories than broad least privilege. In these environments, least privilege only becomes operational when it is tied to the access, management, compute, and storage zones that define the factory. That framing is more useful than generic IAM language because it matches how AI work actually moves across the stack. Teams should govern privilege by zone, not by platform optimism.

From our research:
97% of NHIs carry excessive privileges, increasing unauthorised access and broadening the attack surface, according to the Ultimate Guide to NHIs.
71% of NHIs are not rotated within recommended time frames, increasing the risk of compromise over time, according to the Ultimate Guide to NHIs.
For lifecycle governance depth, see Ultimate Guide to NHIs - Lifecycle Processes for Managing NHIs for provisioning, rotation, and offboarding patterns.

What this signals

Shadow AI becomes shadow identity as soon as teams allow ad hoc training jobs and unmanaged service accounts to proliferate. The governance response is to classify every AI factory credential as a lifecycle-managed identity, not as a local platform exception. That is especially important when identity sprawl meets high-performance clusters, where visibility gaps compound quickly and the audit trail becomes the only reliable source of truth.

With only 5.7% of organisations reporting full visibility into service accounts, per the Ultimate Guide to NHIs, AI factory programmes should assume that untracked machine identities already exist. The practical implication is that inventory, ownership, and rotation need to be built into the operating model before the first large-scale model rollout, not after an incident.
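
The assumption that untracked identities already exist can be tested with a simple comparison between what discovery finds and what the register records. The following is a minimal sketch; the account names and the `find_untracked` and `find_stale` helpers are hypothetical, and real discovery data would come from whatever directory or scanning tooling the organisation uses.

```python
from typing import Set


def find_untracked(discovered: Set[str], registered: Set[str]) -> Set[str]:
    """Identities seen in the environment that have no entry in the register."""
    return discovered - registered


def find_stale(discovered: Set[str], registered: Set[str]) -> Set[str]:
    """Register entries that no longer correspond to anything in the environment."""
    return registered - discovered


# Hypothetical example data.
discovered = {"svc-scheduler", "svc-data-mover", "svc-adhoc-train"}
registered = {"svc-scheduler", "svc-data-mover", "svc-retired-etl"}

print(find_untracked(discovered, registered))  # {'svc-adhoc-train'}
print(find_stale(discovered, registered))      # {'svc-retired-etl'}
```

Both differences matter: untracked identities are the visibility gap, and stale register entries show where offboarding records were never closed out.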

AI factory teams should also expect security architecture to converge with identity architecture. The more the environment depends on schedulers, storage gateways, and control-plane accounts, the more zone-based access and privileged session control become part of AI governance rather than a separate IAM backlog. NIST-aligned controls only work when the identity layer is designed into the factory from the start.

For practitioners

Inventory every machine identity in the AI factory: Build a register for service accounts, scheduler identities, AI microservices, and control-plane credentials. Assign an owner, purpose, and expiry to each identity so no account exists outside a lifecycle process.
Enforce zone-specific privilege boundaries: Map access, management, compute, and storage zones to separate roles and deny direct cross-zone access unless the request is explicitly approved and logged. Use separate admin and operator accounts so training jobs never inherit root-like rights.
Rotate keys and principals for automated services: Automate credential rotation for Kerberos principals, keytabs, and service credentials used by AI pipelines. Tie rotation to change control so every credential has a visible renewal path and a revocation point.
Record the actions that matter for investigations: Capture admin sessions, command histories, authentication events, and unusual data transfers at the control points that govern AI factory access. Keep the logs usable for forensic review and compliance without flooding the cluster with low-value noise.

Key takeaways

AI factories make identity governance part of core infrastructure design because service accounts, admin paths, and orchestration identities now control access to the most sensitive parts of the environment.
The scale of the problem is visible in enterprise NHI data, where excessive privilege and limited service account visibility remain the norm rather than the exception.
Organisations should implement zone-based privilege, lifecycle-managed machine identities, and auditable access paths before AI factories become entrenched.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI1:2025 (Improper Offboarding), NHI7:2025 (Long-Lived Secrets)	The post centres on rotation and lifecycle control for machine identities.
NIST CSF 2.0	PR.AA-05 (PR.AC-4 in CSF 1.1)	Zone-based access and least privilege are central to the article's control model.
NIST Zero Trust (SP 800-207)	Section 2.1 (Tenets of Zero Trust)	The article uses zone segmentation and continuous access control principles.

Apply zero trust segmentation so AI workloads can only reach approved resources through governed paths.

Key terms

AI factory: An AI factory is an on-premises or hybrid computing environment built to train, deploy, and operate AI at scale. It combines GPU clusters, storage, orchestration, and identity controls into a single production system where access management and workload governance are tightly coupled.
Service account: A service account is a non-human identity used by software, schedulers, or automation to access systems without a person logging in. In AI factories, these accounts often control pipelines and infrastructure tasks, so they need ownership, rotation, and explicit lifecycle governance.
Zone-based access control: Zone-based access control divides an environment into separate trust zones, such as access, management, compute, and storage. In AI factories, the goal is to prevent any single identity from moving freely across the stack and to make privilege easier to audit and contain.
Shadow AI: Shadow AI is AI development or deployment that occurs outside approved governance and security oversight. In practice, it often appears as untracked models, ad hoc training jobs, or unmanaged service identities that bypass the controls the organisation expects to protect them.

This post draws on content published by Delinea: Secure AI factories with NIST HPC guidelines, NVIDIA architecture, and Delinea controls. Read the original.
