Modal shows why AI infrastructure needs compute-native identity controls

By NHI Mgmt Group Editorial Team
Published 2026-01-14
Domain: Workload Identity
Source: WorkOS

TL;DR: In an interview with WorkOS, Modal argues that AI applications need infrastructure abstractions built for bursty GPU workloads, rapid container startup, and global scale, because traditional Kubernetes and cloud patterns make inference harder than necessary. The identity lesson is that compute elasticity changes control boundaries, so governance must follow workload behaviour rather than static infrastructure assumptions.

At a glance

What this is: A WorkOS interview on Modal’s compute-native AI infrastructure thesis. Its key finding is that traditional infrastructure patterns fit bursty, GPU-heavy AI workloads poorly.

Why it matters: IAM and NHI teams have to govern workloads, tokens, and orchestration paths that scale and move faster than controls built for slower, more static systems.

Context

AI infrastructure creates an identity problem when workloads scale up and down dynamically, because the trust boundary is no longer tied to a fixed host, cluster, or deployment pattern. In this case, the primary issue is not human access, but the way non-human workloads consume GPU capacity, orchestration, and cloud credentials at speed.

Modal’s model reflects a broader shift in NHI governance: compute-intensive applications now depend on short-lived runtime contexts, scheduling layers, and container isolation rather than stable infrastructure blocks. That changes how teams think about workload identity, privilege scope, and access visibility across cloud environments.

Key questions

Q: How should security teams govern bursty AI workloads in cloud environments?

A: Security teams should govern bursty AI workloads by tying access, logging, and revocation to the job lifecycle rather than to the underlying host or cluster. When workloads scale quickly, the useful control point is the runtime session. That keeps privilege aligned to purpose and reduces the chance that fast automation turns into standing access.

Q: Why do AI infrastructure platforms create new identity governance risks?

A: AI infrastructure platforms create new identity governance risks because they hide orchestration complexity while concentrating trust in the layer that schedules work and attaches permissions. If identity teams cannot see who requested compute, what permissions were attached, and when they were removed, least privilege becomes hard to verify in practice.

Q: What breaks when workload identity is managed like a static server identity?

A: When workload identity is managed like a static server identity, access reviews, approval cycles, and rotation assumptions all lag behind actual execution. Bursty AI jobs can appear and disappear before those controls see them, which means the real privilege window is much shorter and much harder to govern with traditional review processes.

Q: How do platform teams and IAM teams split responsibility for AI compute governance?

A: Platform teams should own scheduling, isolation, and runtime telemetry, while IAM teams should own entitlement scope, revocation, and credential provenance. The split only works if both groups share the same view of compute as an identity event, not just an infrastructure event. That is the practical way to keep governance close to execution.

Technical breakdown

Compute-native workload identity for bursty GPU applications

Bursty AI workloads stress identity and infrastructure in ways that traditional cloud patterns were not designed to absorb. When inference demand spikes, the platform must provision capacity, start containers quickly, and attach the right permissions without creating standing access that outlives the job. In practice, the identity question is not just whether a workload can run, but how its permissions are expressed when execution is transient and geographically distributed. That makes workload identity, session scope, and orchestration trust part of the same control plane.

Practical implication: treat GPU workloads as short-lived identities and map permissions to runtime context, not to a static environment.
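One way to picture a short-lived workload identity is as a credential that names the job, the scopes it was granted, and an expiry. The sketch below is illustrative only: `JobCredential`, `issue_for_job`, the scope strings, and the 300-second lifetime are all assumptions made for the example, not part of Modal's platform or any cloud provider's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class JobCredential:
    """Hypothetical short-lived credential bound to one job (illustrative only)."""
    job_id: str
    scopes: frozenset
    expires_at: datetime

    def allows(self, job_id: str, scope: str, now: datetime) -> bool:
        # Access requires the same job, a granted scope, and an unexpired credential.
        return job_id == self.job_id and scope in self.scopes and now < self.expires_at


def issue_for_job(job_id: str, scopes: set, ttl_seconds: int, now: datetime) -> JobCredential:
    # The TTL is an assumed upper bound on job runtime, not a platform default.
    return JobCredential(job_id, frozenset(scopes), now + timedelta(seconds=ttl_seconds))


start = datetime(2026, 1, 14, 12, 0, tzinfo=timezone.utc)
cred = issue_for_job("inference-001", {"read:model-weights"}, ttl_seconds=300, now=start)
```

Because the check takes the job ID and the current time as inputs, a credential replayed from another job or after expiry is rejected without any reference to the host it ran on.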

Orchestration abstraction and the hidden control plane

When a platform replaces Kubernetes and Docker for developers, it also absorbs scheduling, isolation, and execution orchestration into a higher-level control plane. That simplification can reduce operational burden, but it also concentrates trust in the abstraction layer that decides where code runs and what it can touch. gVisor adds workload isolation at runtime, yet isolation alone does not answer how credentials, service-to-service calls, and data access are governed across many ephemeral jobs. The hidden risk is that abstraction can make access paths less visible even as they become more dynamic.

Practical implication: inventory which identity decisions moved into the platform abstraction and require explicit review.
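Such an inventory can start as a simple list of records that IAM and platform teams both maintain. The following is a rough sketch under stated assumptions: the `IdentityDecision` schema and the three example entries are hypothetical and do not describe any real platform's internals.

```python
from dataclasses import dataclass


@dataclass
class IdentityDecision:
    """One identity-relevant decision the platform now makes (hypothetical schema)."""
    name: str
    made_by: str      # component that makes the decision
    reviewed: bool    # whether IAM has explicitly reviewed the control


# Example entries are illustrative assumptions, not a description of any real platform.
INVENTORY = [
    IdentityDecision("workload placement", "scheduler", reviewed=True),
    IdentityDecision("permission attachment", "scheduler", reviewed=False),
    IdentityDecision("secret injection", "container runtime", reviewed=False),
]


def pending_review(inventory):
    # Return the names of decisions that still lack an explicit review.
    return [d.name for d in inventory if not d.reviewed]
```

Calling `pending_review(INVENTORY)` lists the decisions that have moved into the abstraction without anyone signing off on the controls around them.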

Elastic inference changes privilege and observability assumptions

Inference traffic is unpredictable, so platforms optimize for elasticity, fast container loading, and rapid GPU allocation. That is operationally useful, but it breaks older assumptions that access can be planned around stable capacity, fixed hosts, or predictable schedules. In AI infrastructure, identity and security tooling must handle a moving target: jobs appear, expand, and disappear before traditional review cycles can observe them. The result is not just scale but control compression: a few minutes of runtime can carry as much effective privilege as a long-lived system, with far less time for anyone to observe it.

Practical implication: base monitoring, approval, and logging on job lifecycle events rather than on server lifecycle events.
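As an illustration of lifecycle-based monitoring, the sketch below derives each job's privilege window from a stream of events and flags credentials that were never revoked. The event types (`credential_attached`, `credential_revoked`, `job_terminated`) and field names are assumptions chosen for the example, not a real platform's telemetry schema.

```python
def privilege_windows(events):
    """Map job_id -> seconds between credential attach and revoke (None if never revoked)."""
    attached, windows = {}, {}
    for event in sorted(events, key=lambda e: e["ts"]):
        job = event["job_id"]
        if event["type"] == "credential_attached":
            attached[job] = event["ts"]
            windows[job] = None  # open until a revoke event is seen
        elif event["type"] == "credential_revoked" and job in attached:
            windows[job] = event["ts"] - attached[job]
    return windows


# Event names and fields are assumptions for illustration, not a real platform's schema.
EVENTS = [
    {"job_id": "a", "type": "credential_attached", "ts": 100},
    {"job_id": "a", "type": "credential_revoked", "ts": 145},
    {"job_id": "b", "type": "credential_attached", "ts": 120},
    {"job_id": "b", "type": "job_terminated", "ts": 150},
]

windows = privilege_windows(EVENTS)
still_open = [job for job, secs in windows.items() if secs is None]
```

Here job `b` terminated without a revocation event, which is the kind of gap a server-centric view would not surface.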

NHI Mgmt Group analysis

Compute elasticity is now an identity design constraint, not just an infrastructure preference. AI platforms that can spin up thousands of GPUs within minutes force identity teams to govern access around runtime behaviour rather than fixed assets. Traditional IAM assumptions about stable hosts, long-lived sessions, and predictable scheduling become weaker as workload timing becomes the primary control variable. Practitioners should treat elasticity as a governance boundary, not only a performance feature.

Workload abstraction concentrates trust in the orchestration layer. When a platform hides Kubernetes, Docker, and cloud complexity from developers, it also centralises decisions about placement, isolation, and execution timing. That means the trust model shifts upward into the service that interprets workload intent and allocates compute. The practical implication is that identity teams need visibility into who or what can request compute, under what conditions, and with which permissions attached.

Runtime privilege drift: This article highlights a control problem where AI jobs inherit more effective access than their original purpose requires because capacity and orchestration are optimised for speed, not least privilege. That is a familiar NHI pattern, but AI infrastructure makes it more visible because workloads are launched and retired quickly. The governance lesson is that privilege scope must be defined at the workload boundary, or it will expand through operational convenience.
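A simple way to reason about this drift is to compare what a job was granted with what it actually exercised. The sketch below assumes both sets are already available from platform telemetry; the helper name and the sample permission sets are illustrative only.

```python
def unused_permissions(granted, used):
    """Permissions attached to a job that it never exercised."""
    return sorted(set(granted) - set(used))


# Sample grant and usage sets are hypothetical.
granted = {"s3:GetObject", "s3:PutObject", "kms:Decrypt"}
used = {"s3:GetObject"}
drift = unused_permissions(granted, used)
```

A non-empty result is a candidate for tightening the scope attached at the workload boundary.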

AI infrastructure is pushing NHI governance closer to platform engineering and away from static IAM review cycles. When compute demand is bursty and geographically distributed, access decisions happen inside infrastructure automation rather than in separate approval workflows. That does not eliminate identity governance, but it moves it into the runtime path where delays are costly and visibility is fragmented. Practitioners should expect NHI controls to be embedded in orchestration, not layered on afterwards.

Agentic and non-agentic workloads will converge on the same governance problem: who can cause compute to happen. Even where the article is about infrastructure rather than autonomous AI, the underlying question is the same one IAM teams now face across AI programmes. If the system can scale resources, launch jobs, and attach permissions dynamically, then governance must focus on request authority, execution context, and post-execution revocation. That is the direction the market is moving, and teams should prepare for it now.

From our research:
53% of security leaders expect AI to run major portions of their infrastructure autonomously within the next three years, according to The 2026 Infrastructure Identity Survey.
Another finding from the same survey shows that 67% of organisations still rely heavily on static credentials despite the risks they pose to agentic AI deployments.
For a broader view of where AI identity governance is heading, see Ultimate Guide to NHIs, 2025 Outlook and Predictions.

What this signals

Runtime governance will become the differentiator for AI infrastructure programmes. As GPU workloads and inference jobs become more dynamic, security teams will need control points that understand job start, scaling, and teardown rather than only server inventory. The operational question is no longer whether the platform is fast enough, but whether identity controls can keep pace with it.

Elastic compute changes the meaning of least privilege. A workload that can request capacity in seconds does not fit a review model built for long-lived infrastructure. Teams should expect privilege scoping to move closer to orchestration and telemetry, with runtime events becoming the basis for access decisions and anomaly detection.

With 53% of security leaders expecting AI to run major portions of their infrastructure autonomously within three years, according to the 2026 Infrastructure Identity Survey, the governance gap is already structural. Teams that treat AI compute as ordinary infrastructure will miss where permissions are actually created and exercised.

For practitioners

Map workload identity to job lifecycle events: Tie permissions, logging, and revocation to container start, model load, and job termination so access does not persist beyond the compute session.
Review orchestration-layer trust boundaries: Identify which identity decisions now happen in the platform abstraction, including scheduling, placement, and permission attachment, and document the controls around each step.
Limit effective privilege for bursty AI workloads: Scope cloud roles and service credentials to the smallest possible job purpose, then verify that fast scale-out does not silently widen access.
Instrument runtime visibility before scaling AI production: Collect events for GPU allocation, container launch, and service-to-service calls so teams can detect unexpected access paths during short-lived inference runs.
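The first recommendation can be sketched as a context manager that grants scopes when a job starts and revokes them when it ends, including on failure. This is a minimal illustration: `job_session`, the in-memory `ACTIVE_GRANTS` store, and `AUDIT_LOG` are hypothetical stand-ins for a real credential broker and log sink.

```python
from contextlib import contextmanager

AUDIT_LOG = []          # stand-in for a real log sink
ACTIVE_GRANTS = {}      # job_id -> set of scopes currently granted


@contextmanager
def job_session(job_id, scopes):
    """Grant scopes at job start and always revoke them at job end (hypothetical helper)."""
    ACTIVE_GRANTS[job_id] = set(scopes)
    AUDIT_LOG.append(("grant", job_id))
    try:
        yield ACTIVE_GRANTS[job_id]
    finally:
        # Revocation runs even if the job raises, so access cannot outlive the session.
        ACTIVE_GRANTS.pop(job_id, None)
        AUDIT_LOG.append(("revoke", job_id))


try:
    with job_session("inference-42", {"read:model-weights"}) as scopes:
        raise RuntimeError("simulated job failure")
except RuntimeError:
    pass
```

Putting revocation in the `finally` branch means a crashed job leaves the same audit trail and the same empty grant table as a successful one.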

Key takeaways

AI infrastructure is becoming an identity governance problem because bursty workloads now consume privileged compute at runtime, not through stable server patterns.
The control challenge is not just scale, but visibility into the orchestration layer that attaches permissions and launches jobs.
Security teams should move entitlement, logging, and revocation to the workload lifecycle if they want least privilege to hold in AI production.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI-03	Bursty AI workloads can outlive their intended access window if credentials are not tied to runtime.
NIST CSF 2.0	PR.AA-05 (PR.AC-4 in CSF 1.1)	Access should stay least-privileged even when orchestration scales jobs dynamically.
NIST Zero Trust (SP 800-207)	Section 2.1, Tenets of Zero Trust	Dynamic AI workloads need policy enforcement at the point of access, not only at deployment.

Map AI workload entitlements to least-privilege access and review them at each release cycle.

Key terms

Workload Identity: A workload identity is the non-human identity attached to a running application, container, job, or service so it can authenticate and obtain access. In AI infrastructure, that identity often exists only for the duration of a job and must be governed as a runtime asset, not a static server account.
Orchestration Layer: The orchestration layer is the control plane that schedules workloads, allocates compute, and attaches runtime permissions. For AI systems, it becomes an identity-critical boundary because it decides where code runs and what access accompanies each execution event.
Runtime Privilege: Runtime privilege is the access a workload can exercise while it is actively executing. It is narrower than broad environment entitlement, but it can still be excessive if the platform attaches more permissions than the job needs or keeps them active after the task finishes.

Deepen your knowledge

AI workload identity and runtime governance are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building controls for bursty GPU workloads or AI infrastructure abstraction, it is worth exploring.

This post draws on content published by WorkOS: "Modal is building AI infrastructure that doesn't get in the way."

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-01-14.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org