By NHI Mgmt Group Editorial Team | Published 2026-01-14 | Domain: Agentic AI & NHIs | Source: WorkOS

TL;DR: Open source models are closing the capability gap with closed systems and pushing more organisations toward in-house inference as monthly spend rises into five figures, according to WorkOS's interview with Baseten. The governance question is no longer whether AI workloads will scale, but which identity, access, and infrastructure controls will govern them when they do.


At a glance

What this is: This interview argues that open source models are narrowing the gap with closed models and driving a shift toward more customised, in-house AI inference infrastructure.

Why it matters: Once AI workloads move from experimentation into operational systems, IAM teams have to govern model access, workload identities, and platform boundaries with the same discipline they apply to other critical machine identities.



Context

Open source models are changing the economics of AI infrastructure because capability gaps are shrinking fast enough that organisations can justify more control over the stack. In practical identity terms, that shift moves AI from a procurement choice into an access-governance problem, where workload identity, permissions, and operational accountability become part of the design.

The article frames this as a scaling pattern: early-stage teams optimise for product-market fit, then later pull workloads in-house as costs rise and differentiation matters more. For IAM and security teams, that is the point where model-serving environments start to look like any other high-value non-human workload and need explicit governance rather than informal platform trust.


Key questions

Q: How should security teams govern in-house AI inference workloads?

A: Security teams should govern in-house AI inference workloads as non-human identities with scoped permissions, named ownership, and lifecycle controls. That means inventorying service accounts, separating deployment rights from infrastructure administration, and reviewing credentials whenever models move between environments or vendors. If the workload can call other systems, it needs a defined trust boundary.
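The ownership and scope checks described in this answer can be sketched as a small inventory audit. This is a minimal illustration rather than a reference implementation: the `WorkloadIdentity` record, its field names, and the scope strings are hypothetical and would need to be mapped onto whatever your IAM or cloud platform actually exposes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class WorkloadIdentity:
    """Hypothetical inventory record for one inference service account."""
    name: str
    owner: Optional[str]          # named person or team accountable for it
    environment: str              # e.g. "staging" or "prod"
    scopes: Set[str] = field(default_factory=set)


def governance_gaps(identity: WorkloadIdentity, trust_boundary: Set[str]) -> List[str]:
    """Return findings for one identity; an empty list means no gaps found."""
    gaps = []
    if not identity.owner:
        gaps.append("no named owner")
    outside = identity.scopes - trust_boundary
    if outside:
        gaps.append("scopes outside trust boundary: " + ", ".join(sorted(outside)))
    return gaps


# Illustrative data only; the scope names are assumptions.
boundary = {"models:read", "inference:invoke", "logs:write"}
svc = WorkloadIdentity(
    name="inference-gateway",
    owner=None,
    environment="prod",
    scopes={"inference:invoke", "logs:write", "cluster:admin"},
)
findings = governance_gaps(svc, boundary)
```

Here `findings` flags both the missing owner and the `cluster:admin` scope, which sits outside the declared boundary. In a real estate the boundary would likely be defined per environment rather than as one global set.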

Q: Why do open source models increase identity governance pressure?

A: Open source models increase identity governance pressure because they make it easier to bring AI execution inside the enterprise boundary. Once that happens, the organisation owns the permissions, credentials, and operational changes that keep the workload running. The governance challenge shifts from vendor reliance to managing internal access paths and accountability.

Q: What breaks when AI workloads scale without lifecycle controls?

A: When AI workloads scale without lifecycle controls, old credentials and broad privileges tend to remain in place after the system changes. That creates orphaned access, unclear ownership, and excessive runtime authority across deployment, observability, and integration layers. The result is a machine identity estate that grows faster than the controls that govern it.
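One way to surface the orphaned and stale access described here is a periodic sweep over the credential inventory. The sketch below assumes each credential is a dictionary with `id`, `owner`, and `last_used` keys, and uses a 90-day idle threshold; both the record shape and the threshold are illustrative assumptions, not values taken from the source article.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def find_orphaned_or_stale(
    credentials: List[Dict], now: datetime, max_idle_days: int = 90
) -> List[str]:
    """Return IDs of credentials with no owner or no use within max_idle_days."""
    cutoff = now - timedelta(days=max_idle_days)
    flagged = []
    for cred in credentials:
        if cred["owner"] is None or cred["last_used"] < cutoff:
            flagged.append(cred["id"])
    return flagged


# Illustrative data only; names and dates are hypothetical.
now = datetime(2026, 1, 14, tzinfo=timezone.utc)
creds = [
    {"id": "deploy-token", "owner": "platform-team", "last_used": now - timedelta(days=3)},
    {"id": "old-vendor-key", "owner": "platform-team", "last_used": now - timedelta(days=200)},
    {"id": "telemetry-sa", "owner": None, "last_used": now - timedelta(days=1)},
]
flagged = find_orphaned_or_stale(creds, now)
```

The sweep flags `old-vendor-key` for inactivity and `telemetry-sa` for having no owner, even though the latter is in active use. Ownership and recency are checked independently because an actively used credential with no accountable owner is still a lifecycle gap.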

Q: Should teams treat model-serving platforms like privileged infrastructure?

A: Yes. Model-serving platforms often sit on top of GPU clusters, cloud services, and internal data paths, which means they can reach sensitive systems even when the model itself looks isolated. Treating them as privileged infrastructure forces change control, least privilege, and monitoring to apply to the full execution path, not just the model endpoint.


Technical breakdown

Open source model parity and infrastructure control

Open source AI models matter because they reduce dependence on a single model provider and give organisations more options for performance, cost, and customisation. That does not remove governance requirements. It changes where control sits: the identity boundary shifts from a vendor API to the enterprise’s own inference environment, where platform access, deployment permissions, and service-to-service trust all need to be explicit. In practice, that makes model serving another workload identity domain, not a special AI exception.

Practical implication: treat model-serving infrastructure as a governed workload with scoped access, not as a loosely managed platform extension.

Inference spend as a trigger for identity control

The article’s cost curve is the real signal. Once inference spend reaches levels that justify bringing workloads in-house, organisations stop renting model access and start owning more of the execution environment. That changes the identity problem from authenticating to a third-party endpoint into governing the internal actors that deploy, call, and monitor models. The control challenge is not just secrets or tokens, but who can change routing, swap models, or expand runtime permissions inside the stack.

Practical implication: map who can modify inference infrastructure, because those permissions become the effective control plane for AI operations.
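A mapping of this kind can start as an inversion of existing role bindings: instead of asking what each principal can do, ask who holds each control-plane action. The sketch below is hedged accordingly. The action names (`routing:update`, `model:swap`, `runtime:grant`) and the principals are hypothetical stand-ins for the routing, model-swap, and runtime-permission changes the paragraph above describes.

```python
from typing import Dict, List, Set

# Hypothetical action names for the changes that steer inference operations.
CONTROL_PLANE_ACTIONS = {"routing:update", "model:swap", "runtime:grant"}


def control_plane_holders(bindings: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Invert principal -> permissions into action -> principals holding it."""
    holders: Dict[str, List[str]] = {a: [] for a in sorted(CONTROL_PLANE_ACTIONS)}
    for principal in sorted(bindings):
        for action in bindings[principal] & CONTROL_PLANE_ACTIONS:
            holders[action].append(principal)
    return holders


# Illustrative bindings only.
bindings = {
    "ml-engineer": {"model:swap", "prompt:edit"},
    "ci-deployer": {"routing:update", "model:swap"},
    "sre-oncall": {"runtime:grant", "routing:update"},
}
holders = control_plane_holders(bindings)
```

Reading `holders` by action makes it easy to see, for example, that two separate principals can swap models. That is the kind of fact an access review would want stated explicitly.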

Why AI workload governance follows NHI patterns

AI inference systems behave like non-human identities because they operate continuously, call downstream services, and rely on credentials to function at runtime. That puts them inside the same governance family as other machine identities: service accounts, API keys, certificates, and workload tokens. The article’s open source trend does not create a new identity category. It increases the number of AI workloads that need lifecycle management, least privilege, and boundary enforcement across the environments they touch.

Practical implication: extend NHI governance to AI inference stacks so their credentials, entitlements, and offboarding are managed like any other machine identity.




NHI Mgmt Group analysis

Open source model adoption is turning AI infrastructure into an NHI governance problem. Once organisations run models on their own GPUs and internal platforms, the relevant question is no longer which model they buy but which non-human identities can deploy, call, and modify those systems. That shifts AI from a usage decision into an access and accountability problem, and practitioners need to govern the runtime estate accordingly.

Inference economics create a control inflection point, not just a budgeting one. The article shows that teams move in-house when monthly spend becomes material, which is also the point at which unmanaged permissions start to matter more than raw model capability. The governance lesson is that cost pressure often triggers decentralised platform ownership, and that is where identity sprawl begins unless access boundaries are made explicit.

Identity does not stop at the model endpoint. The control surface extends into orchestration, deployment, observability, and the service accounts that let the system operate. When those layers are treated as generic infrastructure, AI workloads inherit broad runtime authority without a clear lifecycle model. Practitioners should read this as a machine identity programme issue, not a model-choice debate.

Named concept: inference identity drift. The article illustrates how AI systems move from tightly scoped experimentation into broader operational use without a corresponding reset in permissions, ownership, or offboarding. That drift is especially dangerous because the workload appears familiar while its trust boundaries have silently expanded. The implication is that AI infrastructure can outgrow the assumptions that originally governed it.
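Drift of this kind can be made measurable by comparing a workload's current permissions against the baseline approved when it was first scoped. The following is a minimal sketch under that assumption; the permission strings and the idea of a stored baseline are illustrative and are not drawn from the source article.

```python
from typing import Dict, Set


def permission_drift(baseline: Set[str], current: Set[str]) -> Dict[str, Set[str]]:
    """Compare current permissions with the approved baseline."""
    return {
        "added": current - baseline,     # access gained since the baseline
        "removed": baseline - current,   # access dropped since the baseline
    }


# Illustrative data only: an experiment that grew into a production service.
baseline = {"models:read", "inference:invoke"}
current = {"models:read", "inference:invoke", "db:read", "queue:publish"}
drift = permission_drift(baseline, current)
```

A non-empty `added` set is not automatically a problem, but each entry is a point where the trust boundary widened and where someone should be able to say who approved it and why.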

Open source progress accelerates governance divergence across enterprises. The companies that bring inference in-house will accumulate operational control faster than the teams that remain dependent on external APIs. That creates a split between organisations that can enforce workload identity discipline and those that rely on platform abstraction. Practitioners should treat this as a maturity gap that will surface in audit, resilience, and access review conversations.

From our research:

  • The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
  • Only 44% of developers are reported to follow security best practices for secrets management, exposing a significant developer behaviour gap, according to The State of Secrets in AppSec.
  • For a broader view of machine identity exposure, see 230M AWS environment compromise for how exposed credentials scale into operational risk.

What this signals

Inference identity drift: as AI workloads move from vendor APIs into in-house environments, the governance surface expands from model consumption to runtime control. Teams should expect more privileged service accounts, more cross-system permissions, and more pressure to fold AI platforms into standard NHI lifecycle processes. The right comparison is not model versus model, but governed workload versus unmanaged workload.

The operational signal is that model choice is now intertwined with access architecture. If an organisation can switch to open source models and internal hosting when spend increases, then IAM and platform teams need to be ready for access reviews, offboarding, and change control to cover GPU clusters, orchestrators, and service accounts as a single trust chain. That is where the next control gap appears, and it maps directly onto the Govern, Protect, and Respond functions of NIST Cybersecurity Framework 2.0.


For practitioners

  • Map AI inference workloads to workload identities: Inventory every model-serving environment, the service accounts that operate it, and the credentials used for deployment, inference, logging, and telemetry. If an AI system can reach other systems, it needs a named identity owner and a defined access boundary.
  • Separate model access from platform administration: Split permissions so the teams that tune prompts, models, or routing cannot also grant themselves broad infrastructure access. The goal is to keep inference change control distinct from cluster administration and cloud-level privilege.
  • Review in-house AI workloads as part of NHI lifecycle governance: Include AI inference environments in joiner-mover-leaver, access review, and offboarding processes. When models move between vendors, accounts, or environments, revoke stale tokens and revalidate runtime trust paths.
  • Define escalation paths for model-serving changes: Require approvals for changes that alter model endpoints, GPU pools, API routing, or downstream service reach. Those changes can silently expand the blast radius of the workload and should be treated as privileged actions.
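Two of the recommendations above, separating tuning rights from infrastructure administration and requiring approval for privileged changes, lend themselves to simple automated checks. The sketch below is one possible shape for them. The configuration keys, permission names, and grouping into `TUNING` and `INFRA_ADMIN` are assumptions made for illustration, not an established schema.

```python
from typing import Dict, List, Set

# Hypothetical config keys whose modification should require approval.
PRIVILEGED_FIELDS = ("model_endpoint", "gpu_pool", "api_routing", "downstream_services")

# Hypothetical permission groupings for the separation-of-duties check.
TUNING = {"prompt:edit", "model:tune", "routing:update"}
INFRA_ADMIN = {"cluster:admin", "cloud:iam-admin"}


def fields_needing_approval(current: Dict, proposed: Dict) -> List[str]:
    """List privileged fields whose value differs between two configs."""
    return [f for f in PRIVILEGED_FIELDS if current.get(f) != proposed.get(f)]


def violates_separation(permissions: Set[str]) -> bool:
    """True if one principal holds both tuning and infrastructure-admin rights."""
    return bool(permissions & TUNING) and bool(permissions & INFRA_ADMIN)


# Illustrative change request: the model endpoint changes, the GPU pool does not.
current_cfg = {"model_endpoint": "llm-v1", "gpu_pool": "pool-a", "replicas": 2}
proposed_cfg = {"model_endpoint": "llm-v2", "gpu_pool": "pool-a", "replicas": 4}
needs_approval = fields_needing_approval(current_cfg, proposed_cfg)
```

In this example only `model_endpoint` triggers approval; the replica count changes too, but it is not in the privileged set. Which fields count as privileged is a policy decision each team would have to make for its own stack.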

Key takeaways

  • Open source model adoption is shifting AI governance from external dependency management to internal workload identity control.
  • Once inference spend justifies in-house hosting, permissions around deployment, routing, and runtime access become the real control plane.
  • IAM teams should extend NHI lifecycle governance to AI infrastructure before model-serving environments accumulate unmanaged privilege.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Non-Human Identity Top 10 | NHI-01 | Inference platforms depend on non-human credentials and scoped runtime access.
NIST CSF 2.0 | PR.AA-05 (PR.AC-4 in CSF 1.1) | Access control governs who can alter the AI workload and its trust boundaries.
NIST Zero Trust | SP 800-207 | Zero trust applies to internal AI services that call other systems and data sources.

Treat model-serving components as continuously verified services with explicit trust decisions at each boundary.


Key terms

  • Inference Infrastructure: The computing, orchestration, and access layer that runs AI models in production. It includes GPUs, deployment systems, routing, telemetry, and the credentials that let the workload operate. In identity terms, it is a governed execution environment, not just a hosting layer.
  • Workload Identity: A machine identity assigned to a software service so it can authenticate and act in production. It usually includes service accounts, tokens, certificates, or federated credentials. For AI systems, workload identity determines what the model-serving environment can reach and modify at runtime.
  • Inference Identity Drift: The gradual expansion of permissions, ownership, and trust around an AI workload as it moves from test use to production use. The system still looks like the same model, but its access boundaries have widened. That creates hidden exposure when governance does not reset alongside operational growth.
  • Model-Serving Platform: The infrastructure layer that exposes a trained model to applications and users. It often includes routing, scaling, monitoring, and integration services, which means it can carry privileged access to other systems. Security teams should treat it as part of the identity perimeter, not a neutral compute service.

Deepen your knowledge

AI inference governance and workload identity are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building controls for in-house model hosting or model-serving platforms, it is worth exploring.

This post draws on content published by WorkOS: Baseten is betting big on open source models. Read the original.
