What should teams do before moving AI workloads into production?

Why This Matters for Security Teams

Moving AI workloads into production changes the risk profile from model experimentation to operational authority. The hard part is not the model alone, but the identity chain around it: service accounts, deployment tools, GPU schedulers, secret stores, and the humans approving changes. If those layers share broad access or unclear ownership, the workload can inherit privileges that were never meant for autonomous execution. Guidance from Ultimate Guide to NHIs — What are Non-Human Identities is useful here because production AI often behaves more like a workload than a user, which means identity controls need to be designed around runtime actions, not job titles. For workload identity patterns, the SPIFFE workload identity specification gives teams a concrete foundation for proving what a service is before it touches infrastructure.

The practical mistake is to treat “go-live” as a deployment milestone instead of an access-control milestone. Security teams should verify that model serving is isolated from training, that GPU provisioning is governed separately, and that supporting systems such as observability, data access, and CI/CD do not collapse into one shared trust zone. In practice, many security teams discover privilege sprawl only after an AI release has already created hidden dependencies, when it should have been prevented by design.

How It Works in Practice

Before production, teams should map the full chain of non-human identities involved in the workload: the model runtime, orchestration layer, pipeline agents, secret brokers, and any automation that can change infrastructure. That inventory should be tied to ownership and approval paths, because production controls fail when no one can say who may request, approve, or revoke access. The SailPoint Critical Gaps in Machine Identity Management report is relevant because it shows how visibility gaps and incomplete inventories undermine control at scale.
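The inventory step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `NonHumanIdentity` record, its fields, and the sample identity names are hypothetical and do not come from any specific tool. It shows one way to flag identities for which no one is recorded as owner, approver, or revoker.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class NonHumanIdentity:
    """Hypothetical inventory record for one non-human identity."""
    name: str
    kind: str                       # e.g. "model-runtime", "pipeline-agent"
    owner: Optional[str] = None     # team accountable for the identity
    approver: Optional[str] = None  # who may approve access requests
    revoker: Optional[str] = None   # who may revoke access


def ownership_gaps(inventory: Iterable[NonHumanIdentity]) -> Dict[str, List[str]]:
    """Map each identity name to the accountability roles it is missing."""
    gaps: Dict[str, List[str]] = {}
    for nhi in inventory:
        missing = [
            role for role in ("owner", "approver", "revoker")
            if not getattr(nhi, role)
        ]
        if missing:
            gaps[nhi.name] = missing
    return gaps


# Sample data (assumed names, for illustration only).
inventory = [
    NonHumanIdentity("model-serving-sa", "model-runtime",
                     "ml-platform", "sec-review", "sec-review"),
    NonHumanIdentity("ci-deployer", "pipeline-agent", owner="devops"),
    NonHumanIdentity("secret-broker", "secret-broker"),
]

for name, missing in ownership_gaps(inventory).items():
    print(f"{name}: missing {', '.join(missing)}")
```

A report like this is most useful as a release gate: any identity that appears in it has no clear request, approval, or revocation path yet.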

A practical production checklist usually includes:

Separate RBAC roles for model serving, GPU provisioning, and platform administration.

Short-lived credentials for deployment automation, with JIT issuance where possible.

Workload identity for services and agents, rather than shared secrets copied into pipelines.

Logging that preserves who approved access, what was deployed, and which identity executed the change.

Policy checks at request time so the workload is evaluated against current context, not just pre-approved templates.
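Several items on this checklist can be combined in one small sketch. It assumes a hypothetical role-to-action map (`ROLE_ACTIONS`), a 15-minute credential lifetime, and an in-memory list as the audit log. None of these names or values come from a real product, and a production system would delegate them to an identity provider and a policy engine.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Set

# Assumed role separation: serving, GPU provisioning, and platform admin
# each get their own narrow action set.
ROLE_ACTIONS: Dict[str, Set[str]] = {
    "model-serving": {"serve:predict"},
    "gpu-provisioning": {"gpu:allocate", "gpu:release"},
    "platform-admin": {"cluster:configure"},
}


@dataclass(frozen=True)
class Credential:
    identity: str
    role: str
    approved_by: str
    expires_at: datetime


def issue_credential(identity: str, role: str, approved_by: str,
                     now: datetime, ttl_minutes: int = 15) -> Credential:
    """Issue a short-lived credential for exactly one role."""
    if role not in ROLE_ACTIONS:
        raise ValueError(f"unknown role: {role}")
    return Credential(identity, role, approved_by,
                      now + timedelta(minutes=ttl_minutes))


def authorize(cred: Credential, action: str, now: datetime,
              audit_log: List[dict]) -> bool:
    """Evaluate the request at call time and record who, what, and when."""
    allowed = now < cred.expires_at and action in ROLE_ACTIONS[cred.role]
    audit_log.append({
        "identity": cred.identity,
        "approved_by": cred.approved_by,
        "action": action,
        "allowed": allowed,
        "at": now.isoformat(),
    })
    return allowed


now = datetime.now(timezone.utc)
log: List[dict] = []
cred = issue_credential("ci-deployer", "gpu-provisioning", "alice", now)
print(authorize(cred, "gpu:allocate", now, log))   # within role and lifetime
print(authorize(cred, "serve:predict", now, log))  # outside the role
```

The point of the design is that the decision depends on the time of the request and the role's action set, so an expired or over-broad credential fails even if it was valid at approval time.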

For agentic systems, current guidance suggests going further and using intent-based authorisation, because an AI agent may chain tools in ways a static role model never anticipated. The Ultimate Guide to NHIs — Standards and SPIFFE patterns both support this shift toward verifiable workload identity. The Guide to SPIFFE and SPIRE is especially useful when teams want cryptographic identity for workloads without relying on long-lived static secrets.
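One simple reading of intent-based authorisation is that an agent declares what it is trying to do, and every tool call is checked against that declared intent and not against a static role. The sketch below assumes a hypothetical `INTENT_TOOLS` map and tool names. It is an illustration of the idea, not an implementation of any standard or of SPIFFE.

```python
from typing import Dict, List, Set

# Assumed mapping from a declared intent to the tools it may use.
INTENT_TOOLS: Dict[str, Set[str]] = {
    "summarise-incident": {"read_logs", "read_tickets"},
    "scale-inference": {"read_metrics", "request_gpu"},
}


class IntentSession:
    """Tracks one agent task and checks each tool call against its intent."""

    def __init__(self, agent_id: str, intent: str) -> None:
        if intent not in INTENT_TOOLS:
            raise ValueError(f"undeclared intent: {intent}")
        self.agent_id = agent_id
        self.intent = intent
        self.denied: List[str] = []  # kept for later review

    def call_tool(self, tool: str) -> bool:
        """Return True if the tool fits the declared intent, else record a denial."""
        if tool in INTENT_TOOLS[self.intent]:
            return True
        self.denied.append(tool)
        return False


session = IntentSession("agent-1", "summarise-incident")
print(session.call_tool("read_logs"))    # fits the declared intent
print(session.call_tool("request_gpu"))  # tool chaining outside the intent
print(session.denied)
```

Recording denied calls matters as much as blocking them, because unexpected tool chaining is often the first sign that an agent's behaviour has drifted from what was approved.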

These controls tend to break down when production AI spans multiple clouds or shared Kubernetes clusters because identity boundaries and approval chains become fragmented.

Common Variations and Edge Cases

Tighter pre-production controls often increase delivery overhead, so organisations have to balance faster rollout against stronger separation of duties and traceability. That tradeoff becomes sharper for AI systems that depend on frequent model refreshes, burstable GPU capacity, or multiple vendors in the same deployment chain. There is no universal standard for this yet, but best practice is evolving toward short-lived secrets, workload identity, and runtime policy evaluation rather than static access grants.

Edge cases appear when an AI workload is not fully autonomous but still has enough tool access to modify data, call APIs, or trigger infrastructure changes. In those environments, the line between “application” and “agent” blurs, so governance should assume goal-driven behaviour even if the system is not marketed as an AI agent. For that reason, NHI identity models and DeepSeek breach lessons are both relevant: secret exposure and weak visibility turn a deployment into an access problem fast.

Where production uses shared secrets, long-lived service tokens, or broad platform-admin roles, the recommendation to “separate access controls” becomes difficult to enforce consistently. In those environments, the safest move is to reduce the agent or workload’s standing authority first, then expand only by exception.
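Reducing standing authority and expanding only by exception can be modelled as a small baseline plus time-boxed grants. The sketch below is a minimal illustration. The `AccessPolicy` class, the permission strings, and the one-hour window are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable


class AccessPolicy:
    """Minimal standing permissions plus approved, expiring exceptions."""

    def __init__(self, baseline: Iterable[str]) -> None:
        self.baseline = frozenset(baseline)
        self._exceptions: Dict[str, datetime] = {}  # permission -> expiry

    def grant_exception(self, permission: str, approved_by: str,
                        now: datetime, ttl: timedelta) -> None:
        """Add a temporary permission; an approver must be named."""
        if not approved_by:
            raise ValueError("an exception needs a named approver")
        self._exceptions[permission] = now + ttl

    def is_allowed(self, permission: str, now: datetime) -> bool:
        """Allow baseline permissions, or exceptions that have not expired."""
        if permission in self.baseline:
            return True
        expiry = self._exceptions.get(permission)
        return expiry is not None and now < expiry


now = datetime.now(timezone.utc)
policy = AccessPolicy(baseline={"serve:predict"})
print(policy.is_allowed("cluster:configure", now))  # not in the baseline
policy.grant_exception("cluster:configure", "bob", now, timedelta(hours=1))
print(policy.is_allowed("cluster:configure", now))  # allowed while the grant lasts
```

Because exceptions expire on their own, the workload drifts back to its reduced baseline without anyone having to remember to revoke access.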

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Non-Human Identity Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI-03	Production AI often depends on long-lived machine creds and poor rotation.
CSA MAESTRO		MAESTRO addresses runtime governance for autonomous agent behaviour.
NIST AI RMF		AI RMF covers accountability, governance, and risk controls for AI deployment.

Define runtime policy, approval, and observability controls before any agent reaches production.

