Infrastructure is not stateless, and CI/CD breaks down

By NHI Mgmt Group Editorial Team | Published 2025-07-07 | Domain: Best Practices | Source: ControlMonkey

TL;DR: Traditional CI/CD is effective for stateless application delivery, but it breaks down for live cloud infrastructure where state, dependencies, drift, and rollback risk make small changes harder to control, according to ControlMonkey. The governance problem is not pipeline speed, but whether teams can safely manage infrastructure changes with traceability and policy.

At a glance

What this is: An analysis of why CI/CD alone is a poor delivery model for cloud infrastructure. The key finding is that infrastructure is stateful, interconnected, and far harder to roll back safely than application code.

Why it matters: IAM, NHI, and cloud governance teams need delivery controls that preserve ownership, drift visibility, and policy enforcement, because infrastructure changes can affect access, compliance, and availability.

Context

Cloud infrastructure behaves differently from application code because it is live, stateful, and tightly coupled to access, policy, and routing decisions. A change that looks routine in a deployment pipeline can create drift, expose permissions, or break production in ways software release practices were never designed to contain.

For IAM, NHI, and cloud security teams, the real issue is governance at the infrastructure layer. The article argues that teams need a delivery model that tracks ownership, drift, and compliance per stack rather than assuming generic CI/CD controls are enough.

Stack governance: the article's core concept is a governed, trackable infrastructure unit that connects code to live cloud resources with history, compliance, and ownership. That framing matters because the control problem is not just deployment, but knowing what changed, who owns it, and whether the live state still matches intent.

Key questions

Q: How should teams govern infrastructure changes when CI/CD is not enough?

A: Use a delivery model that treats infrastructure as a governed stateful asset, not as disposable application code. Require ownership, drift checks, policy validation, and traceable approvals before changes reach production. The goal is to make every change explainable, reversible only where safe, and auditable across teams.

Q: Why do cloud infrastructure changes create more risk than software deployments?

A: Cloud infrastructure changes can alter live access paths, routing, and compliance state immediately, so the impact is broader than a code artifact swap. Rollback is slower and less clean because the environment may already have changed in production. That makes pre-change control more important than post-change recovery.

Q: What should security teams measure to know whether infra delivery is under control?

A: Measure drift frequency, unowned resources, policy exceptions, and the time it takes to explain a live change from code to production. If teams cannot trace resource ownership or identify divergence quickly, the delivery model is operating with weak governance even if pipelines are green.
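
The measurements named in this answer can be computed from a resource inventory. The following is a minimal sketch, assuming a hypothetical inventory export in which each record carries an owner, a drift flag, and a policy-exception flag. The ResourceRecord type, its field names, and the sample resources are illustrative assumptions, not part of the ControlMonkey article or any specific tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceRecord:
    """One live cloud resource from a hypothetical inventory export."""
    resource_id: str
    owner: Optional[str]      # None means nobody is accountable for it
    drifted: bool             # live state differs from declared code
    policy_exception: bool    # running under a waived policy check


def governance_metrics(records: list[ResourceRecord]) -> dict[str, float]:
    """Summarise how much of the estate is drifted, unowned, or excepted."""
    total = len(records)
    if total == 0:
        return {"total": 0, "drift_rate": 0.0, "unowned": 0, "policy_exceptions": 0}
    drifted = sum(1 for r in records if r.drifted)
    unowned = sum(1 for r in records if not r.owner)
    exceptions = sum(1 for r in records if r.policy_exception)
    return {
        "total": total,
        "drift_rate": drifted / total,
        "unowned": unowned,
        "policy_exceptions": exceptions,
    }


# Illustrative sample data only.
inventory = [
    ResourceRecord("sg-web", "team-platform", drifted=True, policy_exception=False),
    ResourceRecord("role-ci-deployer", None, drifted=False, policy_exception=True),
    ResourceRecord("rtb-main", "team-network", drifted=False, policy_exception=False),
    ResourceRecord("bucket-logs", "team-security", drifted=True, policy_exception=False),
]
print(governance_metrics(inventory))
```

A green pipeline says nothing about these numbers, which is why they are worth tracking separately from build status.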

Q: Who should own governance when infrastructure delivery spans engineering and security?

A: Engineering should own the code path, while security and platform teams should own the policy boundaries and audit expectations. The important point is that ownership must be explicit at the stack level, because unclear accountability is what allows drift and unsafe change to accumulate.

Technical breakdown

Why CI/CD fits software but not cloud infrastructure

CI/CD assumes a deployable unit can be replaced, rolled back, and validated without long-lived side effects. Infrastructure is different because security groups, IAM policies, route tables, and network dependencies are stateful and interconnected. A bad change does not remain isolated inside a build artifact. It can alter access paths, break service dependencies, or create compliance exposure immediately. The article's key technical point is that infrastructure delivery needs state-aware governance, not just release automation.

Practical implication: treat infrastructure changes as governed state transitions, not ordinary software releases.
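
To illustrate why a bad change does not stay isolated, here is a minimal sketch that walks a dependency map to list everything downstream of a changed resource. The DEPENDENTS map and the resource names are hypothetical; a real estate would derive this graph from its own tooling.

```python
from collections import deque

# Hypothetical dependency map: each key is depended on by the resources in its list.
DEPENDENTS = {
    "vpc-main": ["subnet-app", "rtb-main"],
    "subnet-app": ["sg-web"],
    "rtb-main": ["svc-api"],
    "sg-web": ["svc-api", "svc-worker"],
    "iam-role-app": ["svc-api"],
}


def blast_radius(changed: str, dependents: dict[str, list[str]]) -> list[str]:
    """Return every resource reachable downstream of `changed`, in breadth-first order."""
    seen = {changed}
    queue = deque([changed])
    affected = []
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                affected.append(nxt)
                queue.append(nxt)
    return affected


print(blast_radius("sg-web", DEPENDENTS))    # two services sit behind this security group
print(blast_radius("vpc-main", DEPENDENTS))  # a network-level change reaches much further
```

Replacing an application artifact has no equivalent of this traversal, which is one way to see the difference between a release and a state transition.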

Drift, ownership, and policy enforcement in stack-based delivery

A stack model ties code to live cloud resources and adds visibility into what is managed, who owns it, and whether drift exists. Drift matters because live infrastructure can diverge from declared configuration through manual change, partial rollout, or environment-specific exceptions. Policy enforcement at the stack layer lets teams validate changes before they reach production and keep a paper trail of approvals and outcomes. This is a control-plane problem as much as a delivery problem.

Practical implication: require ownership, drift detection, and policy checks before infrastructure changes are allowed to proceed.
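
Drift detection reduces to comparing what the code declares with what is live. The sketch below assumes both sides are available as flat attribute dictionaries. The detect_drift function and the security-group attributes are hypothetical names chosen for illustration.

```python
def detect_drift(declared: dict, live: dict) -> dict[str, dict]:
    """Compare declared and live attributes of one resource, key by key."""
    drift = {}
    for key in sorted(set(declared) | set(live)):
        want = declared.get(key, "<absent>")
        have = live.get(key, "<absent>")
        if want != have:
            drift[key] = {"declared": want, "live": have}
    return drift


# Illustrative data: someone opened port 22 by hand and left a note on the resource.
declared_sg = {"ingress_ports": [443], "cidr": "10.0.0.0/16", "owner": "team-platform"}
live_sg = {
    "ingress_ports": [443, 22],
    "cidr": "10.0.0.0/16",
    "owner": "team-platform",
    "description": "temp ssh access",
}

for attribute, values in detect_drift(declared_sg, live_sg).items():
    print(attribute, values)
```

An empty result means the live resource still matches intent. Anything else is a divergence that should be remediated or explicitly approved before the next change.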

Why infrastructure rollback is not the same as code rollback

Software rollback usually restores a prior artifact with limited residual impact. Infrastructure rollback is slower and riskier because the failed change may already have altered access, connectivity, or compliance state across several dependent services. That means the damage can outlive the deployment event. The article is pointing to a fundamental operational mismatch: infrastructure changes need pre-change validation, controlled execution, and traceable remediation because post-change recovery is not clean or instant.

Practical implication: build pre-deployment guardrails and explicit recovery procedures for infra changes instead of relying on rollback alone.
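
One way to make recovery explicit is to refuse any plan whose risky steps lack a written recovery procedure. This is a minimal sketch under stated assumptions: the PlannedAction type, the action names, and the choice of which resource types count as access- or routing-affecting are all hypothetical, and a real program would define its own risk criteria.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption for illustration: types whose changes alter access or connectivity.
ACCESS_OR_ROUTING_TYPES = {"iam_policy", "security_group", "route_table"}
DESTRUCTIVE_ACTIONS = {"delete", "replace"}


@dataclass
class PlannedAction:
    resource_id: str
    resource_type: str
    action: str  # "create", "update", "replace", or "delete"
    recovery_procedure: Optional[str] = None  # e.g. a runbook reference


def missing_recovery_procedures(plan: list[PlannedAction]) -> list[str]:
    """List risky steps that have no documented recovery procedure."""
    missing = []
    for step in plan:
        risky = (
            step.action in DESTRUCTIVE_ACTIONS
            or step.resource_type in ACCESS_OR_ROUTING_TYPES
        )
        if risky and not step.recovery_procedure:
            missing.append(step.resource_id)
    return missing


# Illustrative plan only.
plan = [
    PlannedAction("sg-web", "security_group", "update"),
    PlannedAction("rtb-main", "route_table", "update", recovery_procedure="runbook-17"),
    PlannedAction("db-orders", "database", "replace"),
    PlannedAction("dashboard-ops", "dashboard", "update"),
]
print(missing_recovery_procedures(plan))
```

A non-empty result would block the change until someone writes down how to recover, rather than assuming a code revert will be enough.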

NHI Mgmt Group analysis

CI/CD assumes infrastructure is disposable, and that assumption fails the moment state matters. Application delivery can often tolerate rollback as a normal safety valve. Cloud infrastructure cannot, because permissions, routes, and dependencies already exist in production when a change lands. The implication is that infrastructure governance must be built around live state, not artifact replacement.

Stack-level governance is the right abstraction for cloud change control. The article's strongest contribution is the idea that teams need a governed delivery unit that binds code, resources, ownership, drift, and compliance together. That is materially different from using pipelines as a generic transport layer. Practitioners should treat the stack as the unit of accountability, not the individual commit.

Infrastructure delivery failures are usually governance failures before they are engineering failures. Manual approvals, invisible drift, and undocumented ownership all create the conditions for risky change propagation. The control gap is not lack of velocity, but lack of authoritative state. Practitioners should reframe infra delivery as a policy and accountability problem first.

Cloud growth turns small delivery mistakes into compound control loss. As estates expand across teams, regions, and accounts, the cost of unclear ownership and uncontrolled change rises faster than the rate of deployment. That makes visibility into live state a baseline security requirement, not an operational luxury. Practitioners need a delivery model that scales governance with the environment.

Identity and infrastructure change are now coupled, so delivery models must reflect that coupling. A mis-scoped infrastructure change can alter who can access what just as easily as it can alter availability. That means cloud delivery and identity governance can no longer be treated as separate operational lanes. Practitioners should align change management, access control, and infrastructure state tracking in one control model.

From our research:
67% of organisations still rely heavily on static credentials despite the risks they pose to agentic AI deployments, according to the 2026 Infrastructure Identity Survey.
84% of organisations report at least one unresolved identity governance gap across machine or AI workloads, according to the Ultimate Guide to NHIs, Standards.
If you are reworking delivery control for live infrastructure, start by linking ownership, access, and drift in the same operating model, then compare that approach with the NIST Cybersecurity Framework 2.0.

What this signals

Infrastructure delivery is moving from a build-and-ship problem to a governed-state problem. For identity and cloud teams, that means change control, access control, and configuration control increasingly need to operate as one system rather than separate review queues.

State-aware delivery: a governed stack is only useful if it becomes the unit of policy, ownership, and audit. The practical signal for practitioners is that infra programs must now prove live-state visibility, not just deployment speed.

With 67% of organisations still relying heavily on static credentials despite the risks they pose to agentic AI deployments, according to the 2026 Infrastructure Identity Survey, any delivery model that cannot account for identity state is already behind the operational curve.

For practitioners

Map infrastructure to accountable stack units: Define a governed stack for each live infrastructure boundary so code, ownership, drift state, and compliance status are visible together. That gives change control a stable unit of review instead of a loose pipeline path.
Block changes when live state no longer matches intent: Add drift detection to release gates so teams stop shipping against an unknown environment. If a resource has diverged from code, require remediation or explicit approval before the next change proceeds.
Replace manual approvals with policy-backed reviews: Move approval logic into policy checks that can evaluate environment, ownership, and risk before deployment. That reduces dependency on human memory while preserving an auditable decision trail.
Trace every resource back to code ownership: Require each cloud resource to have an explicit code owner and change history. If leadership asks what changed, the answer should come from the system, not from a scramble across teams.
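
The four recommendations above can be combined into a single release gate that returns a decision plus an audit record. The sketch below is an assumption-laden illustration: the Stack fields and the evaluate_gate function are hypothetical and do not represent ControlMonkey's product or any particular policy engine.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Stack:
    """Hypothetical governed stack: code, ownership, drift, and policy state together."""
    name: str
    owner: Optional[str]
    drifted_resources: list[str] = field(default_factory=list)
    policy_violations: list[str] = field(default_factory=list)
    drift_approved_by: Optional[str] = None  # explicit sign-off to proceed despite drift


def evaluate_gate(stack: Stack) -> dict:
    """Decide whether a change may proceed and record why."""
    reasons = []
    if not stack.owner:
        reasons.append("no explicit owner")
    if stack.drifted_resources and not stack.drift_approved_by:
        reasons.append("unresolved drift: " + ", ".join(stack.drifted_resources))
    if stack.policy_violations:
        reasons.append("policy violations: " + ", ".join(stack.policy_violations))
    return {
        "stack": stack.name,
        "allowed": not reasons,
        "reasons": reasons,
        "evaluated_at": datetime.now(timezone.utc).isoformat(),
    }


blocked = evaluate_gate(Stack("payments-prod", owner=None, drifted_resources=["sg-web"]))
print(blocked["allowed"], blocked["reasons"])

approved = evaluate_gate(
    Stack("payments-prod", owner="team-payments",
          drifted_resources=["sg-web"], drift_approved_by="alice")
)
print(approved["allowed"], approved["reasons"])
```

Because the returned record carries the reasons and a timestamp, the same structure can serve as the auditable decision trail that replaces a remembered manual approval.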

Key takeaways

CI/CD is a strong software delivery model, but it is the wrong default for live cloud infrastructure because state, dependency, and rollback risk behave differently.
The article's core governance idea is the stack, a delivery unit that binds code, ownership, drift, compliance, and live resource state into one control surface.
Practitioners should respond by making infrastructure changes traceable, policy-backed, and drift-aware before the next deployment reaches production.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
NIST CSF 2.0	PR.PS-01 (PR.IP-1 in CSF 1.1)	Infrastructure delivery needs policy-backed change control and traceable state.
NIST Zero Trust (SP 800-207)	PR.AC-4 (a NIST CSF 1.1 subcategory; PR.AA-05 in CSF 2.0)	Live infrastructure changes can alter access paths and privilege boundaries.
OWASP Non-Human Identity Top 10	NHI-03	Cloud delivery often depends on secrets and machine credentials in pipelines.

Apply least-privilege checks to infra changes and verify access boundaries before deployment.
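
A least-privilege check on a proposed change can start with something as simple as flagging wildcard grants. The sketch below assumes the change carries an AWS-style IAM policy document (Statement, Effect, Action, Resource). The wildcard_findings function and the sample policy are illustrative, and a production check would cover far more cases, such as conditions and NotAction.

```python
def _as_list(value):
    """IAM policy fields may hold a single value or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def wildcard_findings(policy: dict) -> list[str]:
    """Flag Allow statements that grant wildcard actions or all resources."""
    findings = []
    for index, stmt in enumerate(_as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action", [])):
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {index}: broad action {action!r}")
        for resource in _as_list(stmt.get("Resource", [])):
            if resource == "*":
                findings.append(f"statement {index}: resource '*'")
    return findings


# Illustrative policy attached to a hypothetical pipeline role.
proposed_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": "arn:aws:s3:::app-logs/*"},
        {"Effect": "Allow", "Action": "iam:*", "Resource": "*"},
    ],
}

for finding in wildcard_findings(proposed_policy):
    print(finding)
```

Running a check like this before deployment surfaces privilege expansion while it is still a proposal rather than a live access path.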

Key terms

Stateful Infrastructure: Infrastructure that retains live operational state after a change is applied. Unlike app artifacts that can often be replaced cleanly, stateful infrastructure carries dependencies, access paths, and compliance effects that make rollback and remediation more complex.
Drift: A mismatch between declared configuration and the live environment. In infrastructure governance, drift is not just a configuration nuisance. It is evidence that the system may no longer match the approved security, access, or operational intent of the code.
Stack: A governed infrastructure unit that binds code, ownership, live resources, policy, and history together. The concept gives teams a practical boundary for review and accountability so change control can follow the real shape of the environment.

This post draws on content published by ControlMonkey: "Software Is Stateless. Infrastructure Is Not."

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-07-07.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org