Platform teams should prebuild the common case and reserve on-demand execution for the long tail that the fleet actually encounters. That approach reduces wasted CI time while still preserving readiness for unusual kernels. The decision signal is real runtime demand, not how many possible variants exist in theory.
Why This Matters for Security Teams
Platform teams are not deciding between “fast” and “safe” in the abstract. They are deciding where engineering capacity should be spent so the fleet can move quickly without turning every request into a fresh build. The wrong default is to build everything on demand because it feels flexible, but it often creates queueing, cost spikes, and unpredictable latency. The right question is which workloads are common enough to justify prebuilding and which are rare enough to leave to runtime demand.
This is closely related to the identity and lifecycle problems documented in Ultimate Guide to NHIs — The NHI Market, where scale and variance drive governance decisions more than theoretical completeness. It also aligns with NIST Cybersecurity Framework 2.0, which treats repeatable risk reduction as an operational discipline, not a one-time architecture choice. In practice, many teams recognise the cost of a “build everything” policy only after the first wave of missed SLAs and developer backlash, rather than through intentional capacity planning.
How It Works in Practice
Platform teams should start by measuring real demand signals: request frequency, failure cost, build time, artifact size, and whether the workload requires special kernels, drivers, or images. The common case should be prebuilt into stable, versioned artifacts so teams can reuse what is already proven. On-demand execution should remain available for edge cases, experimental variants, and uncommon dependencies that do not justify a permanent build path.
A useful decision pattern is to separate the fleet into three buckets:
- High-frequency standard cases: prebuild and cache aggressively.
- Medium-frequency variants: prebuild selectively when the runtime cost is recurring.
- Long-tail or bursty cases: build on demand, but cap blast radius with queueing and clear timeouts.
That approach keeps CI capacity focused on work that users actually consume. It also improves reliability because prebuilt artifacts can be scanned, signed, and promoted through a controlled pipeline before they are needed in production. By contrast, on-demand builds tend to be more variable, so they work best when the environment can tolerate latency and when the build surface is tightly constrained. The operational lesson is similar to the visibility gap described in JetBrains GitHub plugin token exposure: teams often assume the rare path is harmless until an uncommon request becomes the path that attackers exploit or that overloads the build system.
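The three buckets can be sketched as a small classifier. This is a minimal illustration, not a prescribed implementation: the `VariantDemand` type, the `classify` function, and both thresholds are hypothetical, and request frequency is used as the only signal for brevity even though failure cost and build time would normally factor in too.

```python
from dataclasses import dataclass
from enum import Enum


class BuildPath(Enum):
    PREBUILD = "prebuild"                      # high-frequency standard cases
    SELECTIVE_PREBUILD = "selective_prebuild"  # medium-frequency variants
    ON_DEMAND = "on_demand"                    # long-tail or bursty cases


@dataclass(frozen=True)
class VariantDemand:
    name: str
    requests_per_week: float  # observed demand from telemetry, not theoretical


# Hypothetical thresholds; they should come from the fleet's own telemetry.
HIGH_FREQUENCY = 50.0
MEDIUM_FREQUENCY = 5.0


def classify(variant: VariantDemand) -> BuildPath:
    """Assign a variant to one of the three buckets by observed demand."""
    if variant.requests_per_week >= HIGH_FREQUENCY:
        return BuildPath.PREBUILD
    if variant.requests_per_week >= MEDIUM_FREQUENCY:
        return BuildPath.SELECTIVE_PREBUILD
    return BuildPath.ON_DEMAND


if __name__ == "__main__":
    fleet = [
        VariantDemand("standard-kernel", 120),
        VariantDemand("gpu-variant", 12),
        VariantDemand("customer-specific-image", 0.5),
    ]
    for variant in fleet:
        print(variant.name, classify(variant).value)
```

Keeping the thresholds as named constants makes them easy to review alongside the telemetry that justifies them.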
Current guidance suggests making the decision from telemetry, not preference. If a workload is requested often enough that build delay or compute waste becomes material, prebuild it. If it appears only occasionally and the artifact would sit idle most of the time, build it on demand. These controls tend to break down when build pipelines are shared across many teams with no clear ownership, because neither demand data nor artifact promotion rules are stable enough to support a clean split.
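One way to make "material" concrete is a rough break-even comparison of weekly build minutes. The sketch below is an assumption-laden simplification: the function names, the `upkeep_minutes` term (standing in for storage, scanning, and promotion overhead), and the example numbers are all hypothetical, and user-facing latency is not priced in.

```python
def weekly_on_demand_minutes(requests_per_week: float, build_minutes: float) -> float:
    """Compute spent if every request triggers a fresh build."""
    return requests_per_week * build_minutes


def weekly_prebuild_minutes(
    rebuilds_per_week: float, build_minutes: float, upkeep_minutes: float
) -> float:
    """Compute spent keeping one prebuilt artifact current.

    upkeep_minutes is a hypothetical stand-in for scanning, signing,
    promotion, and storage overhead expressed in the same unit.
    """
    return rebuilds_per_week * build_minutes + upkeep_minutes


def should_prebuild(
    requests_per_week: float,
    build_minutes: float,
    rebuilds_per_week: float,
    upkeep_minutes: float,
) -> bool:
    """Prebuild only when on-demand waste exceeds the cost of prebuilding."""
    on_demand = weekly_on_demand_minutes(requests_per_week, build_minutes)
    prebuild = weekly_prebuild_minutes(rebuilds_per_week, build_minutes, upkeep_minutes)
    return on_demand > prebuild


if __name__ == "__main__":
    # Requested 40 times a week: 480 on-demand minutes vs 54 prebuild minutes.
    print(should_prebuild(40, 12, 2, 30))
    # Requested once a week: 12 on-demand minutes vs 54 prebuild minutes.
    print(should_prebuild(1, 12, 2, 30))
```

A fuller model would also weigh the latency users experience while waiting on a build, which this comparison deliberately leaves out.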
Common Variations and Edge Cases
Tighter prebuilding often increases storage and maintenance overhead, so organisations have to balance faster delivery against the cost of keeping more artifacts warm. That tradeoff becomes visible in environments with many kernel versions, GPU variants, or customer-specific images, where trying to prebuild everything can create more operational drag than it removes.
One common exception is compliance-sensitive workloads. Even if demand is low, teams may still prebuild if they need deterministic scanning, approval, or attestations before release. Another edge case is ephemeral research or test clusters, where on-demand builds make sense because the environment is short-lived and the artifact lifecycle is naturally disposable. Best practice is evolving here, and there is no universal standard for when the threshold should shift from prebuild to runtime generation.
For platform governance, the practical rule is to revisit the split regularly. If a “rare” variant starts recurring, it should move into the prebuild path. If a prebuilt image stops being used, it should be retired rather than kept alive by habit. That is how teams avoid turning flexibility into permanent maintenance debt, a problem echoed in the broader NHI operating model described in Ultimate Guide to NHIs — The NHI Market.
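A periodic review of the split could look like the following sketch. The `review_split` function, its thresholds, and the variant names are hypothetical; the only inputs assumed are the set of currently prebuilt variants and per-variant request counts for the review period.

```python
from __future__ import annotations


def review_split(
    prebuilt: set[str],
    requests_in_period: dict[str, int],
    promote_at: int = 10,   # hypothetical: requests that justify prebuilding
    retire_below: int = 1,  # hypothetical: usage below this means retire
) -> tuple[list[str], list[str]]:
    """Return (variants to promote to prebuild, prebuilt variants to retire)."""
    promote = sorted(
        name
        for name, count in requests_in_period.items()
        if name not in prebuilt and count >= promote_at
    )
    retire = sorted(
        name
        for name in prebuilt
        if requests_in_period.get(name, 0) < retire_below
    )
    return promote, retire


if __name__ == "__main__":
    prebuilt = {"standard-kernel", "legacy-image"}
    requests = {"standard-kernel": 40, "gpu-variant": 12, "research-image": 2}
    promote, retire = review_split(prebuilt, requests)
    print("promote:", promote)
    print("retire:", retire)
```

Running the review on a fixed schedule, rather than when someone notices a problem, is what keeps a formerly rare variant from staying on the slow path indefinitely.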
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 sets the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| NIST CSF 2.0 | ID.RA-1 | Demand-based build decisions depend on risk-informed prioritization. |
| NIST CSF 2.0 | PR.IP-1 | Prebuild vs on-demand is an operational process design choice. |
| OWASP Non-Human Identity Top 10 | NHI-08 | Build path choices affect secret handling and artifact exposure. |
Minimize long-lived build credentials and constrain sensitive artifacts in the delivery pipeline.