Why do traditional shift-left controls fall short for AI security?

Shift-left controls work best for deterministic software, where flaws can be found before release. AI systems continue to change at runtime through new context, tool calls, and decision paths. That means security has to follow the AI into production and observe how it behaves after deployment, not just how it was built.

Why This Matters for Security Teams

Traditional shift-left programs assume the security problem is mostly visible in code, configuration, and pre-release testing. AI systems break that assumption because they continue to change after deployment through prompt context, tool execution, model updates, and chained decisions. Risk is therefore not limited to what was built; it also includes what the system becomes in production.

This is especially important when AI is connected to secrets, APIs, customer data, or internal systems. A model can appear safe in a lab and still become dangerous once it starts calling tools, retrieving fresh context, or inheriting permissions from a workflow. NHI Management Group has documented how exposed or poorly governed identities are repeatedly used as the first step in AI abuse, including in the LLMjacking research and the State of Non-Human Identity Security. For identity assurance, the baseline still maps to the NIST SP 800-63 Digital Identity Guidelines, but AI systems need runtime controls beyond pre-deployment review.

In practice, many security teams discover AI abuse only after the agent has already accessed a sensitive tool chain, not during pre-release testing.

How It Works in Practice

Shift-left controls still matter, but they are only one layer. For AI security, the control objective has to move from “did the build pass review?” to “what is the system allowed to do right now, in this context?” That is the core distinction behind runtime governance for agentic workloads. The CSA MAESTRO agentic AI threat modeling framework and NHI research such as the DeepSeek breach both point to the same operational issue: static assumptions fail when the system can decide, call, retrieve, and escalate dynamically.

Use workload identity for the AI service or agent, so the system proves what it is before it gets access.
Issue JIT credentials per task, with short TTLs and automatic revocation when the job ends.
Apply policy at request time, not only at design time, so tool use, data access, and external calls are evaluated in context.
Separate model testing from production authorization, because a safe model can still become unsafe once connected to live systems.
Continuously log prompt, tool, and identity events so anomalous chains can be detected after deployment.
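
The per-task credential step in the list above can be sketched as follows. This is a minimal in-memory illustration, not a reference implementation: `CredentialBroker`, `TaskCredential`, and the 300-second default TTL are hypothetical names and values chosen for the example, and a real deployment would delegate issuance and revocation to a secrets manager or workload identity platform.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass
class TaskCredential:
    """A short-lived credential bound to one workload and one task (hypothetical shape)."""
    token: str
    workload_id: str
    task_id: str
    expires_at: float
    revoked: bool = False


class CredentialBroker:
    """Toy in-memory broker, assumed for illustration only."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._issued = {}

    def issue(self, workload_id, task_id):
        # One credential per task, never a shared standing secret.
        cred = TaskCredential(
            token=secrets.token_urlsafe(32),
            workload_id=workload_id,
            task_id=task_id,
            expires_at=self.clock() + self.ttl_seconds,
        )
        self._issued[cred.token] = cred
        return cred

    def is_valid(self, token, task_id):
        cred = self._issued.get(token)
        if cred is None or cred.revoked:
            return False
        if cred.task_id != task_id:
            return False  # a credential cannot be reused for a different task
        return self.clock() < cred.expires_at

    def revoke_task(self, task_id):
        # Called when the job ends; returns how many credentials were revoked.
        count = 0
        for cred in self._issued.values():
            if cred.task_id == task_id and not cred.revoked:
                cred.revoked = True
                count += 1
        return count
```

Binding each credential to a task identifier is what lets the broker refuse reuse across tasks, which is the property that shared service accounts lack.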

Best practice is evolving toward runtime authorization models that combine workload identity, ephemeral secrets, and policy-as-code rather than broad standing access. That approach aligns with current guidance from emerging agentic AI frameworks and with the reality that AI behavior is not fully deterministic. These controls tend to break down when the agent is embedded in a legacy workflow with shared service accounts and broad inherited permissions, because the runtime cannot reliably separate one task from the next.
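
To make the policy-as-code idea concrete, here is a minimal sketch of a request-time authorization check. The `ToolRequest` fields, the `POLICY` rules, the `support-agent` workload, and the classification levels are all assumptions invented for the example; they are not taken from any framework named in this guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolRequest:
    workload_id: str
    tool: str
    data_classification: str  # assumed levels: "public", "internal", "restricted"
    on_behalf_of_user: bool


# Policy expressed as data (hypothetical rules) so it can be reviewed and versioned.
POLICY = {
    "support-agent": {
        "allowed_tools": {"search_kb", "create_ticket"},
        "max_classification": "internal",
        "requires_user_context": True,
    },
}

CLASSIFICATION_RANK = {"public": 0, "internal": 1, "restricted": 2}


def authorize(request, policy=POLICY):
    """Evaluate one tool call in context; returns (allowed, reason). Default is deny."""
    rules = policy.get(request.workload_id)
    if rules is None:
        return False, "unknown workload"
    if request.tool not in rules["allowed_tools"]:
        return False, "tool not allowed"
    rank = CLASSIFICATION_RANK.get(request.data_classification)
    if rank is None:
        return False, "unknown classification"
    if rank > CLASSIFICATION_RANK[rules["max_classification"]]:
        return False, "data classification too high"
    if rules["requires_user_context"] and not request.on_behalf_of_user:
        return False, "missing user context"
    return True, "allowed"
```

The check runs on every call rather than once at design time, so the same agent can be allowed to search a knowledge base and still be denied when it reaches for restricted data or an unlisted tool.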

Common Variations and Edge Cases

Tighter pre-release review often increases operational overhead, requiring organisations to balance faster delivery against the need for continuous runtime control. That tradeoff is real, especially where AI is experimental, multi-tenant, or heavily integrated into business workflows. There is no universal standard for this yet, so security teams should treat some practices as current guidance rather than settled doctrine.

One common edge case is the “safe model, unsafe wrapper” problem. The model itself may be well tested, but the surrounding agent can still expose data, reuse credentials, or chain tools in ways the original assessment never covered. Another is vendor-hosted AI, where teams may have limited visibility into logs, tool permissions, or identity boundaries. In those environments, shift-left reviews often stop at the contract or deployment gate, while the actual risk emerges in production telemetry and identity misuse patterns.

The practical answer is to combine design-time review with continuous post-deployment monitoring, especially for systems that can retrieve data, invoke tools, or act on behalf of users. NHI Management Group’s Ultimate Guide to NHIs (Standards) is a useful anchor for the identity side of that model. In short, shift-left reduces exposure, but it cannot replace runtime governance when the workload itself is adaptive.
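
One way to connect design-time review to production telemetry is to flag tool chains that the original assessment never covered. The sketch below assumes tool-call events arrive in time order as dictionaries with `session_id` and `tool` keys; that event shape, the `REVIEWED_TRANSITIONS` baseline, and the tool names are hypothetical.

```python
from collections import defaultdict

# Tool-to-tool transitions covered by the pre-release assessment (hypothetical baseline).
REVIEWED_TRANSITIONS = {
    ("search_kb", "create_ticket"),
    ("search_kb", "search_kb"),
}


def unreviewed_transitions(events, reviewed=REVIEWED_TRANSITIONS):
    """Return {session_id: [(previous_tool, next_tool), ...]} for unreviewed chains.

    `events` is an iterable of dicts with 'session_id' and 'tool' keys, in time order.
    """
    last_tool = {}
    findings = defaultdict(list)
    for event in events:
        session, tool = event["session_id"], event["tool"]
        previous = last_tool.get(session)
        # The first call in a session has no predecessor, so there is nothing to compare.
        if previous is not None and (previous, tool) not in reviewed:
            findings[session].append((previous, tool))
        last_tool[session] = tool
    return dict(findings)


events = [
    {"session_id": "s1", "tool": "search_kb"},
    {"session_id": "s1", "tool": "create_ticket"},
    {"session_id": "s2", "tool": "search_kb"},
    {"session_id": "s2", "tool": "read_secrets"},
    {"session_id": "s2", "tool": "http_post"},
]
flagged = unreviewed_transitions(events)
```

Here session `s1` stays within the reviewed chain, while `s2` is flagged for moving from a search to a secrets read and then to an outbound request. A transition baseline is deliberately simple; it will not catch abuse that stays inside reviewed chains, so it complements identity and policy controls rather than replacing them.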

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A02	Addresses unsafe agent behavior that emerges after release.
CSA MAESTRO	GOV-1	Covers governance for agentic systems with changing runtime behavior.
NIST AI RMF		Supports managing AI risks that only appear in production.

Pair pre-release testing with live monitoring, logging, and incident response for deployed AI systems.
