AI pilot failure is usually a governance problem, not a model problem

By NHI Mgmt Group Editorial Team | Published 2026-06-21 | Domain: Agentic AI & NHIs | Source: WitnessAI

TL;DR: Most enterprise AI pilots stall after proving technical feasibility because approval criteria, governance ownership, runtime evidence, and business-value measures were not set early enough, according to WitnessAI, which cites research from BCG, McKinsey, Deloitte, and IBM. The control gap is structural: production readiness cannot be bolted on after experimentation, especially once agents and shadow AI enter the picture.

At a glance

What this is: An independent analysis of why AI pilots get stuck between experimentation and production. The key finding is that operating-model gaps usually block scale more than model performance does.

Why it matters: IAM, security, and governance teams now have to approve AI systems, agents, and shadow AI use cases with evidence models that were built for people and static applications, not for runtime AI behaviour.

By the numbers:

Only 5% of organisations consistently generate substantial value from AI.
74% of companies struggle to achieve and scale value from AI.
60% of organisations had AI governance policies, which means 40% lacked a policy basis for curbing shadow AI proliferation.

👉 Read WitnessAI's analysis of why AI pilots stall before production

Context

Most AI pilots fail because the enterprise treats production as a later decision instead of a design constraint. The result is familiar: a model meets accuracy targets, but security review, legal documentation, ownership, and approval criteria were not defined in time to support release into production.

For identity and access teams, the real issue is not just AI adoption. It is that AI systems, including agents, now sit inside governance processes that were designed for human users and conventional applications. That is why the production gate becomes the point where undefined accountability, missing evidence, and late controls create delay.

Key questions

Q: How should organisations move AI pilots into production without creating governance debt?

A: Start with production criteria, not just model performance. Define ownership, acceptable use, logging, escalation, and audit evidence before the pilot begins. That lets security, legal, and business teams approve a deployment against known requirements instead of discovering missing controls at the final gate.

Q: Why do AI pilots create shadow AI when review processes are too slow?

A: Users rarely stop working while governance catches up. If an approved path is slow or unclear, employees adopt consumer AI tools or unsanctioned models to meet deadlines. The result is reduced visibility, weaker policy enforcement, and a growing gap between official AI governance and real behaviour.

Q: What do security teams get wrong about AI agent governance?

A: They often treat agents like static applications or ordinary service accounts. In practice, an agent may choose tools, change actions at runtime, and move across APIs inside one session, so the control problem is runtime behaviour, not just initial access approval.

Q: Who should own AI production approval and evidence collection?

A: Accountability should sit with a cross-functional governance group that includes security, legal, privacy, operations, and the business sponsor. If any one team owns the decision alone, the organisation usually ends up with either weak controls or a pilot that never ships.
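To make the "no single owner" rule concrete, here is a minimal Python sketch of an approval record that only passes once every named function has signed off. The `ProductionApproval` class, the function keys, and the use-case name are hypothetical illustrations, not a WitnessAI or NHIMG tool; a real workflow would also record timestamps, evidence links, and any conditions attached to each sign-off.

```python
from dataclasses import dataclass, field

# Hypothetical keys for the five functions named above; adjust to your own org chart.
REQUIRED_APPROVERS = frozenset(
    {"security", "legal", "privacy", "operations", "business_sponsor"}
)


@dataclass
class ProductionApproval:
    use_case: str
    signoffs: dict = field(default_factory=dict)  # function -> approver identity

    def sign(self, function: str, approver: str) -> None:
        # Reject sign-offs from functions outside the agreed governance group.
        if function not in REQUIRED_APPROVERS:
            raise ValueError(f"unknown approving function: {function}")
        self.signoffs[function] = approver

    def missing(self) -> set:
        # Functions that have not yet recorded a sign-off.
        return set(REQUIRED_APPROVERS.difference(self.signoffs))

    def is_approved(self) -> bool:
        # No single team can approve alone: every required function must sign.
        return not self.missing()


approval = ProductionApproval("invoice-triage-agent")
approval.sign("security", "security-lead")
approval.sign("business_sponsor", "finance-director")
print(sorted(approval.missing()))  # ['legal', 'operations', 'privacy']
print(approval.is_approved())      # False
```

The design choice is deliberately strict: a partial set of approvals is reported as a list of gaps rather than as a risk score, which keeps the conversation about missing owners instead of negotiable thresholds.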

Technical breakdown

Why AI pilots stall at the production gate

A pilot can succeed technically and still fail operationally because enterprise approval depends on evidence, ownership, and control mapping. If security, legal, privacy, and business sponsors are only asked to review a near-finished pilot, the review turns into a risk hunt instead of a readiness check. The model may be accurate, but the organisation has not defined what acceptable use, monitoring, escalation, and audit evidence look like in production. In practice, stalled pilots are usually governance failures expressed as delivery delays.

Practical implication: define production criteria before the first pilot starts, or expect repeated sign-off failure later.
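One way to make "production criteria before the pilot starts" checkable is to store the criteria in the pilot charter itself and evaluate model, business-value, and governance milestones separately. The sketch below assumes a hypothetical `PilotCharter` record and hypothetical evidence names; it illustrates how a pilot that meets its accuracy target can still fail the readiness check.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class PilotCharter:
    name: str
    start_date: date
    criteria_agreed_on: Optional[date]  # when production criteria were signed off
    owner: Optional[str]                # accountable production owner
    required_evidence: set = field(default_factory=set)
    collected_evidence: set = field(default_factory=set)
    accuracy_target_met: bool = False
    business_value_shown: bool = False


def readiness(c: PilotCharter) -> dict:
    """Report each milestone separately so a good demo cannot stand in for control maturity."""
    governance_ready = (
        c.owner is not None
        and c.criteria_agreed_on is not None
        and c.criteria_agreed_on <= c.start_date         # criteria existed before the pilot began
        and bool(c.required_evidence)
        and c.required_evidence <= c.collected_evidence  # every required item collected
    )
    return {
        "model": c.accuracy_target_met,
        "business_value": c.business_value_shown,
        "governance": governance_ready,
        "production_ready": c.accuracy_target_met and c.business_value_shown and governance_ready,
        "missing_evidence": sorted(c.required_evidence - c.collected_evidence),
    }


pilot = PilotCharter(
    name="contract-summariser",
    start_date=date(2026, 3, 1),
    criteria_agreed_on=date(2026, 2, 20),
    owner="legal-ops",
    required_evidence={"acceptable_use", "logging_enabled", "escalation_path", "dpia"},
    collected_evidence={"acceptable_use", "logging_enabled"},
    accuracy_target_met=True,
    business_value_shown=True,
)
print(readiness(pilot))
```

Here the model and business-value milestones pass, but the report still returns `production_ready: False` and names the two missing evidence items, which is the kind of answer a sign-off gate should get long before the final review.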

How shadow AI turns pilot friction into unmanaged usage

When sanctioned AI paths move slowly, employees often use consumer tools or unsanctioned models to keep working. That creates Shadow AI, which is undiscovered or unmanaged AI usage outside policy, logging, and approval. The security problem is not only data leakage. It is the loss of identity control over who is using which model, with what inputs, and under which contract or retention terms. Once usage goes outside the approved path, governance teams lose the evidence needed for audit, response, and policy enforcement.

Practical implication: build a sanctioned adoption path with logging and controls before users create their own.
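A sanctioned path only produces usable evidence if every request is tied to an identity, a model, and a policy version. The minimal Python sketch below shows that logging pattern for a hypothetical internal AI gateway; the model names, the policy identifier, and the in-memory `AUDIT_LOG` are assumptions, and a production gateway would forward the request to the model and write to tamper-evident storage.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical allow-list: sanctioned model -> policy version that governs its use.
SANCTIONED_MODELS = {"internal-llm-v2": "acceptable-use-2026.1"}
AUDIT_LOG = []  # stand-in for an append-only audit store


class PolicyViolation(Exception):
    pass


def route_request(user_id: str, model: str, prompt: str) -> dict:
    policy = SANCTIONED_MODELS.get(model)
    decision = "allow" if policy else "deny"
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user_id,
        "model": model,
        "policy": policy,
        "decision": decision,
        # Store a hash, not the raw prompt: proves what was sent without retaining content.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
    }
    AUDIT_LOG.append(record)  # denials are logged too; they are evidence of unmet demand
    if decision == "deny":
        raise PolicyViolation(f"{model} is not a sanctioned model")
    # A real gateway would call the model here; this sketch only returns the record.
    return record


route_request("alice", "internal-llm-v2", "Summarise Q2 churn drivers")
try:
    route_request("bob", "consumer-chatbot", "Draft a reply using this customer list")
except PolicyViolation as err:
    print("blocked:", err)  # blocked: consumer-chatbot is not a sanctioned model
print([(r["user"], r["model"], r["decision"]) for r in AUDIT_LOG])
```

Hashing the prompt is one possible design choice; depending on data-handling rules, an organisation may need to retain the full prompt, a redacted copy, or nothing at all.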

Why agentic AI changes the identity question

An AI agent is not just another application. It is a software entity that can choose actions, tools, and timing at runtime, which means its identity behaviour can change as it works. That matters because traditional IAM assumes access is provisioned for known purposes and then reviewed after the fact. With agents, the decision path can unfold inside a single session, across multiple APIs, and at machine speed. The governance challenge is therefore not only authentication, but runtime authorisation, evidence capture, and blast-radius control.

Practical implication: evaluate agents as dynamic identity actors, not as static service accounts with a chatbot front end.
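To illustrate the difference between approval at provisioning time and authorisation at runtime, the sketch below checks every tool call an agent makes against its approved scopes and a per-session call budget, and records each decision as evidence. `AgentSession`, the scope strings, and the budget are hypothetical; real deployments would more likely enforce this in a policy engine or gateway outside the agent's own process.

```python
from dataclasses import dataclass, field


@dataclass
class AgentSession:
    agent_id: str
    allowed_tools: frozenset      # scopes approved for this agent (hypothetical names)
    max_calls: int = 20           # crude blast-radius limit on allowed calls per session
    trail: list = field(default_factory=list)  # (tool, target, decision) evidence

    def authorise(self, tool: str, target: str) -> bool:
        # Evaluated at the moment of each call, not only when the agent was provisioned.
        allowed_so_far = sum(1 for _, _, d in self.trail if d == "allow")
        if tool not in self.allowed_tools:
            decision = "deny:scope"
        elif allowed_so_far >= self.max_calls:
            decision = "deny:budget"
        else:
            decision = "allow"
        self.trail.append((tool, target, decision))
        return decision == "allow"


session = AgentSession("invoice-agent", frozenset({"crm.read", "erp.read"}), max_calls=2)
session.authorise("crm.read", "account/42")   # allowed
session.authorise("erp.write", "po/7")        # outside approved scope
session.authorise("erp.read", "po/7")         # allowed
session.authorise("crm.read", "account/43")   # budget exhausted
for tool, target, decision in session.trail:
    print(f"{session.agent_id} {tool:10} {target:12} {decision}")
```

The trail matters as much as the decision: it records denied attempts alongside allowed ones, which is what a reviewer needs to reconstruct what the agent tried to do during the session.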

Threat narrative

Attacker objective: Gain useful access to business data and workflows through unmanaged AI usage that bypasses enterprise governance.

Entry: Users adopt unsanctioned AI tools after the approved pilot fails to move quickly enough into production.
Escalation: Those tools process company data without consistent policy, logging, or governance oversight, creating unmanaged access paths.
Impact: The organisation loses visibility into where prompts, outputs, and derived data went, increasing compliance and exposure risk.
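The entry step in this narrative usually leaves traces in egress or proxy logs before it shows up anywhere else. As a minimal sketch, assuming you already have (user, destination domain) pairs and a maintained inventory of AI service domains, unsanctioned usage can be surfaced as shown below. The domain lists are hypothetical placeholders, and domain matching alone will miss AI features embedded in approved SaaS tools or browser extensions.

```python
from collections import Counter

# Hypothetical placeholders: an inventory of AI service domains and the sanctioned subset.
KNOWN_AI_DOMAINS = {
    "llm.internal.example",
    "chat.consumer-ai.example",
    "api.other-llm.example",
}
SANCTIONED_AI_DOMAINS = {"llm.internal.example"}


def find_shadow_ai(proxy_log):
    """Count requests per (user, domain) to AI services outside the sanctioned set."""
    hits = Counter()
    for user, domain in proxy_log:
        if domain in KNOWN_AI_DOMAINS and domain not in SANCTIONED_AI_DOMAINS:
            hits[(user, domain)] += 1
    return hits


sample_log = [
    ("alice", "llm.internal.example"),
    ("bob", "chat.consumer-ai.example"),
    ("bob", "chat.consumer-ai.example"),
    ("carol", "api.other-llm.example"),
    ("carol", "intranet.example"),
]
for (user, domain), count in find_shadow_ai(sample_log).most_common():
    print(f"{user:6} {domain:26} {count}")
```

Consistent with the argument above, the output is as much a demand signal for the sanctioned path as an enforcement list: it shows which teams are routing around the approved route and how often.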

DeepSeek breach: exposed more than 1M log lines and sensitive secret keys.
Schneider Electric credentials breach: exposed credentials gave attackers access to Schneider Electric's Jira, from which 40GB of data was exfiltrated.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

AI pilot failure is usually an operating-model failure disguised as a technology problem. The model often performs as expected, but production approval depends on governance inputs that were never defined early enough. That means security, legal, and business ownership become blockers rather than enablers. Practitioners should treat pilot design and production readiness as the same control surface, not separate stages.

Shadow AI appears when sanctioned adoption paths are slower than user demand. When employees cannot get approved tools fast enough, they route work to consumer services and unmanaged models. That creates an identity problem as much as a data problem because the enterprise loses control over who is using what AI, under which policy, and with which evidence trail. The implication is that governance gaps directly shape user behaviour.

AI agents make runtime identity evidence more important than static access approval. Agents can select tools, initiate actions, and interact with APIs inside a session, which means approval at provisioning time is no longer enough to explain actual behaviour. This is where conventional access review assumptions start to weaken, especially when the organisation cannot observe the full action path. Practitioners need governance models that can follow runtime identity behaviour, not just record entitlement state.
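A simple way to see where static approval stops explaining behaviour is to diff what an agent was approved to do against what runtime logs show it actually did. The sketch below assumes hypothetical action names and assumes observed actions can be attributed to the agent's identity, which is itself the hard part when the full action path is not observable. Unexplained actions can appear, for example, when the agent's underlying credential is broader than its approved purpose.

```python
def compare_approval_to_runtime(approved, observed):
    """Diff approved actions against actions observed in runtime logs for one agent identity."""
    used = set(observed)
    return {
        "explained": sorted(approved & used),      # behaviour the approval covers
        "unexplained": sorted(used - approved),    # behaviour no reviewer signed off on
        "unused_grants": sorted(approved - used),  # candidates for right-sizing
    }


approved = {"crm.read", "ticket.create", "kb.search"}
observed = ["crm.read", "crm.read", "crm.export", "ticket.create", "email.send"]
print(compare_approval_to_runtime(approved, observed))
# {'explained': ['crm.read', 'ticket.create'], 'unexplained': ['crm.export', 'email.send'], 'unused_grants': ['kb.search']}
```

An entitlement review that only reads the approved set would see nothing wrong here; the runtime diff is what exposes the export and email actions.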

Production readiness now has a compliance deadline attached to it. The move from pilot to deployment is no longer just an internal delivery concern, because regulatory expectations around AI governance are becoming more explicit. That pressure raises the cost of vague ownership, late documentation, and weak evidence collection. The practical conclusion is that programmes without early governance are not merely slower; they are increasingly hard to defend.

Runtime visibility is becoming the new trust boundary for enterprise AI. Once AI use spans employees, applications, and agents, the key question is no longer whether the model can work, but whether the organisation can observe and govern what actually happened. We call this the runtime governance gap: the space between technical feasibility and defensible production control, and the point where most pilot programmes lose momentum.

From our research:
Only 44% of developers are reported to follow security best practices for secrets management, according to The State of Secrets in AppSec.
Another finding from the same research shows that the average estimated time to remediate a leaked secret is 27 days, even though 75% of organisations express strong confidence in their secrets management capabilities.
For the broader identity control picture, see the Ultimate Guide to NHIs, and see Lifecycle Processes for Managing NHIs for the operational gap between approval and ongoing governance.

What this signals

Runtime governance gap: AI programmes now need controls that can observe behaviour at the moment it happens, not just during approval. That means AI observability, policy evidence, and audit-ready logging are becoming part of the identity control plane, especially where agents or Shadow AI are involved.

The practical signal for IAM and security teams is that AI adoption will keep fragmenting until sanctioned paths are faster than unsanctioned ones. Organisations that cannot show what was used, by whom, and under what rules will struggle to justify production approval or compliance posture.
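Answering "what was used, by whom, and under what rules" is largely a reporting problem once requests are logged consistently. The sketch below groups hypothetical audit records by model and governing policy; the record fields and values are assumptions, and records with no policy are flagged rather than dropped, because they are exactly the evidence gap reviewers will ask about.

```python
from collections import defaultdict

# Hypothetical audit records; in practice these would come from an AI gateway's log store.
records = [
    {"user": "alice", "model": "internal-llm-v2", "policy": "acceptable-use-2026.1"},
    {"user": "bob", "model": "internal-llm-v2", "policy": "acceptable-use-2026.1"},
    {"user": "alice", "model": "internal-llm-v2", "policy": "acceptable-use-2026.1"},
    {"user": "carol", "model": "vendor-copilot", "policy": None},
]


def usage_evidence(records):
    """Group usage by (model, policy) and list who used each combination."""
    report = defaultdict(lambda: {"users": set(), "requests": 0})
    for r in records:
        key = (r["model"], r["policy"] or "UNGOVERNED")  # surface missing policy explicitly
        report[key]["users"].add(r["user"])
        report[key]["requests"] += 1
    return dict(report)


for (model, policy), row in sorted(usage_evidence(records).items()):
    print(f"{model:16} {policy:22} requests={row['requests']} users={sorted(row['users'])}")
```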

As AI moves from pilot to service, the control model starts to look less like a project checklist and more like a continuous identity programme. The teams that prepare for that shift early will have an easier time aligning with the NIST Cybersecurity Framework 2.0 and AI governance expectations.

For practitioners

Define production criteria before the pilot starts. Document approval thresholds, ownership, monitoring, and audit evidence in the pilot charter so review teams are not inventing controls at the sign-off gate.
Create a sanctioned path for AI adoption. Give employees an approved tool route with logging, policy enforcement, and clear data handling rules so Shadow AI does not become the default workaround.
Separate pilot success from production readiness. Track model accuracy, business value, and governance evidence as different milestones so a good demo does not masquerade as deployable control maturity.
Treat agents as runtime identity actors. Assess which AI systems can choose tools or initiate actions at runtime, then require controls that capture session behaviour rather than only initial entitlement grants.

Key takeaways

Most AI pilot failures are governance failures, not model failures, because production readiness was never designed into the pilot.
Shadow AI emerges when employees move faster than the approval process, which turns governance delay into unmanaged identity risk.
AI agents force enterprises to govern runtime behaviour, not just initial access, so the approval model has to evolve before scale does.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Agentic AI Top 10 addresses the attack and risk surface, while the NIST AI RMF and NIST CSF 2.0 set out the governance and control expectations practitioners need to meet.

Framework	Control / Reference	Relevance
NIST AI RMF		AI pilot governance, measurement, and oversight map directly to the AI RMF lifecycle.
NIST CSF 2.0	GV.RM-01	Risk management governance is central when AI pilots stall at production review.
OWASP Agentic AI Top 10		Agent runtime behaviour and tool use create attack and governance patterns covered by agentic AI guidance.

Treat agents as dynamic identity actors and add runtime controls before allowing broad access.

Key terms

Shadow AI: Shadow AI is AI use that happens outside approved enterprise governance, logging, or policy. It often starts when sanctioned tools are too slow or too restrictive. The security issue is not only unknown software use, but also the loss of identity, data, and audit control over the interaction.
Runtime governance: Runtime governance is the set of controls that observe and influence AI behaviour while the system is actually operating. It matters because approval at design time does not explain what an AI system or agent did during a live session. In practice, it requires logging, policy enforcement, and evidence capture.
AI agent: An AI agent is a software entity that can choose actions, select tools, and decide when to execute them during runtime. That makes it different from a scripted automation or a simple chatbot. For governance teams, the key issue is that the agent's behaviour can change after deployment.
Production readiness: Production readiness is the condition that a system has the evidence, ownership, controls, and operating processes needed for live use. In AI programmes, it goes beyond model quality to include governance, auditability, legal review, and monitoring. Without it, a successful pilot can still fail at release.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or governance in your organisation, it is worth exploring.

This post draws on content published by WitnessAI: why AI pilots fail before production. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-06-21.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org