TL;DR: AI pilots often look compelling in demos but collapse on economics and governance, with one example showing $750 in value against a $500,000 build cost and production ROI only appearing at scale, according to Strata Identity. The real blocker is identity and security, because over-permissioned agents, shared credentials, and weak auditability keep pilots from becoming governable systems.
At a glance
What this is: An analysis of why AI pilots stall before production, with identity and security controls emerging as the main gating factors.
Why it matters: IAM, NHI, and autonomous-system programmes all need production-grade identity controls before AI agents can be trusted at scale.
By the numbers:
- A pilot that resolves 10 support tickets delivers roughly $750 in value against a $500,000 build cost.
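To make the scale argument concrete, the figures above can be turned into a rough break-even estimate. This is a minimal sketch that assumes value scales linearly with ticket volume and ignores running costs; neither assumption comes from the source, and the function names are illustrative.

```python
import math

BUILD_COST_USD = 500_000   # build cost quoted in the source
PILOT_TICKETS = 10         # tickets resolved by the pilot
PILOT_VALUE_USD = 750      # value delivered by the pilot

# Assumption: each ticket is worth the same amount at any volume.
value_per_ticket = PILOT_VALUE_USD / PILOT_TICKETS


def break_even_tickets(build_cost: float, per_ticket: float) -> int:
    """Smallest ticket count whose cumulative value covers the build cost."""
    return math.ceil(build_cost / per_ticket)


def roi(tickets: int, build_cost: float, per_ticket: float) -> float:
    """Return on investment as a fraction: (value - cost) / cost."""
    return (tickets * per_ticket - build_cost) / build_cost


print(value_per_ticket)                                       # value per ticket in USD
print(break_even_tickets(BUILD_COST_USD, value_per_ticket))   # tickets needed to break even
print(roi(PILOT_TICKETS, BUILD_COST_USD, value_per_ticket))   # pilot ROI (negative)
```

Under these assumptions the pilot recovers a fraction of a percent of its cost, and the build only pays back after several thousand resolved tickets.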
Context
AI pilots fail when identity and security controls are treated as afterthoughts. The core problem is not whether a demo works, but whether the system can prove who acted, what it touched, and why that action was authorised once it reaches production scale.
For IAM teams, this is an NHI governance problem first and an AI problem second. Shared credentials, broad delegation, and weak auditability turn otherwise workable pilots into systems that security and compliance teams cannot sign off for production.
Key questions
Q: How should security teams move AI pilots into production without increasing identity risk?
A: They should require scoped delegation, replayable audit evidence, and production-like sandbox testing before any business-critical rollout. The goal is to prove that the agent can act with least privilege, that every action is attributable, and that compliance teams can reconstruct the transaction chain without guesswork.
Q: Why do AI pilots create so many identity and access control problems?
A: Pilots often rely on shared credentials, broad access, and incomplete logging so the demo succeeds quickly. Those shortcuts are manageable in a sandbox, but they become blockers in production because security teams cannot prove who acted, what was accessed, or whether the access stayed within policy.
Q: How do you know if AI agent access controls are actually working?
A: Look for evidence that privileges shrink at each delegation step, tokens are bound to the requester, and every transaction can be replayed end to end. If you still need to explain actions with informal notes or post hoc reconstruction, the control model is not yet production-ready.
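One way to test the "privileges shrink at each delegation step" criterion is to check a recorded delegation chain hop by hop. The sketch below is illustrative only: the `Hop` structure and scope names are hypothetical, and a real check would read scopes from the issued tokens rather than from an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    actor: str              # who holds the token at this step
    scopes: frozenset[str]  # scopes carried by that token


def scopes_shrink(chain: list[Hop], strict: bool = True) -> bool:
    """True if every hop carries fewer scopes than its parent.

    With strict=False, a hop may keep the same scopes but never gain any.
    """
    for parent, child in zip(chain, chain[1:]):
        if strict:
            narrower = child.scopes < parent.scopes   # proper subset
        else:
            narrower = child.scopes <= parent.scopes  # subset or equal
        if not narrower:
            return False
    return True


chain = [
    Hop("human", frozenset({"tickets:read", "tickets:write", "crm:read"})),
    Hop("agent", frozenset({"tickets:read", "tickets:write"})),
    Hop("sub-agent", frozenset({"tickets:read"})),
]
print(scopes_shrink(chain))
```

A chain that fails this check has widened privilege somewhere between the human and the final sub-agent, which is the condition the answer above warns about.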
Q: What should organisations verify before approving AI agents for regulated workloads?
A: They should verify that the system produces defensible audit trails, enforces least privilege under load, and survives sandbox validation at the same scale as the target workflow. Regulated environments need evidence, not intent, because approval depends on repeatable control behaviour.
Technical breakdown
Token exchange and scoped delegation for AI agents
The article describes delegation patterns built around RFC 8693 token exchange, where a downstream token is narrower than the upstream credential that created it. That matters because AI pilots often start with broad access so the demo works, then inherit excessive privilege as they scale. DPoP binding adds a second control by tying a token to the requester, making reuse harder even if the token is stolen. The technical point is not just access control. It is identity continuity across chained actions, where each step must carry a provable, reduced trust boundary.
Practical implication: replace shared credentials with scoped delegation paths that shrink privilege at every hop.
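As an illustration of what a narrowed exchange might look like, the sketch below builds the form-encoded body of an RFC 8693 token exchange request. The parameter names and URNs are those defined by RFC 8693; the token value, audience URL, scope names, and helper functions are placeholders invented for this example. The client-side check is only a guard, since the authorisation server is what actually enforces the narrower scope.

```python
from urllib.parse import parse_qs, urlencode

# Identifiers defined by RFC 8693.
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def narrowed_scopes(parent: set[str], needed: set[str]) -> set[str]:
    """Scopes for the downstream token; refuses anything the parent lacks."""
    extra = needed - parent
    if extra:
        raise ValueError(f"cannot delegate scopes the parent lacks: {sorted(extra)}")
    return needed


def token_exchange_body(subject_token: str, audience: str, scopes: set[str]) -> str:
    """Form-encoded body for an RFC 8693 token exchange request."""
    return urlencode({
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": subject_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "requested_token_type": ACCESS_TOKEN_TYPE,
        "audience": audience,
        "scope": " ".join(sorted(scopes)),
    })


# Hypothetical upstream credential with three scopes; the task needs one.
parent = {"tickets:read", "tickets:write", "crm:read"}
body = token_exchange_body(
    subject_token="upstream-token-placeholder",
    audience="https://tickets.example.com",
    scopes=narrowed_scopes(parent, {"tickets:read"}),
)
print(parse_qs(body)["scope"])
```

DPoP binding is not shown here because it requires signing a proof with an asymmetric key, which goes beyond a standard-library sketch.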
Audit trails, replayability, and cryptographic proof
Production approval depends on more than logs. A usable audit trail must show who initiated the action, which agent or sub-agent executed it, what resource was accessed, why the action was allowed, and when it occurred, ideally with replay capability. That is the difference between operational visibility and evidentiary control. In AI environments, ordinary logs often fail because they do not preserve the full delegation chain or the policy decision that justified the action. Compliance teams need artefacts they can verify, not narratives they must reconstruct after the fact.
Practical implication: instrument transactions so each AI action can be replayed and attributed end to end.
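A simple way to picture the difference between a log and verifiable evidence is a hash-chained record, where each entry commits to the one before it. The sketch below is an assumption about how such a trail could be structured, not the mechanism described in the source; the field names mirror the who, which agent, what, why, and when listed above, and a production system would typically add signatures and durable storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record body."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_record(log: list[dict], *, initiator: str, actor: str,
                  resource: str, decision: str, timestamp: str) -> dict:
    """Append a record whose hash covers its fields and the previous hash."""
    body = {
        "initiator": initiator,   # who started the transaction
        "actor": actor,           # agent or sub-agent that executed it
        "resource": resource,     # what was accessed
        "decision": decision,     # policy decision that allowed it
        "timestamp": timestamp,   # when it occurred
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    record = dict(body, hash=_digest(body))
    log.append(record)
    return record


def verify(log: list[dict]) -> bool:
    """Replay the chain and confirm no record was altered or reordered."""
    prev = GENESIS
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        if body["prev_hash"] != prev or _digest(body) != record["hash"]:
            return False
        prev = record["hash"]
    return True


log: list[dict] = []
append_record(log, initiator="alice", actor="support-agent",
              resource="ticket/123", decision="policy:tickets-read",
              timestamp="2025-10-03T10:00:00Z")
append_record(log, initiator="alice", actor="summary-sub-agent",
              resource="ticket/123/comments", decision="policy:tickets-read",
              timestamp="2025-10-03T10:00:02Z")
print(verify(log))
```

Editing any earlier record changes its hash and breaks every later link, so an auditor can detect tampering by replaying the chain rather than trusting a narrative.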
Sandbox validation before production exposure
A sandbox is not a toy environment. It is where identity boundaries, permission structures, and observability pipelines are stress-tested at the scale the pilot will eventually need. The article argues that many teams validate functionality at 10 agents, then discover control failures at 1,000. That gap is a governance issue, because controls that work in a constrained pilot may collapse under real concurrency, broader scopes, and multi-step workflows. Production readiness therefore depends on validating the control model, not just the model behaviour.
Practical implication: validate identity controls at production-like scale before any real business process is exposed.
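The gap between 10 and 1,000 agents can be illustrated with a toy harness that runs the same scope check at both sizes. Everything here is contrived for illustration: the issuer is a stand-in with a deliberately planted flaw (it falls back to a broad shared credential once a hypothetical pool of scoped tokens is exhausted), and a real sandbox test would call the actual token issuer and policy engine.

```python
from concurrent.futures import ThreadPoolExecutor

POLICY_CEILING = frozenset({"tickets:read", "tickets:comment"})
SCOPED_TOKEN_POOL = 100  # hypothetical limit on individually scoped tokens


def issue_scopes(agent_id: int) -> frozenset[str]:
    """Stand-in issuer with a planted flaw that only appears beyond the pool."""
    if agent_id < SCOPED_TOKEN_POOL:
        return frozenset({"tickets:read"})
    # Flaw: fall back to a broad shared credential.
    return frozenset({"tickets:read", "tickets:write", "crm:admin"})


def within_policy(agent_id: int) -> bool:
    """True if the agent's issued scopes stay inside the policy ceiling."""
    return issue_scopes(agent_id) <= POLICY_CEILING


def violations_at_scale(agent_count: int, workers: int = 32) -> int:
    """Check every agent concurrently and count policy violations."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(within_policy, range(agent_count)))
    return sum(1 for ok in results if not ok)


print(violations_at_scale(10))     # pilot-sized run
print(violations_at_scale(1_000))  # production-like run
```

The pilot-sized run reports no violations while the larger run does, which is the pattern the section describes: the control model, not the model behaviour, is what fails at scale.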
Breaches seen in the wild
- Cisco DevHub NHI breach: IntelBroker exploited exposed Cisco credentials, API tokens and keys in DevHub.
- DeepSeek breach: The incident exposed 1M+ log lines and sensitive secret keys.
Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.
NHI Mgmt Group analysis
Production readiness for AI agents is an identity problem disguised as a model problem. The article shows that cost and capability only matter once the control plane can prove who is acting, under what scope, and with what audit evidence. That is why security teams reject many pilots even when the demo looks successful. Practitioners should treat identity orchestration as the production gate, not a finishing layer.
Scoped delegation is the difference between an AI pilot and an auditable system. Broad credentials let pilots survive, but they also create the over-permissioned state that breaks compliance, incident response, and accountability. Token exchange and proof-of-possession reduce the blast radius of each delegated action and make privilege boundaries explicit. Teams that cannot express scope precisely do not have a production control model yet.
Auditability is no longer a reporting feature; it is an authorisation requirement. The article’s emphasis on replayable transactions reflects a broader shift in enterprise AI governance. If a system cannot reconstruct the delegation chain from human to agent to sub-agent, it cannot satisfy operational assurance or regulatory scrutiny. Practitioners need evidentiary identity controls, not just monitoring output.
Identity blast radius explains why pilots stall. The problem is not simply that agents have access. It is that the access pattern expands from a controlled demo footprint into a production-scale blast radius before the programme has proven containment. The implication is that identity design must be evaluated by failure domain, not by whether the pilot functions in isolation.
Access review processes were designed for stable privilege states, and that assumption strains under agentic workflows. Even when AI is not fully autonomous, runtime delegation can change quickly enough that periodic review no longer captures meaningful exposure. The implication is that identity governance has to account for short-lived, chained, and machine-paced privilege changes rather than only static entitlements.
From our research:
- 68% of organisations do not know how to fully address NHI risks, according to Ultimate Guide to NHIs.
- Only 5.7% of organisations have full visibility into their service accounts, which explains why identity sprawl becomes a production blocker before it becomes a governance talking point.
- For the broader lifecycle view, the Ultimate Guide to NHIs shows how visibility, rotation, and offboarding all connect to production readiness.
What this signals
Identity blast radius: The production question is no longer whether an agent can perform a task, but how far its delegated authority can spread before governance catches up. When the identity model is weak, every successful pilot increases the size of the eventual containment problem, especially in workflows that chain actions across systems.
With 91.6% of secrets still valid five days after notification, according to our Ultimate Guide to NHIs, production AI governance cannot depend on slow remediation cycles. The practical shift is toward evidence-driven controls that can be enforced at runtime, not reviewed after exposure.
Teams that are building AI operating models should align identity orchestration with NIST AI Risk Management Framework governance expectations and the agentic control patterns in OWASP Agentic AI Top 10. The signal is clear: production approval now depends on proving constraint, attribution, and replayability together.
For practitioners
- Inventory actual agent permissions before scale-up: Map every credential, token exchange path, and delegated entitlement used by the pilot, then compare it to the minimum scope required for production. Focus on hidden over-scoping that appears only when the system is asked to do real work.
- Replace shared credentials with bounded delegation: Use scoped token exchange and proof-of-possession patterns so each agent receives only the access needed for its current task. Eliminate reusable credentials that make attribution and containment impossible once the pilot expands.
- Require replayable audit evidence before production sign-off: Do not accept generic logs as proof of control. Insist on end-to-end transaction capture that shows the initiating identity, the delegated actor, the policy decision, and the affected resource in a form auditors can verify.
- Run sandbox tests at production-like scale: Validate identity controls, observability, and policy enforcement with realistic concurrency and workflow depth before exposing business processes. A pilot that works for 10 agents is not validated for 1,000 agents.
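The inventory step in the first item above can be sketched as a set difference between what each agent has been granted and what production actually requires. The agent names and scopes below are hypothetical, and in practice the `granted` map would be built from the identity provider or secrets inventory rather than written by hand.

```python
def over_scoped(granted: dict[str, set[str]],
                required: dict[str, set[str]]) -> dict[str, set[str]]:
    """Per agent, the scopes granted beyond the minimum required.

    Agents with no entry in `required` are treated as needing nothing,
    so every scope they hold is reported.
    """
    findings: dict[str, set[str]] = {}
    for agent, scopes in granted.items():
        extra = scopes - required.get(agent, set())
        if extra:
            findings[agent] = extra
    return findings


granted = {
    "support-agent": {"tickets:read", "tickets:write", "crm:admin"},
    "summary-agent": {"tickets:read"},
    "legacy-bot": {"billing:read"},
}
required = {
    "support-agent": {"tickets:read", "tickets:write"},
    "summary-agent": {"tickets:read"},
}
print(over_scoped(granted, required))
```

Treating unlisted agents as requiring nothing is a deliberate choice in this sketch: it surfaces forgotten identities instead of silently passing them.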
Key takeaways
- AI pilots fail to reach production when identity, delegation, and auditability are not designed as core controls from the outset.
- The scale problem is measurable: a 10-ticket pilot returns roughly $750 against a $500,000 build cost, and the economics only turn positive at production scale.
- Teams that cannot prove least privilege and transaction replay should expect security and compliance to block release into real workloads.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 sets the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | | Agent delegation, scope drift, and tool misuse are central to the article's production-risk framing. |
| OWASP Non-Human Identity Top 10 | NHI5:2025 (Overprivileged NHI) | The article focuses on over-permissioned agents and weak credential governance. |
| NIST CSF 2.0 | PR.AA-01 | The post stresses identity proofing, attribution, and auditability as production gates. |
Map agent identity controls to access governance and require evidence before production approval.
Key terms
- Identity Orchestration: Identity orchestration is the control layer that coordinates credentials, delegation, policy checks, and audit evidence across systems. In AI environments, it determines whether an agent can act with bounded authority and whether each action can be attributed, replayed, and reviewed.
- Scoped Delegation: Scoped delegation is the practice of passing only the minimum authority needed for a specific task. For AI agents, that scope must shrink at each handoff and remain auditable, or the delegation chain becomes a hidden privilege expansion path.
- Replayable Audit Trail: A replayable audit trail is an evidentiary record that lets a team reconstruct an action end to end, not just see that something happened. It preserves the actor chain, policy decision, resource touched, and execution sequence in a form useful for compliance and incident review.
Deepen your knowledge
AI pilot identity controls and production-grade delegation are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are trying to move an agentic pilot out of the demo stage, it is worth exploring.
This post draws on content published by Strata Identity: The Most Expensive Mistake in Enterprise AI. Read the original.
Published by the NHIMG editorial team on 2025-10-03.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org