Yes, if the organisation cannot prove inventory, ownership, and least privilege for the agent environment. Production agents are not just another feature rollout, because each agent can widen the blast radius of existing credential and entitlement mistakes. Mature governance is the gating requirement, not an optional hardening step.
Why This Matters for Security Teams
Production AI agents change the risk model because they do not wait for a human to click “approve” before acting. They can chain tools, call APIs, and pursue goals across systems, so static RBAC alone rarely reflects what they will do at runtime. Current guidance suggests that identity governance has to be mature enough to prove inventory, ownership, and least privilege before agents are allowed meaningful access. The 2026 Infrastructure Identity Survey found that only 44% of organisations have any policies to manage AI agents, even though 92% agree governance is critical to enterprise security.
That gap matters because agent behaviour is autonomous and often opaque. A model may be well-tuned, yet still trigger unintended escalation when it receives broad credentials, persistent secrets, or poorly scoped tool access. The right question is not whether AI is useful, but whether the organisation can constrain a goal-driven workload with the same discipline expected for privileged infrastructure. Frameworks such as the OWASP Top 10 for Agentic Applications 2026 and the NIST AI Risk Management Framework both point toward stronger governance, context-aware controls, and continuous oversight rather than blind trust in model behaviour. In practice, many security teams encounter agent overreach only after the first unintended API call or lateral movement has already occurred, rather than through intentional rollout testing.
How It Works in Practice
Delaying production does not mean blocking experimentation. It means treating agents as privileged workloads that need a dedicated identity, tight scoping, and runtime decisioning. Best practice is evolving toward workload identity, just-in-time (JIT) credential provisioning, and intent-based authorisation, so that an agent receives only the access needed for a specific task, for a short time, and with an auditable purpose. That aligns with the controls discussed in NHIMG’s Ultimate Guide to NHIs and with the operational patterns described by the CSA MAESTRO agentic AI threat modeling framework.
A workable production pattern usually includes:
- Unique workload identity for each agent, often backed by short-lived cryptographic tokens rather than shared service accounts.
- Policy-as-code checks at request time, so access is granted based on task context, target system, and risk level, not just broad role membership.
- Ephemeral secrets with strict TTLs, automatic revocation, and no long-lived static credentials in code or config.
- Separated permissions for read, write, and actuation, especially when agents can trigger infrastructure or security changes.
- Continuous logging that ties every action back to the agent identity, the tool invoked, and the policy decision that allowed it.
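The pattern above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `POLICY` table, the `TTL_SECONDS` values, the agent and target names, and the `authorize` function are all hypothetical, and a real deployment would likely delegate these decisions to a dedicated policy engine and secrets manager rather than in-process dictionaries.

```python
from __future__ import annotations

import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    agent_id: str  # unique workload identity, one per agent
    action: str    # "read", "write", or "actuate"
    target: str    # system the agent wants to touch
    task: str      # declared purpose, recorded for audit


@dataclass(frozen=True)
class Credential:
    token: str
    agent_id: str
    action: str
    target: str
    expires_at: float

    def is_valid(self, now: float) -> bool:
        return now < self.expires_at


# Hypothetical policy: (agent, target) -> permitted action classes.
# Read, write, and actuation are granted separately.
POLICY = {
    ("agent-triage-01", "ticketing"): {"read", "write"},
    ("agent-triage-01", "prod-cluster"): {"read"},
}

# Hypothetical TTLs in seconds: riskier actions get shorter-lived credentials.
TTL_SECONDS = {"read": 900, "write": 300, "actuate": 60}

AUDIT_LOG: list[dict] = []


def authorize(request: AccessRequest, now: float | None = None) -> Credential | None:
    """Evaluate policy at request time and issue a short-lived credential."""
    now = time.time() if now is None else now
    allowed_actions = POLICY.get((request.agent_id, request.target), set())
    allowed = request.action in allowed_actions

    # Every decision is logged, whether it allows or denies.
    AUDIT_LOG.append({
        "timestamp": now,
        "agent_id": request.agent_id,
        "action": request.action,
        "target": request.target,
        "task": request.task,
        "decision": "allow" if allowed else "deny",
    })

    if not allowed:
        return None

    return Credential(
        token=secrets.token_urlsafe(32),
        agent_id=request.agent_id,
        action=request.action,
        target=request.target,
        expires_at=now + TTL_SECONDS.get(request.action, 60),
    )
```

In this sketch, a read request from `agent-triage-01` against `prod-cluster` yields a credential that expires after 15 minutes, while an actuation request against the same target is denied and still leaves an audit record. Default-deny for unknown agent and target pairs is a deliberate choice: an agent with no explicit policy entry receives nothing.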
NHIMG research shows why this matters: 97% of NHIs carry excessive privileges, and only 5.7% of organisations have full visibility into their service accounts. That is a dangerous starting point for agentic AI, where the workload can adapt faster than a manual approval process can keep up. Pairing this with external guidance from the NIST AI Risk Management Framework helps organisations formalise governance, measurement, and monitoring before production expansion. These controls tend to break down when agents are given shared credentials in CI/CD-heavy environments because the identity no longer maps cleanly to a single actor, task, or blast radius.
Common Variations and Edge Cases
Tighter agent governance often increases delivery overhead, requiring organisations to balance automation speed against operational control. That tradeoff is real, especially where teams want agents to assist developers, respond to incidents, or orchestrate infrastructure at scale. There is no universal standard for this yet, but current guidance suggests that higher-trust use cases should wait until the organisation can prove least privilege, rotation, and revocation discipline for non-human identities. The OWASP NHI Top 10 is useful here because agent risk often emerges from over-permissioned identities, not just model mistakes.
Some edge cases justify earlier production use, such as read-only agents, narrow retrieval assistants, or tightly sandboxed copilots that cannot execute changes. Even then, a cautious rollout should limit secrets exposure, isolate tool access, and require explicit ownership for each agent. For environments under strong operational pressure, a phased approach works better than a blanket launch decision: start with non-destructive tasks, move to JIT access for bounded actions, and only then expand to autonomous execution. For lifecycle and audit expectations, the Ultimate Guide to NHIs — Lifecycle Processes for Managing NHIs and the Ultimate Guide to NHIs — Regulatory and Audit Perspectives help teams translate governance into reviewable operations.
Where organisations skip that sequencing, agent deployments usually fail at the identity layer first, not the model layer, because the blast radius is defined by credentials and entitlements rather than by prompt quality alone.
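The phased sequencing described above can be expressed as a simple gate. The following is a rough sketch only: the `Phase` names, the action classes in `REQUIRED_PHASE`, and the `is_permitted` helper are hypothetical labels chosen for illustration, not terms defined by any of the frameworks cited here.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Hypothetical rollout phases, ordered from least to most autonomy."""
    READ_ONLY = 1    # non-destructive tasks only
    JIT_BOUNDED = 2  # bounded actions with just-in-time access
    AUTONOMOUS = 3   # autonomous execution


# Hypothetical mapping from action class to the earliest phase that permits it.
REQUIRED_PHASE = {
    "non_destructive": Phase.READ_ONLY,
    "bounded_action": Phase.JIT_BOUNDED,
    "autonomous_execution": Phase.AUTONOMOUS,
}


def is_permitted(action_class: str, agent_phase: Phase, has_owner: bool) -> bool:
    """Allow an action only if the agent has an owner and has reached the phase."""
    if not has_owner:
        # Explicit ownership is a precondition at every phase.
        return False
    required = REQUIRED_PHASE.get(action_class)
    if required is None:
        # Unknown action classes are denied by default.
        return False
    return agent_phase >= required
```

An agent at `Phase.READ_ONLY` can perform `non_destructive` work but is refused a `bounded_action` until it is promoted, and an agent without a named owner is refused regardless of phase.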
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | NHI-03 | Agentic risk rises when credentials are over-scoped and persistent. |
| CSA MAESTRO | | MAESTRO centers threat modeling for agent identity and tool use. |
| NIST AI RMF | | AI RMF supports governance, measurement, and monitoring for autonomous agents. |
Assign owners, define risk tolerances, and continuously monitor agent behaviour before production rollout.