Agentic AI foundations are still fragile without runtime access control

By NHI Mgmt Group Editorial Team | Published 2026-05-01 | Domain: Agentic AI & NHIs | Source: Aembit

TL;DR: Agent building remains brittle because today’s LLM workflows still require heavy human oversight, produce inconsistent outputs, and expose real systems and APIs while operating with weak error handling and unclear access boundaries, according to Aembit. That makes agentic AI an IAM problem now, not a future one.

At a glance

What this is: An independent analysis of why simple agent workflows remain unreliable, and why controlling what AI agents can reach, and under what conditions, is now an identity governance problem.

Why it matters: Practitioners need to govern AI agents as access-bearing identities before low-risk experimentation turns into uncontrolled access across real systems, APIs, and data.

👉 Read Aembit's analysis of how simple agents expose IAM and governance gaps

Context

Agentic AI is already moving from demo territory into real workflows, but the governance model is still immature. The core problem is not whether an agent can produce an answer, but whether it can act safely when it touches production systems, tools, and data as an identity-bearing runtime actor.

The Aembit article shows the gap clearly: the painful part is not model quality alone but the combination of probabilistic output, multi-step execution, and access to real resources. That is why workload identity, access scope, and runtime approval boundaries now sit inside the AI operating model, not around it.

For teams already thinking in NHI terms, the right question is which agent behaviours should be treated like a short-lived, task-scoped service account, and which should never be allowed to self-direct access at all. Practitioners need to make that shift before agent sprawl becomes a control problem.

Key questions

Q: How should security teams govern AI agents that can call tools and access real systems?

A: Treat each agent as a workload identity with explicit scope, short-lived access, and tightly controlled tool permissions. Governance should define what the agent can reach, when it can act, and what evidence is required before it touches production systems. If the access model is broad or shared, the agent becomes a standing privilege problem, not an innovation feature.
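To make this answer concrete, here is a minimal Python sketch of a deny-by-default check that an agent gateway might run before each tool call. The AgentIdentity and ToolRequest records, the authorize() function and the approval_ref field are hypothetical names chosen for illustration; in practice this logic would live in a policy engine or workload IAM layer rather than in the agent itself.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentIdentity:
    """Hypothetical policy record for one agent's workload identity."""
    agent_id: str
    allowed_tools: frozenset[str]         # explicit tool scope
    allowed_environments: frozenset[str]  # where the agent may act at all
    production_requires_approval: bool = True


@dataclass(frozen=True)
class ToolRequest:
    """A single tool call the agent wants to make."""
    agent_id: str
    tool: str
    environment: str
    approval_ref: Optional[str] = None    # evidence of human approval, if any


def authorize(identity: AgentIdentity, req: ToolRequest) -> tuple[bool, str]:
    """Deny by default: allow only in-scope tools, in permitted environments,
    and require approval evidence before anything touches production."""
    if req.agent_id != identity.agent_id:
        return False, "identity mismatch"
    if req.tool not in identity.allowed_tools:
        return False, f"tool '{req.tool}' outside scope"
    if req.environment not in identity.allowed_environments:
        return False, f"environment '{req.environment}' not permitted"
    if (req.environment == "production"
            and identity.production_requires_approval
            and not req.approval_ref):
        return False, "production access requires approval evidence"
    return True, "allowed"


agent = AgentIdentity("report-agent",
                      frozenset({"search", "read_ticket"}),
                      frozenset({"staging", "production"}))
print(authorize(agent, ToolRequest("report-agent", "delete_ticket", "staging")))
# (False, "tool 'delete_ticket' outside scope")
```

Keeping the check outside the agent's own code means a model that decides to call an unlisted tool is refused by the identity layer, whatever the prompt says.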

Q: Why do AI agents create new IAM risks even when the model output looks acceptable?

A: Because acceptable-looking output does not mean safe execution. An agent can still invoke tools, reach APIs, or modify data while the underlying reasoning is probabilistic and error-prone. The risk is not just incorrect text, but uncontrolled runtime action tied to a real identity with real permissions.

Q: What breaks when agents are allowed to keep retrying until they succeed?

A: Unlimited retry loops turn small errors into repeated access attempts, repeated tool calls, and repeated exposure to the same failure state. That creates noisy behaviour, harder incident review, and a broader blast radius when the agent keeps acting after it should have stopped. Retry policy is therefore part of access governance, not only engineering hygiene.
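A retry policy can be expressed as an explicit budget. The sketch below is one illustrative way to do it, assuming the agent's tool call is wrapped in a plain Python callable; run_with_retry_budget, its thresholds and the RetryOutcome record are hypothetical and not part of any agent framework's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RetryOutcome:
    succeeded: bool
    attempts: int
    escalated: bool                    # True means: stop and hand back to a human
    errors: list[str] = field(default_factory=list)


def run_with_retry_budget(action: Callable[[], object],
                          max_attempts: int = 3,
                          max_same_error: int = 2) -> RetryOutcome:
    """Run `action` until it succeeds, the attempt budget is spent, or the
    same error repeats `max_same_error` times. The last two cases escalate
    instead of letting the loop continue by default."""
    errors: list[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            action()
            return RetryOutcome(True, attempt, False, errors)
        except Exception as exc:       # record every failure for incident review
            errors.append(repr(exc))
            if errors.count(errors[-1]) >= max_same_error:
                # The agent is looping on the same failure state: stop now.
                return RetryOutcome(False, attempt, True, errors)
    return RetryOutcome(False, max_attempts, True, errors)
```

Counting identical errors is a crude but cheap signal that the agent is stuck in one failure state; the escalated flag marks the point where a real system would hand control back to a human.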

Q: How do teams know whether an agent is safe enough for production use?

A: Look for evidence that the agent can be contained by identity scope, observable tool use, and clear stop conditions. If the team cannot prove what the agent accessed, why it accessed it, and when it should have stopped, the system is not yet ready for production handling of sensitive workflows.

Technical breakdown

Why agent workflows fail when output is probabilistic

The article’s central technical point is that an agent is not a deterministic workflow engine. Each step, from reasoning to tool selection to output formatting, can drift, and the error rate compounds across multi-step tasks. That is why a task that looks 80% complete can still collapse in the final 20%, especially when the model is asked to revise, critique, and execute in sequence. In identity terms, the system is not just producing text. It is making runtime decisions that affect whether tools are called, data is retrieved, and actions are repeated or suppressed. That makes reliability a governance issue, not just a model-quality issue.

Practical implication: treat multi-step agent workflows as probabilistic access flows and require explicit failure boundaries before they can reach production systems.
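The compounding effect is easy to quantify under a simplifying model. If each step i succeeds with probability p_i and failures are independent, the whole run succeeds with probability equal to the product of all the p_i. The sketch below computes that product; the independence assumption and the 95% per-step figure are illustrative, not taken from the Aembit article.

```python
from __future__ import annotations

import math


def end_to_end_success(step_success: list[float]) -> float:
    """Probability that every step of a run succeeds, assuming steps fail
    independently (a simplification; real agent errors are often correlated)."""
    for p in step_success:
        if not 0.0 <= p <= 1.0:
            raise ValueError("step probabilities must be in [0, 1]")
    return float(math.prod(step_success))


# Illustrative numbers: ten steps that each succeed 95% of the time.
print(round(end_to_end_success([0.95] * 10), 3))  # 0.599
```

Even at 95% per step, a ten-step run completes cleanly only about 60% of the time under this model, which is why failure boundaries belong in the design rather than being left to the model.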

Agent framework orchestration does not equal safe autonomy

CrewAI, LangGraph, AutoGen, and similar frameworks can coordinate tasks, but orchestration is not the same as trustworthy control. The article shows how sequential task design, prompt bloat, and weak tool-use signalling can confuse the model and the framework at the same time. That creates a brittle control layer: the platform may treat a response as an action call while the LLM has actually emitted blended content that mixes reasoning and execution. In governance terms, the wrapper does not create authority. It only moves the execution problem into a new layer where access, timing, and tool invocation still need to be constrained.

Practical implication: validate the agent’s tool-use contract, execution boundaries, and observable state transitions before allowing it to touch anything privileged.
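One way to validate a tool-use contract is to accept only a strictly structured action and reject everything else. The sketch below assumes a hypothetical contract in which the model must return a bare JSON object with exactly the keys tool and arguments; real frameworks define their own formats (many LLM APIs offer native function calling), so treat the shape and the names ToolCall, ContractViolation and parse_tool_call as illustrations.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    tool: str
    arguments: dict


class ContractViolation(ValueError):
    """The model output does not match the tool-use contract."""


def parse_tool_call(raw: str, known_tools: set[str]) -> ToolCall:
    """Accept only a bare JSON object {"tool": ..., "arguments": {...}}.
    Anything else, including prose mixed with an action, is rejected."""
    try:
        payload = json.loads(raw.strip())
    except json.JSONDecodeError as exc:
        # Blended output such as 'Thought: ... {"tool": ...}' fails here.
        raise ContractViolation(f"not a single JSON object: {exc.msg}") from exc
    if not isinstance(payload, dict) or set(payload) != {"tool", "arguments"}:
        raise ContractViolation("expected exactly the keys 'tool' and 'arguments'")
    tool, args = payload["tool"], payload["arguments"]
    if not isinstance(tool, str) or tool not in known_tools:
        raise ContractViolation(f"unknown tool: {tool!r}")
    if not isinstance(args, dict):
        raise ContractViolation("'arguments' must be a JSON object")
    return ToolCall(tool, args)
```

Rejecting blended output outright, rather than trying to salvage an action from surrounding prose, keeps the boundary between reasoning and execution observable.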

MCP and tool plumbing still depend on workload identity

The article mentions MCP as part of the agent plumbing, but the more important issue is that every tool call still depends on an underlying identity with permissions. Whether a search tool, API, or local runtime is used, the agent needs credentials, scope, and runtime access rules. That means the security question is not only what the model says, but what the connected identity can actually do at the moment of execution. When access is broad, persistent, or poorly segmented, the agent becomes a new consumer of standing privilege. In NHI terms, this is a workload identity governance problem disguised as an AI product problem.

Practical implication: bind every agent to a narrowly scoped workload identity and review the downstream permissions of each connected tool.
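Reviewing the downstream permissions of each connected tool amounts to a set difference between what each tool's credential holds and what the task needs. The sketch below assumes a hypothetical inventory in which both sides are expressed as permission strings; gathering that inventory from real IAM systems is the hard part and is not shown.

```python
from __future__ import annotations


def excess_permissions(tool_grants: dict[str, set[str]],
                       task_needs: dict[str, set[str]]) -> dict[str, set[str]]:
    """Per connected tool, return the downstream permissions its credential
    holds beyond what this agent's task needs. A tool the task never uses is
    reported with its full grant."""
    report: dict[str, set[str]] = {}
    for tool, granted in tool_grants.items():
        extra = granted - task_needs.get(tool, set())
        if extra:
            report[tool] = extra
    return report


# Hypothetical inventory for one agent.
grants = {
    "crm_api": {"contacts:read", "contacts:write", "contacts:delete"},
    "search": {"web:query"},
    "billing_api": {"invoices:read"},
}
needs = {"crm_api": {"contacts:read"}, "search": {"web:query"}}
for tool, extra in excess_permissions(grants, needs).items():
    print(tool, sorted(extra))
# crm_api ['contacts:delete', 'contacts:write']
# billing_api ['invoices:read']
```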

Moltbook AI agent keys breach: the Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach: attackers used stolen AWS access keys to hijack Anthropic models on Amazon Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches affecting Non-Human Identities, including AI agents.

NHI Mgmt Group analysis

Agentic AI introduces an access-governance problem before it becomes a model-risk problem. The article shows that even simple agents can touch real systems, real APIs, and real data while operating with inconsistent output quality. That means the first control failure is not hallucination but uncontrolled runtime access. For practitioners, the implication is that agent governance must start with identity scope, not with prompt quality.

Standing access is the wrong mental model for agents that decide and act in-session. Access review processes were designed for identities whose privileges persist long enough to be observed and certified. That assumption becomes fragile when an agent can request tools, run tasks, and terminate activity inside a single operational burst. The practitioner conclusion is to rethink how access is represented, not just how it is approved.
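A task-scoped grant that expires on its own is one way to represent agent access without standing privilege. The sketch below is a minimal in-memory model using hypothetical names (TaskGrant, issue_grant, MAX_TTL_SECONDS) and an illustrative TTL ceiling; a production system would obtain such credentials from a token service rather than mint them locally.

```python
from __future__ import annotations

import secrets
import time
from dataclasses import dataclass, field
from typing import Optional

MAX_TTL_SECONDS = 900  # illustrative ceiling, not a recommendation from the source


@dataclass(frozen=True)
class TaskGrant:
    """Hypothetical short-lived grant bound to one agent run."""
    agent_id: str
    task_id: str
    scopes: frozenset[str]
    expires_at: float  # epoch seconds
    # Opaque handle the agent would present to tools (illustrative only).
    token: str = field(default_factory=lambda: secrets.token_urlsafe(16))

    def permits(self, scope: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return now < self.expires_at and scope in self.scopes


def issue_grant(agent_id: str, task_id: str, scopes: set[str],
                ttl_seconds: int = 300, now: Optional[float] = None) -> TaskGrant:
    """Mint a grant that covers one task and expires on its own, so no
    standing privilege is left behind."""
    if not 0 < ttl_seconds <= MAX_TTL_SECONDS:
        raise ValueError(f"ttl_seconds must be in (0, {MAX_TTL_SECONDS}]")
    now = time.time() if now is None else now
    return TaskGrant(agent_id, task_id, frozenset(scopes), now + ttl_seconds)


grant = issue_grant("report-agent", "run-42", {"tickets:read"}, ttl_seconds=60, now=1000.0)
print(grant.permits("tickets:read", now=1030.0))   # True
print(grant.permits("tickets:read", now=1061.0))   # False: expired
print(grant.permits("tickets:write", now=1030.0))  # False: out of scope
```

Because the grant lapses when the run ends, there is little left for a periodic access review to certify; the reviewable evidence shifts to the record of which grants were issued, for which task, and with which scopes.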

Agent framework complexity creates a governance illusion if the credential layer stays weak. Orchestration, critics, and fixers can improve workflow shape, but they do not solve the fact that the agent still needs permissions to act. When tool access is broader than task scope, the governance gap sits in the identity layer, not the framework layer. The implication is that teams should treat agent frameworks as consumers of NHI controls, not substitutes for them.

Runtime access boundaries are now part of the AI operating model. The article’s most useful lesson is that low-risk pilots can still interact with production systems in ways that matter. That makes workload IAM, conditional access, and tool-level scoping foundational controls for agentic AI. Practitioners should assume the control plane is incomplete until identity and execution are governed together.

From our research:
Only 44% of developers are reported to follow security best practices for secrets management, exposing a significant developer behaviour gap, according to The State of Secrets in AppSec.
In the same research, organisations maintain an average of 6 distinct secrets manager instances, which fragments control and weakens central oversight.
For the wider control-plane view, the Ultimate Guide to NHIs is a useful companion resource for mapping how NHI governance evolves as automation becomes more agentic.

What this signals

Runtime access governance is becoming the real test of agent maturity. As organisations move from prompts to multi-step execution, the issue is no longer whether the model can produce a useful output, but whether the identity behind it can be bounded to the task. That is why workload IAM, short-lived permissions, and tool-level scoping are now the decisive controls for agentic AI.

Ephemeral credential trust debt: agent pilots accumulate hidden risk when teams prove concepts with broad access first and add governance later. The longer an agent operates with overbroad permissions, the harder it becomes to recover the original intent of the access grant. Practitioners should assume every unchecked proof of concept becomes a future production entitlement.
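Trust debt can be surfaced with a periodic check over existing agent grants. The sketch below flags three illustrative symptoms of a proof-of-concept grant drifting into a production entitlement; the AgentGrant record, the ":*" wildcard convention and the 30-day review interval are assumptions for illustration, not thresholds from the source.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class AgentGrant:
    agent_id: str
    scopes: frozenset[str]
    created: date
    expires: Optional[date]        # None means standing access
    last_reviewed: Optional[date]  # None means never reviewed


def trust_debt_findings(grant: AgentGrant, today: date,
                        review_interval: timedelta = timedelta(days=30)) -> list[str]:
    """Flag no expiry, wildcard scopes, and grants not reviewed recently."""
    findings: list[str] = []
    if grant.expires is None:
        findings.append("no expiry: standing access")
    elif grant.expires < today:
        findings.append("expired but still present")
    if any(s == "*" or s.endswith(":*") for s in grant.scopes):
        findings.append("wildcard scope")
    last = grant.last_reviewed or grant.created
    if today - last > review_interval:
        findings.append(f"not reviewed for {(today - last).days} days")
    return findings


pilot = AgentGrant("poc-agent", frozenset({"crm:*"}), date(2026, 1, 1), None, None)
print(trust_debt_findings(pilot, today=date(2026, 3, 1)))
# ['no expiry: standing access', 'wildcard scope', 'not reviewed for 59 days']
```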

For teams aligning agent controls to external standards, the NIST AI Risk Management Framework remains useful for governance structure, while the OWASP Agentic AI Top 10 helps map tool misuse and agentic execution risk. The practical signal is clear: model quality alone will not compensate for weak access design.

For practitioners

Scope agent identities to individual tasks: Bind each agent to a narrow workload identity, then limit that identity to the minimum tools, APIs, and data paths required for the specific run. Avoid shared credentials across agents or reuse of broad service accounts that outlive the task.
Separate reasoning from execution authority: Keep critique, planning, and tool invocation under different permissions where possible, so a model that drafts an action cannot also execute it by default. This reduces the chance that an instruction error becomes a system action.
Instrument tool-use visibility end to end: Log the final prompt, tool call, tool response, and any framework transformation so investigators can reconstruct what the agent actually did. Partial logging leaves the identity and control story incomplete.
Set explicit stop conditions for autonomous loops: Define when an agent must stop, escalate, or hand control back to a human if it repeats failures, exceeds a bounded number of retries, or starts producing mixed action and analysis output. Do not let the loop continue by default.
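The logging practice in this list can be sketched as an append-only audit trail that records each prompt, tool call, tool response, framework transformation and stop event for one run. This is a minimal illustration with hypothetical names (AuditEvent, AuditTrail); the hash chain is an added design choice, assumed here so that investigators can detect edited or deleted entries, and is not something the source prescribes.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class AuditEvent:
    run_id: str
    seq: int
    kind: str      # prompt | tool_call | tool_response | transform | stop
    identity: str  # workload identity that performed or served the step
    detail: dict
    prev_hash: str = ""
    hash: str = ""


def _digest(ev: AuditEvent) -> str:
    """Hash every field except the event's own hash."""
    body = {k: v for k, v in asdict(ev).items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditTrail:
    """Append-only log of one agent run; each event is chained to the last."""
    KINDS = {"prompt", "tool_call", "tool_response", "transform", "stop"}

    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.events: list[AuditEvent] = []

    def record(self, kind: str, identity: str, detail: dict) -> AuditEvent:
        if kind not in self.KINDS:
            raise ValueError(f"unknown event kind: {kind}")
        detail = json.loads(json.dumps(detail))  # snapshot; rejects non-JSON data
        prev = self.events[-1].hash if self.events else ""
        ev = AuditEvent(self.run_id, len(self.events), kind, identity, detail, prev)
        ev.hash = _digest(ev)
        self.events.append(ev)
        return ev

    def verify(self) -> bool:
        """False if any event was altered, removed, or reordered."""
        prev = ""
        for ev in self.events:
            if ev.prev_hash != prev or ev.hash != _digest(ev):
                return False
            prev = ev.hash
        return True


trail = AuditTrail("run-7")
trail.record("prompt", "report-agent", {"text": "summarise ticket 12"})
trail.record("tool_call", "report-agent", {"tool": "read_ticket", "args": {"id": 12}})
trail.record("tool_response", "ticket-api", {"status": 200})
print(trail.verify())  # True
```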

Key takeaways

Agentic AI is already an identity governance issue because agents act through real credentials, real tools, and real runtime decisions.
Probabilistic output becomes a security problem when execution authority is broad, persistent, or poorly observed.
Teams should scope agent access like any other workload identity and prove containment before they scale autonomy.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while the NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A1	Agentic tool use and runtime autonomy create direct exposure to tool misuse.
OWASP Non-Human Identity Top 10	NHI-01	Agents need scoped, governed non-human identities to access systems safely.
NIST AI RMF	GOVERN, MAP	AI governance is needed where agents make runtime decisions affecting real systems.

Apply the AI RMF GOVERN and MAP functions to define accountability and operational boundaries.

Key terms

Agentic AI: Software that can make runtime decisions about what actions to take, which tools to use, and when to execute them. In identity terms, it behaves like a non-human actor that needs explicit governance around scope, approval, and observability because execution is not purely scripted.
Workload Identity: A machine or software identity used by applications, services, or agents to authenticate and obtain access. For agentic systems, the important issue is not whether the identity exists, but whether its permissions are narrowly scoped, traceable, and safe for autonomous or semi-autonomous use.
Standing Privilege: Access that remains available without needing fresh approval or task-specific provisioning. In agentic environments, standing privilege increases risk because a model can reach tools or data at any time unless the identity layer imposes strict runtime boundaries and short-lived authorization.
Tool Invocation: The act of an AI system calling an external function, API, search service, or internal system during execution. This is where model behaviour becomes operational risk, because the output is no longer just text and can directly affect systems if access is not constrained.

Deepen your knowledge

Agentic AI governance and workload identity boundaries are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building controls for agents that can call tools and access production systems, it is worth exploring.

This post draws on content published by Aembit: the limits of simple agent workflows and the need for runtime access control. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-05-01.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org