What breaks when AI agents discover tools at runtime instead of using hardcoded lists?

Hardcoded lists assume the tool set is stable enough to be embedded in code. That breaks when services change frequently, environments differ, or new tools appear without redeploying the agent. Runtime discovery shifts the trust boundary to the registry, so misclassified or overexposed tools can be selected immediately.

Why This Matters for Security Teams

Runtime tool discovery moves the control point from source code to the registry, catalog, or broker the agent consults at execution time. That is flexible, but it also means security teams can no longer assume the agent sees only a pre-approved tool list. When the tool set is dynamic, every discovered endpoint becomes part of the agent’s effective authority, especially if metadata, descriptions, or labels are incomplete. This is why guidance on agentic systems increasingly emphasizes runtime policy checks, as in the OWASP Agentic AI Top 10, and NHI lifecycle discipline, as in NHIMG’s NHI Lifecycle Management Guide.

The practical issue is not discovery itself, but trust. If the agent can browse new tools at runtime, it can also select tools that were never reviewed for scope, data exposure, or privilege boundaries. A mislabelled tool can become an immediate escalation path, and a stale registry entry can keep exposing capabilities long after they should be retired. In practice, many security teams encounter this only after an agent has already connected to an overexposed tool or begun chaining actions through a registry that was assumed to be safe.

How It Works in Practice

Runtime discovery is usually implemented through a catalog, broker, or protocol layer that advertises tools, schemas, and permissions to the agent on demand. The challenge is that discovery metadata becomes security-relevant, not just descriptive. If an agent is allowed to act on every tool it can see, the registry is effectively the policy boundary. That is why current guidance suggests coupling discovery with request-time authorization, not static allowlists alone. The NIST AI Risk Management Framework and NHIMG’s OWASP NHI Top 10 both support the broader principle: decisions must reflect current context, not just developer intent.
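As a minimal sketch of request-time authorization, the check below evaluates each invocation against the current task, tenant, and data sensitivity instead of trusting whatever the registry returned. The names `Tool`, `RequestContext`, `TASK_POLICY`, `is_authorized`, and `invoke` are hypothetical and exist only for illustration; a real deployment would typically delegate this decision to a dedicated policy engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    tenant: str
    sensitivity: int  # 0 = public; higher numbers mean more sensitive data


@dataclass(frozen=True)
class RequestContext:
    tenant: str
    task: str
    max_sensitivity: int


# Hypothetical policy: which tool names each task may use.
TASK_POLICY = {
    "summarize_ticket": {"read_ticket"},
}


def is_authorized(tool: Tool, ctx: RequestContext) -> bool:
    """Deny unless task, tenant, and sensitivity all match the current request."""
    allowed = TASK_POLICY.get(ctx.task, set())  # unknown task: nothing allowed
    return (
        tool.name in allowed
        and tool.tenant == ctx.tenant
        and tool.sensitivity <= ctx.max_sensitivity
    )


def invoke(tool: Tool, ctx: RequestContext) -> str:
    """Gate every invocation, even for tools the agent has already discovered."""
    if not is_authorized(tool, ctx):
        raise PermissionError(f"{tool.name} is not permitted for task {ctx.task}")
    return f"invoked {tool.name}"  # placeholder for the real tool call
```

The point of the design is that visibility in the registry grants nothing by itself: a tool that is discovered but not covered by the policy for the current task is rejected at call time.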

Practitioners usually need four controls working together:

  • Signed or strongly authenticated tool registries so discovery results can be trusted.
  • Runtime policy evaluation so the agent can only invoke tools aligned to the current task, tenant, and data sensitivity.
  • Just-in-time credentials so tool access expires when the task ends, instead of persisting across sessions.
  • Tool classification and human review for high-risk capabilities, especially anything that can move data, modify infrastructure, or call other agents.
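The first control, a registry whose entries can be trusted, can be sketched as follows. This example assumes a shared-secret HMAC over a canonical JSON form of each entry purely to stay within the Python standard library; a production registry would more likely use asymmetric signatures so that agents hold no signing key. The listing shape (`entry` plus `signature`) and all function names are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(entry: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash the same bytes."""
    return json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_entry(entry: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the canonical entry."""
    return hmac.new(key, canonical_bytes(entry), hashlib.sha256).hexdigest()


def verify_entry(entry: dict, signature: str, key: bytes) -> bool:
    """Constant-time comparison of the expected and presented tags."""
    return hmac.compare_digest(sign_entry(entry, key), signature)


def trusted_tools(listing: list, key: bytes) -> list:
    """Keep only entries whose signature verifies; unsigned entries are dropped."""
    return [
        item["entry"]
        for item in listing
        if verify_entry(item["entry"], item.get("signature", ""), key)
    ]
```

Because the tag covers the whole entry, changing a tool's description or declared scope after signing causes verification to fail, which is what makes the discovery metadata itself tamper-evident.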

This is where workload identity matters. The agent should prove what it is through cryptographic identity, while authorization decides what it may do right now. That separation is what reduces the blast radius when discovery expands the tool surface. These controls tend to break down when the registry is shared across tenants and tool metadata is not tightly governed, because the agent can discover capabilities that were never intended for its environment.

Common Variations and Edge Cases

Tighter runtime control usually increases operational overhead, so organisations have to balance agility against review burden. That tradeoff is especially sharp in fast-changing environments where tools are added daily, because fully static approvals create friction while fully open discovery creates risk. Best practice is still evolving and there is no universal standard yet; the right balance depends on how quickly the tool ecosystem changes and how sensitive the agent’s actions are.

One common edge case is environments that use tool aliases, dynamic endpoints, or per-tenant plugins. In those settings, a single “tool name” may resolve differently across contexts, which can confuse both human reviewers and policy engines. Another edge case is fallback behavior: if the preferred tool is unavailable, some agents will discover and select an alternate tool with broader authority. That makes deny-by-default discovery important, especially for agents that can chain tools or operate with partial autonomy. NHIMG’s Top 10 NHI Issues and the CSA MAESTRO agentic AI threat modeling framework are useful references for thinking about discovery as a governance problem, not just an integration feature.
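The alias and fallback cases above can be illustrated with a small deny-by-default resolver. This is a sketch under assumptions: the `REGISTRY` contents, the scope names, and the functions `resolve` and `select_tool` are all hypothetical. It keys resolution on the tenant as well as the alias, and it refuses any candidate, preferred or fallback, that needs scopes the current task was not granted.

```python
from typing import Iterable, Optional

# Hypothetical registry: the same alias resolves differently per tenant.
REGISTRY = {
    ("tenant-a", "export"): ("export-v1", frozenset({"read"})),
    ("tenant-b", "export"): ("export-admin", frozenset({"read", "write"})),
}


def resolve(tenant: str, alias: str, granted: frozenset) -> Optional[str]:
    """Return a tool id only if it exists for this tenant and fits the grant."""
    entry = REGISTRY.get((tenant, alias))
    if entry is None:
        return None  # unknown alias for this tenant: deny
    tool_id, required_scopes = entry
    if not required_scopes <= granted:
        return None  # tool needs more authority than the task holds: deny
    return tool_id


def select_tool(
    tenant: str, preferred: str, fallbacks: Iterable[str], granted: frozenset
) -> Optional[str]:
    """Try the preferred alias, then fallbacks, under the same scope ceiling."""
    for alias in [preferred, *fallbacks]:
        tool_id = resolve(tenant, alias, granted)
        if tool_id is not None:
            return tool_id
    return None  # nothing acceptable: fail closed rather than widen authority
```

With this shape, a fallback can never carry broader authority than the preferred tool would have been allowed, because every candidate passes through the same scope check.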

The sharpest failure mode appears when discovery is coupled to broad secrets or delegated API access. In that case, a new tool is not just a new action path; it becomes a new credential exposure path as well. The practical answer is to keep discovery narrow, bind it to task context, and revoke access as soon as the workflow completes.
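One way to picture binding access to task context and revoking it on completion is a short-lived credential scoped to a single task and tool. The sketch below is an in-memory illustration only; `CredentialBroker` and `task_credential` are hypothetical names, and a real system would rely on its secrets manager or identity provider for issuance and revocation.

```python
import secrets
import time
from contextlib import contextmanager


class CredentialBroker:
    """Hypothetical in-memory broker issuing task-bound, expiring tokens."""

    def __init__(self) -> None:
        self._active = {}  # token -> (task_id, tool, expires_at)

    def issue(self, task_id: str, tool: str, ttl_seconds: float) -> str:
        token = secrets.token_urlsafe(16)
        self._active[token] = (task_id, tool, time.monotonic() + ttl_seconds)
        return token

    def is_valid(self, token: str, task_id: str, tool: str) -> bool:
        record = self._active.get(token)
        if record is None:
            return False  # never issued, or already revoked
        bound_task, bound_tool, expires_at = record
        return (
            bound_task == task_id
            and bound_tool == tool
            and time.monotonic() < expires_at
        )

    def revoke(self, token: str) -> None:
        self._active.pop(token, None)


@contextmanager
def task_credential(broker, task_id: str, tool: str, ttl_seconds: float = 60.0):
    """Issue a credential for one task and tool; revoke it when the block exits."""
    token = broker.issue(task_id, tool, ttl_seconds)
    try:
        yield token
    finally:
        broker.revoke(token)  # runs on success and on error alike
```

A workflow would wrap its tool calls in `with task_credential(broker, task_id, tool) as token:` so that the credential cannot outlive the task, and a token issued for one tool is rejected if presented for another.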

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | A01 | Runtime tool discovery expands the attack surface for agent misuse and tool chaining.
CSA MAESTRO | - | MAESTRO models agentic trust boundaries, including tool selection and escalation paths.
NIST AI RMF | - | AI RMF covers governance for context-aware decisions and changing system behavior.

Treat discovered tools as dynamic attack surface and gate each invocation with runtime policy.