How should security teams govern model routing in AI agent workflows?

Why This Matters for Security Teams

Model routing decides which model sees which request, so it is effectively an authorisation layer for autonomous behaviour, not just an optimisation knob. If an agent can shift from a safe front-line model to a more capable backend model without policy checks, it may inherit broader tool access, longer context windows, and weaker oversight. That is where routing becomes a control-plane risk, especially when agents handle sensitive data or initiate actions on behalf of users.

Current guidance suggests treating routing rules as part of the agent’s trust boundary, alongside identity, tool permissions, and logging. That aligns with the OWASP Agentic AI Top 10 and the NIST AI Risk Management Framework, both of which emphasise governance, accountability, and risk-based controls. NHIMG research shows why this matters. In the AI Agents: The New Attack Surface report, 80% of organisations said their AI agents had already acted beyond intended scope.

In practice, many security teams encounter routing abuse only after an agent has already escalated into a higher-trust model and completed the wrong action.

How It Works in Practice

Effective model routing uses policy-as-code at request time. The routing engine should evaluate the agent’s intent, the data classification of the prompt, the tools being requested, the workload identity making the call, and whether the action can be completed with a lower-risk model. That is closer to intent-based authorisation than to static RBAC, because autonomous agents do not follow fixed human patterns. For that reason, security teams should define “allow”, “escalate”, and “block” outcomes explicitly, then log each decision together with the context that produced it.
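
A minimal sketch of such a request-time policy check is shown here in Python. It is illustrative only: the `RoutingRequest` fields, the policy sets, and the `decide_route` function are hypothetical names chosen for this example, and a real deployment would more likely load its rules from a central policy engine than hard-code them.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"        # stay on the lower-risk model
    ESCALATE = "escalate"  # move to the higher-trust path, with review
    BLOCK = "block"        # refuse the request outright


@dataclass(frozen=True)
class RoutingRequest:
    workload_id: str            # identity of the calling agent
    intent: str                 # e.g. "summarise", "execute_code"
    data_classification: str    # "public", "internal", or "restricted"
    requested_tools: frozenset = field(default_factory=frozenset)


# Hypothetical policy data, assumed for illustration only.
TRUSTED_WORKLOADS = {"agent://support-bot"}
BLOCKED_INTENTS = {"retrieve_secret", "bulk_export"}
PRIVILEGED_TOOLS = {"shell", "payments_api"}


def decide_route(req: RoutingRequest) -> tuple[Decision, str]:
    """Return the routing decision and the reason that produced it."""
    if req.workload_id not in TRUSTED_WORKLOADS:
        return Decision.BLOCK, "unknown workload identity"
    if req.intent in BLOCKED_INTENTS:
        return Decision.BLOCK, f"intent '{req.intent}' is not permitted"
    if req.data_classification == "restricted":
        return Decision.ESCALATE, "restricted data requires the higher-trust path"
    if req.requested_tools & PRIVILEGED_TOOLS:
        return Decision.ESCALATE, "privileged tool requested"
    return Decision.ALLOW, "lower-risk model is sufficient"
```

Block rules are evaluated before escalation rules so that a forbidden request can never be upgraded into a higher-trust path, and every outcome carries a reason string that can be written to the audit trail.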

In mature implementations, the agent presents a cryptographic workload identity, then receives just-in-time credentials only for the task at hand. Short-lived secrets reduce blast radius if a routing path is abused, and ephemeral permissions prevent a low-risk request from becoming a standing privilege. This is the same governance logic discussed in NHIMG’s OWASP NHI Top 10 and in the Ultimate Guide to NHIs — Lifecycle Processes for Managing NHIs, where identity lifecycle and revocation are central controls.
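
The idea of a short-lived, task-scoped credential can be sketched as follows. This is an assumption-laden illustration using only the Python standard library: the `issue_credential` and `verify_credential` functions, the token layout, and the hard-coded `SIGNING_KEY` are hypothetical, and a production system would normally rely on an established workload identity and token service instead.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical signing key; in practice this would live in a KMS or secrets manager.
SIGNING_KEY = b"demo-only-key"


def issue_credential(workload_id: str, scope: str, ttl_seconds: int = 60,
                     now: Optional[float] = None) -> str:
    """Mint a signed token bound to one workload, one scope, and a short expiry."""
    issued_at = time.time() if now is None else now
    claims = {"sub": workload_id, "scope": scope, "exp": issued_at + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{signature}"


def verify_credential(token: str, required_scope: str,
                      now: Optional[float] = None) -> bool:
    """Accept the token only if signature, scope, and expiry all check out."""
    try:
        payload, signature = token.rsplit(".", 1)
    except ValueError:
        return False  # no separator, so not a token this sketch issued
    expected = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False  # tampered or signed with a different key
    claims = json.loads(base64.urlsafe_b64decode(payload))
    current = time.time() if now is None else now
    return claims["scope"] == required_scope and current < claims["exp"]
```

Because the scope and expiry are inside the signed payload, a credential issued for one routed task cannot be reused for a different scope or after its time-to-live has passed.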

Route low-risk, read-only prompts to the smallest capable model.

Escalate only when the request needs tool use, deeper reasoning, or privileged data access.

Block prompts that cross policy boundaries, such as secret retrieval, exfiltration, or unaudited action.

Require audit logs that capture the model choice, policy decision, user context, and downstream tool calls.
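
The audit requirement in the last rule could be met with a structured record along these lines. The sketch below is hypothetical: the `RoutingAuditRecord` fields, the `record_routing` helper, and the model tier names in `MODEL_FOR_DECISION` are assumptions made for illustration, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical model tiers keyed by policy outcome; a blocked request routes nowhere.
MODEL_FOR_DECISION = {"allow": "small-model", "escalate": "large-model", "block": None}


@dataclass
class RoutingAuditRecord:
    request_id: str
    user_context: str        # who the agent is acting on behalf of
    decision: str            # "allow", "escalate", or "block"
    reason: str              # why the policy produced that decision
    model: Optional[str]     # model chosen, or None when blocked
    tool_calls: list = field(default_factory=list)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def record_routing(request_id: str, user_context: str, decision: str,
                   reason: str, tool_calls=()) -> str:
    """Build one JSON line capturing the model choice, decision, and context."""
    record = RoutingAuditRecord(
        request_id=request_id,
        user_context=user_context,
        decision=decision,
        reason=reason,
        model=MODEL_FOR_DECISION[decision],
        tool_calls=list(tool_calls),
    )
    return json.dumps(asdict(record), sort_keys=True)
```

Emitting one self-describing JSON line per decision keeps the record easy to ship to a log pipeline and makes it possible to reconstruct later why a particular model was chosen.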

Architecturally, this aligns with the CSA MAESTRO agentic AI threat modeling framework and the NIST AI Risk Management Framework, both of which favour explicit risk decisions over implicit trust. These controls tend to break down when routing is handled inside application code without a central policy engine, because teams lose consistent enforcement across environments and cannot prove why one model was chosen over another.

Common Variations and Edge Cases

Tighter routing often increases latency and operational overhead, so organisations have to balance safety against throughput and cost. That tradeoff is real, especially in high-volume copilots where every extra policy evaluation or escalation adds friction. Best practice is still evolving, and there is no universal standard yet for how much context should be inspected before routing a request.

Some environments need more conservative routing than others. Financial services, healthcare, and code-execution workflows usually require stricter escalation thresholds because the agent may touch regulated data or trigger irreversible actions. By contrast, low-risk summarisation can often stay on the front-line model, provided the system still blocks hidden tool access and logs the decision. For routing governance to hold up, teams should also watch for prompt injection, cross-agent contamination, and secrets leakage. These are recurring themes in NHIMG’s AI LLM hijack breach analysis and in the vendor-reported attack patterns covered in Anthropic’s first AI-orchestrated cyber espionage campaign report.

Routing also becomes harder when multiple agents share the same backend or when one agent brokers requests for another, because trust chains get opaque very quickly. In those cases, security teams should treat every hop as a new authorisation event, not as a continuation of the original session.
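
One way to picture per-hop authorisation is to check the required scope against every agent in the delegation chain instead of only the first. The following sketch assumes a hypothetical `Hop` record and `authorise_chain` function, with scope strings invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    agent_id: str
    granted_scopes: frozenset  # scopes this agent holds in its own right


def authorise_chain(chain: list[Hop], required_scope: str) -> tuple[bool, str]:
    """Check the required scope at every hop instead of trusting the first one."""
    if not chain:
        return False, "empty delegation chain"
    for position, hop in enumerate(chain):
        if required_scope not in hop.granted_scopes:
            return False, f"hop {position} ({hop.agent_id}) lacks '{required_scope}'"
    return True, "every hop holds the required scope"
```

Under this approach a brokering agent cannot lend a downstream agent a permission it does not hold itself, and a downstream agent cannot inherit a permission simply because the original session had it.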

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while the NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A2	Model routing can expand tool use and privilege unexpectedly in agentic systems.
CSA MAESTRO	TRT-1	MAESTRO focuses on threat-informed controls for agent orchestration and decisions.
NIST AI RMF		AI RMF governance applies to accountability and risk decisions in routing.

Use policy-as-code to evaluate each routing decision against task, identity, and data sensitivity.
