LLM routing is the runtime decision layer that sends each request to the most suitable model based on cost, complexity, latency, and risk. In governed environments, it becomes part of the control plane because it determines which provider receives the data and which audit trail must exist.
Expanded Definition
LLM routing is the policy-driven decision point that selects which model handles a request, often using signals such as prompt length, task type, token budget, latency targets, confidence thresholds, and content sensitivity. In NHI and agentic AI environments, routing is not just an optimisation feature. It is part of the control plane because it changes which model provider receives the input, which secrets or personal data may be exposed, and which logs must exist for audit and incident response.
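As a rough illustration of how such signals might be derived before a routing decision, the sketch below extracts three of them from a prompt. The `RoutingSignals` type, the `extract_signals` function, and the two regular expressions are hypothetical names and patterns chosen for this example; a real deployment would more likely call a dedicated DLP or secret-scanning service than rely on a short pattern list.

```python
import re
from dataclasses import dataclass

# Assumption: two illustrative patterns only, not a complete secret detector.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                     # shape of an AWS access key ID
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),   # PEM private key header
]


@dataclass(frozen=True)
class RoutingSignals:
    """Hypothetical container for signals a router could act on."""
    prompt_chars: int
    task_type: str
    contains_secret: bool


def extract_signals(prompt: str, task_type: str) -> RoutingSignals:
    """Derive simple routing signals from the raw prompt."""
    contains_secret = any(p.search(prompt) for p in SECRET_PATTERNS)
    return RoutingSignals(
        prompt_chars=len(prompt),
        task_type=task_type,
        contains_secret=contains_secret,
    )
```

A router could then treat `contains_secret` as a hard constraint and `prompt_chars` or `task_type` as optimisation inputs, which keeps the sensitivity signal separate from the cost signal.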
Definitions vary across vendors, especially when routing is bundled with model cascades, fallback logic, guardrails, or agent orchestration. The operational question is not simply "which model is cheapest," but "which model is allowed to see this request under current policy." That distinction aligns with guidance in the NIST AI Risk Management Framework and the OWASP Agentic AI Top 10, both of which treat selection, exposure, and oversight as governance concerns, not only engineering choices.
Routing is often confused with model selection at design time, but runtime routing must react to live context, policy state, and risk posture. The most common misapplication is allowing cost-based routing to override data-handling rules, which occurs when sensitive prompts are sent to lower-cost external models without a documented approval path.
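One way to keep cost from overriding data-handling rules is to apply policy as a filter first and optimise cost only among the routes that remain. The following is a minimal sketch under stated assumptions: the sensitivity tiers, route names, and prices are invented for illustration and do not describe any particular provider.

```python
from dataclasses import dataclass

# Assumption: a simple three-tier data classification, ordered by rank.
SENSITIVITY_RANK = {"public": 0, "internal": 1, "regulated": 2}


@dataclass(frozen=True)
class ModelRoute:
    name: str
    cost_per_1k_tokens: float
    max_sensitivity: str  # highest data class this route is approved for


# Hypothetical routes and prices, for illustration only.
ROUTES = [
    ModelRoute("external-small", 0.10, "public"),
    ModelRoute("external-large", 1.00, "internal"),
    ModelRoute("private-deployment", 3.00, "regulated"),
]


def select_route(sensitivity: str, routes=ROUTES) -> ModelRoute:
    """Filter by policy first, then pick the cheapest approved route."""
    needed = SENSITIVITY_RANK[sensitivity]
    allowed = [
        r for r in routes
        if SENSITIVITY_RANK[r.max_sensitivity] >= needed
    ]
    if not allowed:
        # No approved path exists: refuse rather than pick a cheaper route.
        raise PermissionError(f"no approved route for {sensitivity!r} data")
    return min(allowed, key=lambda r: r.cost_per_1k_tokens)
```

Because cost is only compared inside the `allowed` list, a cheaper external route can never be chosen for a prompt whose classification exceeds that route's approval.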
Examples and Use Cases
Implementing LLM routing rigorously often introduces added latency, policy complexity, and audit overhead, requiring organisations to weigh lower inference cost against stronger data control.
- A support chatbot routes routine FAQs to a small model, but escalates customer complaints containing account data to a higher-assurance model with stricter retention controls.
- An agentic coding tool sends low-risk refactoring tasks to one model, while security-sensitive prompts involving credentials are blocked or redirected to an approved internal model, a pattern reflected in the Analysis of Claude Code Security.
- A procurement assistant uses a fast model for classification, then routes legal clauses to a model configured for higher context retention and more detailed logging.
- An enterprise gateway routes prompts containing regulated data away from public providers and into a private deployment, while preserving the decision trail for later review.
- Security teams study breach patterns such as the AI LLM hijack breach alongside the NIST AI 600-1 Generative AI Profile to define when routing must fail closed rather than silently downgrade.
In mature environments, routing decisions are often tied to policy engines, DLP checks, and identity context so that the model choice reflects business risk rather than convenience alone.
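The fail-closed behaviour mentioned in the examples above can be sketched as a fallback loop that only ever tries routes already approved for the request. The exception classes, the `call_with_fallback` function, and the `send` callable are hypothetical; `send` stands in for whatever client actually calls a model.

```python
class RouteUnavailable(Exception):
    """Raised by the (hypothetical) model client when a route cannot serve."""


class RoutingDenied(Exception):
    """Raised when no approved route can handle the request."""


def call_with_fallback(prompt, approved_routes, send):
    """Try approved routes in order; never fall back outside that list.

    `send(route, prompt)` is an assumed callable that returns a response
    or raises RouteUnavailable.
    """
    errors = []
    for route in approved_routes:
        try:
            return route, send(route, prompt)
        except RouteUnavailable as exc:
            errors.append(f"{route}: {exc}")
    # Fail closed: surface the failure instead of silently downgrading.
    raise RoutingDenied(
        "no approved route available; failing closed: " + "; ".join(errors)
    )
```

The design choice here is that the fallback list is an input decided by policy, so an outage on the approved path produces an explicit error rather than a quiet switch to an unapproved model.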
Why It Matters in NHI Security
Routing is a security boundary because it governs which non-human identity, provider, and execution path touch the data. If the routing layer is misconfigured, an NHI can send privileged prompts, embedded secrets, or regulated content to a model that was never approved for that workload. That creates exposure across confidentiality, auditability, and provider trust boundaries. NHIMG research on LLMjacking: How Attackers Hijack AI Using Compromised NHIs shows how quickly exposed credentials can be abused, with attackers attempting access an average of 17 minutes after AWS credentials are exposed publicly. The same pattern appears when routing logic is weak: attackers do not need to defeat the model if they can influence the path that reaches it.
Routing also affects detection. If one path lacks logs, redaction, or identity binding, incident responders may not know which model saw the prompt or which downstream systems were contacted. That is why NHI governance must align routing with policy, telemetry, and access control, not treat it as a mere optimisation feature. The most serious failures are often only discovered after data exposure, when an organisation is forced to reconstruct which model received the request and why the control plane allowed it.
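To make that reconstruction easier, each routing decision can emit a structured record that binds the calling identity, the chosen model, and the policy version. The sketch below is one possible shape, not a prescribed schema: the field names and the `build_audit_record` function are assumptions, and the prompt is stored only as a SHA-256 digest so the log itself does not become a second copy of sensitive content.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_audit_record(identity, model, policy_version, prompt, reason):
    """Build a routing decision record (hypothetical field names)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "identity": identity,              # the NHI that issued the request
        "model": model,                    # the route that received the prompt
        "policy_version": policy_version,  # which policy allowed the decision
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "reason": reason,                  # why this route was selected
    }


record = build_audit_record(
    identity="svc-support-bot",
    model="private-deployment",
    policy_version="2026-06-01",
    prompt="Customer account 1234 reports a billing error",
    reason="regulated data: external routes excluded",
)
audit_line = json.dumps(record, sort_keys=True)  # one JSON line per decision
```

Writing one such line per decision, on every path, lets responders answer which model saw a given prompt and why the control plane allowed it.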
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
The OWASP Agentic AI Top 10 addresses the attack and risk surface, while NIST AI RMF and NIST AI 600-1 set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | N/A | Routing choices shape model exposure, tool access, and agent trust boundaries. |
| NIST AI RMF | | Requires governance of AI system risk, including model selection and downstream impact. |
| NIST AI 600-1 | | Profiles generative AI use cases where prompt handling and exposure controls matter. |
Bind routing policy to data sensitivity so each request reaches only an approved model path.
Reviewed and updated by the NHIMG editorial team on June 20, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org