MiniMax M2.5 shows why model routing now drives AI governance

By NHI Mgmt Group Editorial Team | Published 2026-03-17 | Domain: Agentic AI & NHIs | Source: WorkOS

TL;DR: According to WorkOS, MiniMax M2.5 is now the most-used model on OpenRouter by token volume. It processed more than 2.45 trillion tokens in a single week as of February 2026. Chinese AI models now hold 61% of OpenRouter's market share, and M2.5 delivers near-frontier performance at far lower cost. The governance issue is no longer model quality alone but routing, escalation, and control boundaries across AI agent workloads.

At a glance

What this is: An analysis of why MiniMax M2.5 is becoming the default front-line chat model for AI agent workflows, and what its speed, cost, and routing profile change for practitioners.

Why it matters: Model routing now shapes access boundaries, escalation paths, and cost controls across autonomous agent, NHI, and human-facing programmes. It is no longer just a question of AI performance.

By the numbers:

MiniMax M2.5 processes more tokens on OpenRouter than any other model, with over 2.45 trillion tokens processed in a single week as of February 2026.
Chinese AI models now hold 61% of OpenRouter's market share.
M2.5 scores 80.2% on SWE-Bench Verified, compared with 80.8% for Claude Opus 4.6.

👉 Read WorkOS's analysis of MiniMax M2.5 and AI model routing

Context

Model routing is now an identity governance problem as much as a cost problem. When one model handles most conversational traffic, the control question becomes who or what is allowed to decide when to stay on the cheap path and when to escalate to a higher-trust model. In AI agent programmes, that routing decision changes the access surface, the audit trail, and the business risk.

The WorkOS article is not about a vendor launch. It describes a near-frontier model used as the default front-line layer for an always-on agent. The useful takeaway for IAM teams is that the routing layer is becoming a policy layer. Policy layers need ownership, boundaries, and review, just like any other privileged control plane.

Key questions

Q: How should security teams govern model routing in AI agent workflows?

A: Security teams should treat model routing as a policy decision, not a performance shortcut. Define which requests stay on the front-line model, which must escalate, and which are blocked entirely. Tie those rules to data sensitivity, tool access, and audit logging so that routing decisions are reviewable and consistent across environments.

Q: Why does a cheap front-line model change IAM risk for AI systems?

A: A cheap front-line model makes it easier to route more work through a single decision layer, which concentrates access and data handling. That does not create identity risk by itself, but it increases the impact of weak escalation logic, especially when the model can trigger tools or pass outputs into other systems.

Q: What breaks when escalation from one model to another is implicit?

A: Implicit escalation breaks governance because teams cannot prove when a request crossed from low-risk handling into higher-trust reasoning. Without explicit criteria, routing drifts over time, approvals become inconsistent, and incident review cannot reconstruct why a sensitive task moved to a different model.

Q: How do organisations decide between self-hosted open-weight models and hosted APIs?

A: Organisations should decide based on control requirements, not just price. Self-hosting improves configurability and availability control, but it also shifts responsibility for access management, logging, patching, and lifecycle governance onto the team operating the model path.

Technical breakdown

Mixture of experts routing and why it lowers inference cost

MiniMax M2.5 uses a mixture-of-experts (MoE) architecture. The model has a large total parameter count but activates only a subset of it during each inference pass. That design keeps broad capability while reducing the compute needed per token. For practitioners, the important point is not model novelty but the operational effect: lower latency and lower cost make it easier to place the model in the hot path for chat, tool calls, and agent response handling. Once a model becomes the default execution layer, its behaviour shapes downstream access decisions.

Practical implication: Treat the model choice as part of the access control design, not as a purely performance decision.
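The following top-k gating sketch in Python makes the mixture-of-experts effect concrete. It is a toy and not MiniMax M2.5's actual architecture. The expert count, top-k value, and hidden size are illustrative assumptions, and each "expert" is a tiny linear map standing in for a feed-forward block. The sketch shows that the gate scores every expert but only the top-k experts run for a given token. Per-token expert compute therefore scales with k rather than with the total number of experts.

```python
import math
import random

# Illustrative sizes only: these are NOT MiniMax M2.5's real configuration.
NUM_EXPERTS = 8
TOP_K = 2
HIDDEN = 4

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(seed):
    """A toy expert: a HIDDEN x HIDDEN linear map standing in for an FFN block."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(HIDDEN)]

def apply_expert(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

EXPERTS = [make_expert(i) for i in range(NUM_EXPERTS)]
_gate_rng = random.Random(42)
# Gate: one scoring vector per expert.
GATE = [[_gate_rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(NUM_EXPERTS)]

def moe_layer(x):
    """Score all experts, but run only the TOP_K highest-scoring ones."""
    scores = [sum(g * xi for g, xi in zip(row, x)) for row in GATE]
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    mix = softmax([scores[i] for i in top])  # renormalise over selected experts only
    out = [0.0] * HIDDEN
    for w, i in zip(mix, top):
        y = apply_expert(EXPERTS[i], x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top

token = [0.5, -0.2, 0.1, 0.9]
output, used = moe_layer(token)
print(f"experts used: {used} ({len(used)}/{NUM_EXPERTS}), expert fraction active: {TOP_K / NUM_EXPERTS:.2f}")
```

In this toy, 8 experts with k = 2 means only a quarter of the expert parameters run per token. Shared layers such as attention still run in full, so real savings depend on how a given model splits its parameters.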

Tiered model routing and escalation boundaries

The article describes a tiered model strategy in which one model handles routine conversational work, another handles deep reasoning, and a third handles implementation. That is effectively a policy engine for AI work distribution. The architectural risk is routing drift, where the cheaper front line begins absorbing tasks that should have escalated, or where escalation rules are implicit rather than enforced. In identity terms, the routing layer decides which actor receives the request, which tools it may invoke, and when higher privilege is justified. Those decisions are governance decisions even when they are expressed as model selection.

Practical implication: Define explicit escalation criteria for when low-cost model handling must hand off to a higher-trust model.
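The tiered strategy can be written as explicit, versioned rules instead of an implicit model preference. The sketch below assumes three tiers that match the WorkOS description (front line, reasoning, implementation), plus a block outcome. The sensitivity labels, tool names, and POLICY_VERSION value are hypothetical. Rules are evaluated in priority order, and the first matching rule wins. Every decision carries its reasons and the policy version, so an incident review can reconstruct why a request did or did not escalate.

```python
from dataclasses import dataclass, field
from enum import Enum

POLICY_VERSION = "example-v1"  # hypothetical; bump on every reviewed policy change

class Route(Enum):
    FRONTLINE = "frontline"            # routine conversational work (low-cost tier)
    REASONING = "reasoning"            # deep-reasoning tier
    IMPLEMENTATION = "implementation"  # tier allowed to drive side-effecting tools
    BLOCKED = "blocked"

# Hypothetical policy tables; in practice these would be owned, reviewed config.
FORBIDDEN_TOOLS = frozenset({"rotate_root_credentials"})
SIDE_EFFECT_TOOLS = frozenset({"deploy", "send_email", "write_db"})

@dataclass(frozen=True)
class Request:
    data_sensitivity: str                    # "public" | "internal" | "restricted"
    requested_tools: frozenset = frozenset()
    multi_step: bool = False                 # caller-declared need for deep reasoning

@dataclass
class Decision:
    route: Route
    reasons: list = field(default_factory=list)
    policy_version: str = POLICY_VERSION

def route_request(req: Request) -> Decision:
    """Explicit rules in priority order: block, escalate to action, escalate to reasoning, else stay."""
    blocked = req.requested_tools & FORBIDDEN_TOOLS
    if blocked:
        return Decision(Route.BLOCKED, [f"forbidden tool(s): {sorted(blocked)}"])
    side_effects = req.requested_tools & SIDE_EFFECT_TOOLS
    if side_effects:
        return Decision(Route.IMPLEMENTATION, [f"side-effecting tool(s): {sorted(side_effects)}"])
    reasons = []
    if req.data_sensitivity == "restricted":
        reasons.append("restricted data")
    if req.multi_step:
        reasons.append("multi-step reasoning requested")
    if reasons:
        return Decision(Route.REASONING, reasons)
    return Decision(Route.FRONTLINE, ["no escalation criterion matched"])

for req in (Request("public"),
            Request("internal", frozenset({"deploy"})),
            Request("restricted", multi_step=True)):
    d = route_request(req)
    print(d.route.value, d.reasons, d.policy_version)
```

The code alone does not make this a control. What makes it auditable rather than a performance shortcut is keeping the rule tables in reviewed configuration with a named owner.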

Open weights, local deployment, and control ownership

Open-weight models change the governance model because the organisation can run, inspect, and tune the model on its own infrastructure. That removes some vendor dependency but shifts responsibility inward. Availability becomes a programme decision, not a service promise. Auditability also improves only if logging, access boundaries, and release management are in place around the deployment. For AI agent programmes, open weights are not a free pass. They expand administrative control while increasing the need for identity, workload, and runtime governance around the systems that host and call the model.

Practical implication: If you self-host, extend NHI controls to the runtime, the orchestration layer, and the model access path.
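On a self-hosted path, the model endpoint can be gated like any other privileged NHI access. This is a minimal sketch under stated assumptions. The scope name, the one-hour credential lifetime, and the in-memory AUDIT_LOG are hypothetical stand-ins for whatever token service and audit sink an organisation actually runs. The function checks for an accountable owner, credential age, and scope, and it records every allow and deny decision.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical control parameters for a self-hosted open-weight model endpoint.
REQUIRED_SCOPE = "model:invoke:self-hosted"  # illustrative scope name, not a standard
MAX_CREDENTIAL_AGE = timedelta(hours=1)      # assumption: short-lived workload credentials

@dataclass(frozen=True)
class WorkloadCredential:
    identity: str          # the calling NHI, e.g. an orchestrator service account
    scopes: frozenset
    issued_at: datetime
    owner: Optional[str]   # accountable human owner for access review

AUDIT_LOG = []  # stand-in for an append-only audit sink

def authorize_model_call(cred: WorkloadCredential, now: Optional[datetime] = None) -> bool:
    """Gate a call to the self-hosted model the same way as other privileged NHI access."""
    now = now or datetime.now(timezone.utc)
    if cred.owner is None:
        reason = "denied: credential has no accountable owner"
    elif now - cred.issued_at > MAX_CREDENTIAL_AGE:
        reason = "denied: credential older than allowed lifetime"
    elif REQUIRED_SCOPE not in cred.scopes:
        reason = "denied: missing model invoke scope"
    else:
        reason = "allowed"
    # Record every decision, allow or deny, so access review can reconstruct it later.
    AUDIT_LOG.append((now.isoformat(), cred.identity, reason))
    return reason == "allowed"

t0 = datetime(2026, 3, 17, 12, 0, tzinfo=timezone.utc)
cred = WorkloadCredential("agent-orchestrator", frozenset({REQUIRED_SCOPE}), t0, "platform-team")
print(authorize_model_call(cred, now=t0 + timedelta(minutes=5)))  # True
print(authorize_model_call(cred, now=t0 + timedelta(hours=2)))    # False
print(AUDIT_LOG[-1])
```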

Moltbook AI agent keys breach: the Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach: attackers used stolen AWS access keys to hijack Anthropic models on Amazon Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Routing layers are becoming identity control points, not just model optimisation layers. Once an AI system decides which model handles each task, it is making an access decision with governance consequences. That decision determines cost, latency, data exposure, and whether a request is escalated into a higher-trust workflow. For IAM teams, the routing layer now sits close enough to privilege boundaries that it must be reviewed like any other control plane.

The assumption that one frontier model should handle most work is breaking under operational economics. The article shows that routine requests can be absorbed by a cheaper front-line model while harder cases are escalated. That changes the programme design question from model capability to policy design. Teams need to decide where decision authority lives in the stack and who owns escalation logic across agentic workflows.

Open-weight model adoption shifts risk from external dependency to internal governance debt. When an organisation can run and tune a model locally, it gains control over availability and configuration but inherits responsibility for audit, lifecycle management, and access boundaries. That means the identity problem moves from vendor trust to platform discipline, especially where AI agents consume model output and pass it into tools.

Model routing blast radius: A front-line model that handles most requests can concentrate access, data flow, and decision-making in one policy surface. That concentration is efficient, but it also means a weak routing rule can affect every downstream tool call and escalation path. Practitioners should read this as a governance pattern, not a model benchmark story.
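One way to estimate blast radius is as graph reachability. If edges record which model or tool can hand work or output to which other component, then everything reachable from the front-line model is exposed to a weak routing rule at that point. The graph below is a hypothetical example. The breadth-first search is a generic technique and not a method from the WorkOS article.

```python
from collections import deque

# Hypothetical routing/tool graph: an edge means "can hand work or output to".
GRAPH = {
    "frontline_model": ["reasoning_model", "search_tool", "ticket_tool"],
    "reasoning_model": ["implementation_model"],
    "implementation_model": ["deploy_tool", "write_db_tool"],
    "search_tool": [],
    "ticket_tool": ["email_tool"],
    "deploy_tool": [],
    "write_db_tool": [],
    "email_tool": [],
}

def blast_radius(graph, start):
    """Return every component reachable from start (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen

reachable = blast_radius(GRAPH, "frontline_model")
print(sorted(n for n in reachable if n.endswith("_tool")))
# ['deploy_tool', 'email_tool', 'search_tool', 'ticket_tool', 'write_db_tool']
```

In this example every tool is reachable from the front line, including side-effecting ones. Running the same search from each node gives concrete input for escalation and least-privilege reviews.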

AI agent programmes will increasingly look like identity orchestration systems. The agent does not just ask for an answer. It selects a path, escalates when needed, and may act on the response through connected tools. That makes model routing part of the lifecycle of an AI identity, especially where the same agent moves between low-risk conversation and high-risk execution. Governance must follow the path, not the label.

From our research:
28.65 million new hardcoded secrets were detected in public GitHub commits in 2025 alone, a 34% year-over-year increase and the largest single-year jump ever recorded, according to the State of Secrets Sprawl 2026.
AI-related credential leaks surged 81.5% year-over-year in 2025, with the surrounding AI infrastructure leaking 5x faster than core LLM providers.
Analysis of Claude Code Security shows how AI-assisted workflows change the credential exposure surface and why routing controls must extend into developer and agent pipelines.

What this signals

Model routing is becoming a security boundary: as AI systems decide which model gets which task, the organisation is effectively delegating trust decisions to the orchestration layer. That makes routing logs, escalation rules, and prompt-to-tool traces part of the control evidence set, especially where the same agent can shift from chat to action. Practitioners should align this with the NIST AI Risk Management Framework and treat model selection as governed behaviour, not a convenience feature.

With 80.2% on SWE-Bench Verified and a large cost advantage over frontier models, the pressure to route more work through a cheap front line will keep rising. The practical response is to define where low-cost handling ends, where higher-trust reasoning begins, and how those transitions are approved and audited. That is a programme design issue, not a model preference.
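A simple expected-cost model makes this pressure explicit. The symbols are illustrative and do not come from the WorkOS article. Let f be the share of requests the front-line model handles, c_F its average cost per request, and c_E the average cost per request on the escalation tier.

```latex
\bar{c}(f) = f\,c_F + (1 - f)\,c_E,
\qquad
\frac{d\bar{c}}{df} = c_F - c_E < 0 \quad \text{whenever } c_F < c_E .
```

Moving a further share Δf of traffic onto the front line lowers expected cost per request by Δf(c_E − c_F). Drift in the other direction raises it by the same amount. The model prices only cost, though. The risk of misrouting sensitive work does not appear in it, and escalation policy has to cover that gap.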

Model routing blast radius: when one model becomes the default conversational layer, small governance mistakes can scale across every downstream tool call. The same pattern is emerging across AI agents, service identities, and human support paths, which is why identity teams should review orchestration as part of their broader zero trust programme, not as an isolated AI project.

For practitioners

Map routing decisions to governance owners: Assign a named control owner for when the front-line model may answer directly and when it must escalate to a higher-trust model. Include data sensitivity, action scope, and tool access in the rule set. Use the routing decision as an auditable control, not an implicit design choice.
Set escalation thresholds for agent workflows: Document which tasks are safe for the low-cost model and which require specialist reasoning, especially where tool use or external side effects are possible. Review the thresholds after incidents, major prompt changes, or new integrations.
Extend NHI controls to self-hosted model paths: If you run open-weight models locally, protect the model endpoint, orchestration layer, and service credentials with the same discipline used for other non-human identities. That includes access review, secret handling, and change control around deployment.
Monitor routing drift in production traces: Look for cases where routine prompts start reaching higher-cost or higher-trust models without a clear policy reason. Routing drift usually appears first as cost inflation, then as inconsistent approvals, then as unmanaged access to sensitive workflows.
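To surface routing drift from production traces, compare the route each request actually took with the route the policy baseline expects. Then count the mismatches that carry no recorded policy reason. The trace format, request classes, baseline, and 0.3 alert threshold below are hypothetical assumptions for illustration. The count captures both unexplained escalations and under-escalations.

```python
from collections import Counter

# Hypothetical trace records: (request_class, route_taken, recorded_policy_reason or None)
TRACES = [
    ("faq", "frontline", None),
    ("faq", "frontline", None),
    ("faq", "reasoning", None),               # escalated with no recorded reason
    ("faq", "reasoning", "restricted data"),  # escalated for a documented reason
    ("code_change", "implementation", "side-effecting tool"),
    ("code_change", "frontline", None),       # under-escalated: action work stayed on the front line
]

# Hypothetical policy baseline: the expected route per request class.
EXPECTED = {"faq": "frontline", "code_change": "implementation"}

def drift_report(traces, expected, threshold=0.3):
    """Per class: (share of requests off-baseline with no policy reason, flagged?)."""
    totals = Counter()
    unexplained = Counter()
    for req_class, route, reason in traces:
        totals[req_class] += 1
        if route != expected.get(req_class) and reason is None:
            unexplained[req_class] += 1
    return {cls: (unexplained[cls] / n, unexplained[cls] / n > threshold)
            for cls, n in totals.items()}

for cls, (rate, flagged) in drift_report(TRACES, EXPECTED).items():
    print(f"{cls}: {rate:.2f} {'DRIFT' if flagged else 'ok'}")
# faq: 0.25 ok
# code_change: 0.50 DRIFT
```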

Key takeaways

MiniMax M2.5 is less a model story than a routing story, because default handling decisions now shape access, auditability, and escalation.
The article's numbers show why front-line model selection is a governance issue: cheap, fast routing scales quickly and can hide weak escalation logic.
Practitioners should control the model path the same way they control other privileged workflows, with explicit ownership, logging, and review.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Agentic AI Top 10 addresses the attack and risk surface, while the NIST AI RMF and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A2	Model routing and escalation logic affect agent tool access and decision boundaries.
NIST AI RMF	GOVERN	AI routing and ownership are governance issues, not just model-selection decisions.
NIST Zero Trust (SP 800-207)	PR.AC-4	The routing layer concentrates access decisions and needs least-privilege discipline.

Document when low-trust model paths must escalate before tool use or sensitive action.

Key terms

Model Routing Layer: The model routing layer is the policy and orchestration logic that decides which model handles a request, when to escalate, and what tools the request can reach. In AI programmes, it behaves like a control plane because it shapes data exposure, privilege boundaries, and auditability.
Escalation Threshold: An escalation threshold is the rule that determines when a request should move from a lower-cost or lower-trust model to a more capable one. It is a governance control, not a performance tweak, because it sets when higher-risk reasoning or action is permitted.
Routing Drift: Routing drift is the gradual shift of requests away from the intended model path, often because policy is implicit, exceptions accumulate, or cost pressure overrides design intent. It creates governance debt by making access, reasoning depth, and tool use harder to predict and audit.
Open-Weight Deployment: An open-weight deployment uses a model whose weights can be downloaded and run on infrastructure the organisation controls. That improves configurability and availability control, but it also transfers responsibility for access management, logging, patching, and lifecycle governance to the operator.

Deepen your knowledge

Model routing and escalation governance are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building AI agent controls around self-hosted or tiered model paths, it is worth exploring.

This post draws on content published by WorkOS: "Why MiniMax M2.5 is the most popular model on OpenRouter right now."

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-03-17.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org