How should organisations govern model selection for AI agents?

Why This Matters for Security Teams

Model selection is not just a product choice for AI agents. Different foundation models bring different safety properties, tool-use behaviours, data handling characteristics, and external exposure patterns, so letting teams pick models ad hoc can turn one approved workflow into several unreviewed risk profiles. That is especially true when agents have execution authority, can chain tools, and can reach sensitive systems. Current guidance from the NIST AI Risk Management Framework and NHIMG research on the OWASP NHI Top 10 both point to governance at the model layer, because the model itself influences how risk propagates through the agent stack.

Security teams often miss that model swaps can change more than answer quality. They can alter prompt sensitivity, tool invocation patterns, refusal behaviour, logging posture, and whether data crosses jurisdictional or contractual boundaries. The practical issue is not only whether the model is “good,” but whether it is approved for the agent’s intended autonomy, data class, and external connectivity. In practice, many security teams encounter model-risk drift only after an agent has already been deployed with a locally chosen model that bypassed review.

How It Works in Practice

Effective governance starts with a central model catalogue that defines which models are approved, restricted, or prohibited for specific agent use cases. That catalogue should be tied to business context: internal-only summarisation, code generation, customer-facing interaction, regulated data processing, and high-autonomy tool execution should not share the same approval path. The policy decision should happen before deployment and again at runtime if the agent can route requests across multiple models.
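As a rough illustration of such a catalogue, the sketch below keys approval status on the pair of model and use case and denies anything that is not listed. The model names, use-case labels, and the has_exception flag are hypothetical placeholders, not taken from any particular product or framework.

```python
from enum import Enum


class Status(Enum):
    APPROVED = "approved"
    RESTRICTED = "restricted"
    PROHIBITED = "prohibited"


# Hypothetical catalogue entries: (model, use case) -> status.
# Model names and use-case labels are illustrative assumptions.
CATALOGUE = {
    ("model-a", "internal_summarisation"): Status.APPROVED,
    ("model-a", "regulated_data_processing"): Status.RESTRICTED,
    ("model-a", "high_autonomy_tool_execution"): Status.PROHIBITED,
    ("model-b", "internal_summarisation"): Status.APPROVED,
    ("model-b", "high_autonomy_tool_execution"): Status.RESTRICTED,
}


def lookup(model: str, use_case: str) -> Status:
    """Return the catalogue status; unlisted pairs default to PROHIBITED."""
    return CATALOGUE.get((model, use_case), Status.PROHIBITED)


def is_allowed(model: str, use_case: str, has_exception: bool = False) -> bool:
    """APPROVED passes, RESTRICTED needs a recorded exception, PROHIBITED never passes."""
    status = lookup(model, use_case)
    if status is Status.APPROVED:
        return True
    if status is Status.RESTRICTED:
        return has_exception
    return False
```

The same is_allowed check could be called once at deployment time and again at runtime whenever the agent is about to route a request to a different model.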

Practitioners should evaluate models on more than benchmark scores. A useful review includes:

data retention and training-use terms, including whether prompts or outputs may be retained by the provider;

tool-use behaviour, especially how the model handles function calling and action confirmation;

support for audit logging, policy controls, and tenant isolation;

regional hosting and contractual restrictions for regulated workloads;

known jailbreak, prompt-injection, or delegation weaknesses for agentic use.
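One way to make the review criteria above auditable is to record them as a structured object instead of free text. The following is a minimal sketch under that assumption; the field names are hypothetical and simply mirror the five review items listed.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ModelReview:
    """Hypothetical review record; each field mirrors one review item above."""
    retention_terms_acceptable: bool
    tool_use_behaviour_tested: bool
    audit_logging_supported: bool
    hosting_region_compliant: bool
    known_weaknesses_assessed: bool


def open_findings(review: ModelReview) -> list[str]:
    """Return the names of criteria that have not been satisfied, in field order."""
    return [f.name for f in fields(review) if not getattr(review, f.name)]


# Example: a model whose tool-use testing and weakness assessment are outstanding.
review = ModelReview(
    retention_terms_acceptable=True,
    tool_use_behaviour_tested=False,
    audit_logging_supported=True,
    hosting_region_compliant=True,
    known_weaknesses_assessed=False,
)
outstanding = open_findings(review)
```

A model with any outstanding findings would then stay out of the approved tier until the gaps are closed or an exception is recorded.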

For agent workloads, governance should also distinguish between model classes. A model that is acceptable for low-risk internal drafting may be unsuitable for autonomous actions that touch secrets, identity systems, or external APIs. NHIMG’s guidance on the AI LLM hijack breach and the Top 10 NHI Issues shows why model choice must be evaluated alongside identity, credential scope, and downstream authority. Best practice is evolving, but the direction is clear: model approval needs policy-as-code, a review workflow, and traceable exceptions rather than informal team preference.
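To show what a traceable exception might look like in policy-as-code form, the sketch below models an exception as a record with an approver, a ticket reference, and an expiry date. All field names and sample values are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class PolicyException:
    """Hypothetical exception record; every field is an illustrative assumption."""
    model: str
    use_case: str
    approver: str
    ticket: str
    expires: date


def active_exception(
    exceptions: Iterable[PolicyException],
    model: str,
    use_case: str,
    today: date,
) -> Optional[PolicyException]:
    """Return the first unexpired exception matching the model and use case, if any."""
    for exc in exceptions:
        if exc.model == model and exc.use_case == use_case and today <= exc.expires:
            return exc
    return None


# Example register with a single time-boxed exception.
register = [
    PolicyException(
        model="model-a",
        use_case="regulated_data_processing",
        approver="security-review-board",
        ticket="EXC-0001",
        expires=date(2025, 6, 30),
    )
]
```

Because each exception names an approver and expires by default, a reviewer can see who accepted the risk and the exception does not silently become permanent.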

This approach aligns with the OWASP Agentic AI Top 10 and the CSA MAESTRO agentic AI threat modeling framework, both of which treat model behaviour as part of the threat surface. These controls tend to break down when teams allow direct model selection inside production agent builders because governance exceptions proliferate faster than review capacity.

Common Variations and Edge Cases

Tighter model approval often increases operational friction, requiring organisations to balance speed and experimentation against control and traceability. That tradeoff becomes visible in environments that need rapid prototyping, multilingual support, or region-specific deployment, where one model may not satisfy every constraint.

There is no universal standard for model tiering yet, so organisations usually adopt one of three patterns: a single approved model list, a tiered list by data sensitivity, or a use-case matrix that maps model classes to allowed agent actions. The last option is usually strongest for autonomous systems because it prevents a high-capability model from being used with high-risk privileges simply because it was technically available.
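The use-case matrix pattern can be sketched as a mapping from model class to the set of agent actions that class may perform. The class names and action labels below are hypothetical; a real matrix would follow the organisation's own tiering.

```python
# Hypothetical matrix: model class -> agent actions that class may perform.
ALLOWED_ACTIONS = {
    "drafting_only": frozenset({"read_documents", "summarise"}),
    "tool_capable": frozenset({"read_documents", "summarise", "call_internal_api"}),
    "high_autonomy": frozenset(
        {"read_documents", "summarise", "call_internal_api", "call_external_api"}
    ),
}


def action_permitted(model_class: str, action: str) -> bool:
    """Permit an action only if the matrix lists it for the model class.

    Unknown model classes get an empty action set, so they are denied by default.
    """
    return action in ALLOWED_ACTIONS.get(model_class, frozenset())
```

With this shape, a model being technically available is not enough: the action is permitted only if the matrix explicitly lists it for that model's class.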

Edge cases need explicit handling. Open-source models hosted internally may reduce third-party data exposure but can increase operational burden around patching, telemetry, and evaluation. Frontier models may offer better safety tooling but still require restrictions on customer data, secrets, and external tool use. Multi-model agents also deserve extra scrutiny because routing logic can silently move a request from an approved model to a less controlled one. That is why model provenance, routing policy, and auditability should be reviewed together, not separately. For additional context on governance and lifecycle controls, see Ultimate Guide to NHIs — Lifecycle Processes for Managing NHIs and Ultimate Guide to NHIs — Regulatory and Audit Perspectives.
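The multi-model routing concern above can be illustrated with a small guard that checks every routing candidate against the workflow's approved set and logs each decision. This is a sketch under stated assumptions: the workflow name, model names, logger name, and the RoutingDenied exception are all hypothetical.

```python
import logging

logger = logging.getLogger("model_routing_audit")

# Hypothetical mapping: workflow -> models approved for that workflow.
APPROVED_FOR_WORKFLOW = {
    "customer_support": {"model-a", "model-b"},
}


class RoutingDenied(Exception):
    """Raised when no candidate model is approved for the workflow."""


def route(workflow: str, requested_model: str, fallback_chain: list[str]) -> str:
    """Return the first candidate approved for the workflow, logging every decision."""
    approved = APPROVED_FOR_WORKFLOW.get(workflow, set())
    for candidate in [requested_model, *fallback_chain]:
        if candidate in approved:
            logger.info("workflow=%s routed_to=%s", workflow, candidate)
            return candidate
        # Blocked candidates are logged so silent fallbacks leave a trace.
        logger.warning("workflow=%s blocked_model=%s", workflow, candidate)
    raise RoutingDenied(f"no approved model available for workflow {workflow!r}")
```

Failing closed when no candidate is approved keeps a fallback chain from quietly moving a request onto a less controlled model.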

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while the NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A2	Addresses unsafe agent design choices, including model-driven behaviour and misuse.
CSA MAESTRO	GOV-02	Governance control for defining approved model use across agent workloads.
NIST AI RMF		AI RMF governs risk assessment, measurement, and oversight for model selection.

Classify models by agent risk and prohibit higher-risk models from autonomous actions.

