Organisations should decide based on control requirements, not just price. Self-hosting improves configurability and availability control, but it also shifts responsibility for access management, logging, patching, and lifecycle governance onto the team operating the model path.
Why This Matters for Security Teams
The self-hosted versus hosted API choice is really an operating-model decision about who owns trust boundaries, secrets, patching, logging, and incident response. Hosted APIs reduce infrastructure burden, but they also constrain observability and can complicate data residency, retention, and vendor risk reviews. Self-hosted open-weight models give teams more control over deployment and telemetry, yet that same control means the security team must harden the model path like any other production workload.
That matters because model access is part of the NHI problem set, not a separate AI convenience layer. When model endpoints, service accounts, tokens, and orchestration jobs are weakly governed, the risk looks similar to other identity failures: broad permissions, poor rotation, and missed offboarding. The pattern is consistent with what NHI Mgmt Group has documented across identity ecosystems, including the JetBrains GitHub plugin token exposure case, which shows how exposed credentials can quickly become a supply-chain issue. The NIST Cybersecurity Framework 2.0 is useful here because it frames the decision around governance, protection, detection, and recovery rather than deployment preference alone.
In practice, many security teams discover the real cost of “cheap” model access only after a token leak, audit finding, or outage has already forced a redesign.
How It Works in Practice
A practical decision starts with the data and identity controls the organisation needs to enforce. If the workload needs strict access boundaries, detailed logging, custom filters, or region-specific hosting, self-hosting may be justified. If the main requirement is rapid adoption with lower operational overhead, hosted APIs can be the better fit, provided the vendor’s controls satisfy legal, privacy, and resilience requirements.
For self-hosted models, the team should treat the model endpoint as a privileged service: issue short-lived credentials, restrict them with RBAC and, where needed, JIT provisioning, and log every call path that can reach the model. That includes the app, CI/CD jobs, retrieval layers, and any tool-calling middleware. NHI Mgmt Group's research on the JetBrains GitHub plugin token exposure shows how quickly one exposed secret can become a wider operational incident, which is why secret sprawl and key rotation matter as much for model services as they do for code repositories.
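The pattern described above (short-lived credentials, role checks, and a log entry for every call) can be sketched in a few lines. This is a minimal illustration, not a production design: the signing key, the role names, the `issue_token` and `authorize` helpers, and the five-minute lifetime are all assumptions made for the example, and a real deployment would more likely rely on an existing identity provider or workload identity system.

```python
import base64
import hashlib
import hmac
import json
import logging
import time
from typing import Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("model-endpoint")

# Hypothetical signing key; in practice this would come from a secrets manager.
SIGNING_KEY = b"replace-with-managed-secret"

# Hypothetical RBAC mapping: which role may perform which action on the model.
ROLE_PERMISSIONS = {
    "app": {"generate"},
    "ci": {"evaluate"},
    "retrieval": {"embed"},
}


def _sign(payload: bytes) -> str:
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def issue_token(workload: str, role: str, ttl_seconds: int = 300,
                now: Optional[float] = None) -> str:
    """Issue a signed token that expires after ttl_seconds."""
    now = time.time() if now is None else now
    claims = {"sub": workload, "role": role, "exp": now + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode()
    return base64.urlsafe_b64encode(payload).decode() + "." + _sign(payload)


def authorize(token: str, action: str, now: Optional[float] = None) -> bool:
    """Check signature, expiry, and role permission; log every decision."""
    now = time.time() if now is None else now
    try:
        body, sig = token.rsplit(".", 1)
        payload = base64.urlsafe_b64decode(body.encode())
    except ValueError:
        log.warning("denied action=%s reason=malformed-token", action)
        return False
    if not hmac.compare_digest(sig.encode(), _sign(payload).encode()):
        log.warning("denied action=%s reason=bad-signature", action)
        return False
    claims = json.loads(payload)
    if now >= claims["exp"]:
        log.warning("denied sub=%s action=%s reason=expired", claims["sub"], action)
        return False
    if action not in ROLE_PERMISSIONS.get(claims["role"], set()):
        log.warning("denied sub=%s action=%s reason=role", claims["sub"], action)
        return False
    log.info("allowed sub=%s role=%s action=%s", claims["sub"], claims["role"], action)
    return True
```

A caller such as a CI job would request a token immediately before use and discard it afterwards, so a leaked token is only useful until it expires and only for the actions its role permits.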
For hosted APIs, the security team should verify how prompts, outputs, and telemetry are retained, whether tenant isolation is contractual or technical, and how API keys are rotated and revoked. Current guidance suggests using policy-as-code at the gateway layer so requests are screened before they reach the model, with controls mapped to NIST Cybersecurity Framework 2.0 governance and protection outcomes. Where teams are building AI-driven services, the OWASP and CSA guidance on identity, tool access, and runtime controls should be applied together, not selectively.
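As a rough illustration of policy-as-code at the gateway layer, the sketch below keeps the policy as plain data and evaluates each request against it before anything is forwarded to the model. The `Request` shape, the policy keys, the model name, and the two secret-detection patterns are hypothetical choices for the example; real gateways and policy engines define their own schemas.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Request:
    caller: str
    model: str
    prompt: str


# Hypothetical policy, expressed as data so it can be reviewed and versioned.
POLICY = {
    "allowed_models": {"support-assistant"},
    "max_prompt_chars": 4000,
    # Illustrative patterns for secrets that should never reach the model.
    "deny_patterns": [
        r"AKIA[0-9A-Z]{16}",
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----",
    ],
}


def evaluate(request: Request, policy: dict = POLICY) -> Tuple[bool, List[str]]:
    """Return (allowed, reasons); reasons lists every rule that was violated."""
    reasons = []
    if request.model not in policy["allowed_models"]:
        reasons.append(f"model not allowed: {request.model}")
    if len(request.prompt) > policy["max_prompt_chars"]:
        reasons.append("prompt exceeds size limit")
    for pattern in policy["deny_patterns"]:
        if re.search(pattern, request.prompt):
            reasons.append(f"prompt matches deny pattern: {pattern}")
    return (not reasons, reasons)
```

Returning every violated rule, rather than stopping at the first, makes the gateway's decision easier to audit and to map back to the governance and protection outcomes the policy is meant to support.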
- Choose hosted APIs when the priority is speed, vendor-managed patching, and predictable scaling.
- Choose self-hosted models when auditability, data control, or custom policy enforcement are non-negotiable.
- Require workload identity, short-lived secrets, and explicit logging for either path.
- Document who owns patching, incident response, and offboarding before the model goes live.
These controls tend to break down when teams embed the model in many internal tools without a single owner, because access paths multiply faster than governance can keep up.
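One way to keep the checklist above from drifting is to encode it as a go-live check. The sketch below assumes a hypothetical deployment record with `owners` and `controls` fields; the field names and the `readiness_gaps` helper are illustrative, not part of any standard.

```python
from typing import List

# Ownership areas and controls taken from the checklist above.
REQUIRED_OWNERS = ("patching", "incident_response", "offboarding")
REQUIRED_CONTROLS = ("workload_identity", "short_lived_secrets", "logging")


def readiness_gaps(deployment: dict) -> List[str]:
    """List what is still missing before a model path should go live."""
    gaps = []
    owners = deployment.get("owners", {})
    for area in REQUIRED_OWNERS:
        if not owners.get(area):
            gaps.append(f"no owner for {area}")
    controls = deployment.get("controls", {})
    for control in REQUIRED_CONTROLS:
        if not controls.get(control):
            gaps.append(f"missing control: {control}")
    return gaps


# Hypothetical record for a model embedded in an internal tool.
example = {
    "owners": {"patching": "platform-team", "incident_response": "soc"},
    "controls": {"workload_identity": True, "short_lived_secrets": True, "logging": False},
}
print(readiness_gaps(example))
```

Running the same check for every internal tool that embeds the model gives a single place to see where access paths have outgrown ownership.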
Common Variations and Edge Cases
Tighter model control often increases operational overhead, so organisations have to balance sovereignty and observability against staffing, reliability, and time-to-market. That tradeoff is especially visible in regulated environments, where a self-hosted path may satisfy internal control objectives but still fail if the team cannot sustain patching, telemetry review, and credential rotation.
There is no universal standard for this yet, but current guidance suggests using hosted APIs for lower-risk use cases and self-hosting when the model becomes part of a sensitive workflow or critical business process. A common edge case is hybrid deployment: a hosted model for experimentation and a self-hosted model for production data. Another is air-gapped or restricted environments, where self-hosting is often the only realistic option, but only if the organisation can operate it with mature NHI discipline. That means least privilege, clear service ownership, and regular review of secrets, because model paths often inherit the same failure patterns seen in broader identity estates. NHI Mgmt Group’s research has repeatedly shown that organisations lose track of service accounts and tokens once systems are fragmented, and that lesson applies directly to model endpoints as well.
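The hybrid and air-gapped cases can be expressed as a small routing rule. This is a sketch under stated assumptions: the three sensitivity tiers, the environment names, and the `choose_path` helper are hypothetical, and the thresholds would need to reflect the organisation's own data classification.

```python
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3


def choose_path(environment: str, sensitivity: Sensitivity,
                air_gapped: bool = False) -> str:
    """Pick 'hosted' or 'self-hosted' for a workload (illustrative rule only)."""
    # Air-gapped or restricted environments cannot call an external API.
    if air_gapped:
        return "self-hosted"
    # Sensitive workflows stay on the self-hosted path in any environment.
    if sensitivity is Sensitivity.RESTRICTED:
        return "self-hosted"
    # Production data that is not public also goes to the self-hosted path.
    if environment == "production" and sensitivity is not Sensitivity.PUBLIC:
        return "self-hosted"
    # Lower-risk use cases and experimentation can use the hosted API.
    return "hosted"
```

The rule only decides where a request goes. It does not replace the requirement that whichever path is chosen has an owner, least-privilege credentials, and regular secret review.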
For practitioners, the most important test is not “Which is cheaper?” but “Which option can the organisation govern end to end?” If the answer is uncertain, the safer choice is usually the one that matches the team’s existing identity and operations maturity rather than the most feature-rich model path.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST AI RMF set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Non-Human Identity Top 10 | NHI7:2025 (Long-Lived Secrets) | Credential rotation is central when models use API keys and service accounts. |
| NIST CSF 2.0 | PR.AA-05 | Access control governs who can reach hosted or self-hosted model endpoints. |
| NIST AI RMF | | AI governance is needed to manage risk across model deployment choices. |
Assign ownership for model risk, logging, and incident response under a formal AI risk process.
Reviewed and updated by the NHIMG editorial team on June 6, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org