The production phase where a trained model is exposed through an application or endpoint so it can respond to requests. In identity terms, serving introduces a distinct trust boundary because the runtime may access specialised compute, model artefacts, and upstream systems.
Expanded Definition
Model serving is the runtime layer that turns a trained model into an operational dependency: a request arrives, the service loads model artefacts, applies policy, and returns a prediction or action. In non-human identity (NHI) terms, serving is not just deployment. It creates a live trust boundary around the model endpoint, the service account or other NHI context, upstream APIs, and any secrets used by the inference stack.
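The request flow described above can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: `Caller`, `ServingEndpoint`, `load_toy_model`, and the `model:predict` scope are hypothetical names, and the "model" is a stand-in function. The sketch checks policy before loading the artefact, which is one reasonable ordering among several.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Sequence

Model = Callable[[Sequence[float]], bool]


@dataclass(frozen=True)
class Caller:
    """Hypothetical view of the authenticated caller's identity."""
    identity: str
    scopes: FrozenSet[str]


class ServingEndpoint:
    """Toy serving layer: check policy, load the artefact lazily, predict."""

    def __init__(self, load_model: Callable[[], Model], required_scope: str) -> None:
        self._load_model = load_model
        self._required_scope = required_scope
        self._model: Optional[Model] = None

    def handle(self, caller: Caller, features: Sequence[float]) -> bool:
        # Policy is enforced before any artefact is touched.
        if self._required_scope not in caller.scopes:
            raise PermissionError(f"{caller.identity} lacks {self._required_scope}")
        # Load the model artefact once, on the first authorised request.
        if self._model is None:
            self._model = self._load_model()
        return self._model(features)


def load_toy_model() -> Model:
    # Stand-in for reading real weights from an artefact store (assumption).
    return lambda features: sum(features) > 1.0
```

A caller whose scopes include `model:predict` receives a prediction; any other identity is rejected with `PermissionError` before the model is loaded.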
Definitions vary across vendors because some treat serving as a narrow inference endpoint, while others include routing, batching, guardrails, and observability around the model. The NIST Cybersecurity Framework 2.0 is useful here because it frames the operational controls that surround the endpoint even when the AI terminology itself is still evolving. The practical distinction is that training produces the model, but serving exposes it to real users, agents, and downstream systems with production privileges.
The most common misapplication is treating model serving as a simple application deployment. Teams that do so overlook the runtime identity, secrets, and access boundaries that come into play only once inference is exposed externally.
Examples and Use Cases
Implementing model serving rigorously often introduces latency, scaling, and governance overhead, requiring organisations to weigh faster responses against tighter access control, observability, and change management.
- A customer support agent calls a hosted classifier through an API gateway, with a distinct service identity, short-lived credentials, and request logging tied to policy enforcement.
- An internal fraud model serves predictions to a payment workflow, where the serving layer can only reach approved data sources and the model artefact is version-pinned for auditability.
- A retrieval-augmented assistant uses a serving endpoint to score prompts before generation, and the runtime is isolated so tool access is limited to the minimum required scope.
- A regulated workload uses blue-green model rollout, where serving versions are compared in production to reduce drift and to support rollback without exposing the full estate.
- As described in the Ultimate Guide to NHIs, service identities that support model APIs should be governed like any other production machine identity, especially when they can reach secrets managers or customer data. This aligns with the control logic in the NIST Cybersecurity Framework 2.0 around protected access and monitored runtime behavior.
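Two of the controls in the examples above, short-lived credentials and version-pinned artefacts, can be illustrated with the Python standard library. This is a sketch under stated assumptions: the `identity|expiry|signature` token format, the function names, and the shared HMAC key are all hypothetical, and a production system would more likely rely on its platform's workload identity or token service.

```python
import hashlib
import hmac
import time
from typing import Optional


def issue_token(key: bytes, identity: str, ttl_seconds: int,
                now: Optional[float] = None) -> str:
    """Return 'identity|expiry|signature', valid for ttl_seconds."""
    issued_at = time.time() if now is None else now
    payload = f"{identity}|{int(issued_at + ttl_seconds)}"
    signature = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{signature}"


def verify_token(key: bytes, token: str,
                 now: Optional[float] = None) -> Optional[str]:
    """Return the identity if the token is authentic and unexpired, else None."""
    try:
        identity, expires, signature = token.rsplit("|", 2)
        expires_at = int(expires)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(key, f"{identity}|{expires}".encode(),
                        hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    current = time.time() if now is None else now
    return identity if current < expires_at else None


def artefact_matches_pin(artefact: bytes, pinned_sha256: str) -> bool:
    """Check that the loaded artefact is exactly the pinned version."""
    digest = hashlib.sha256(artefact).hexdigest()
    return hmac.compare_digest(digest.encode(), pinned_sha256.encode())
```

Because the expiry is part of the signed payload, a caller cannot extend a token's lifetime without invalidating the signature, and the digest check refuses any artefact other than the one recorded for audit.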
Why It Matters in NHI Security
Model serving is where AI systems become operationally powerful and operationally risky. If the serving layer is over-privileged, poorly segmented, or allowed to hold long-lived secrets, it can become the easiest path to data exposure, model tampering, or unauthorized tool use. That is why NHI governance applies here: the service account, API key, certificate, and orchestration identity are all non-human identities that must be provisioned, reviewed, and retired with discipline.
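A periodic review of the identities behind a serving stack might look like the following sketch. The `NhiRecord` fields, the wildcard-scope convention, and the 90-day rotation window are illustrative assumptions, not values taken from any standard or from the guidance cited here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class NhiRecord:
    """Hypothetical inventory entry for one non-human identity."""
    name: str
    kind: str                # e.g. "service_account", "api_key", "certificate"
    owner: str               # empty string means no accountable owner recorded
    created: date
    scopes: FrozenSet[str]


def review(records: List[NhiRecord], today: date,
           max_age_days: int = 90) -> Dict[str, List[str]]:
    """Return the findings for each identity that fails a basic hygiene check."""
    findings: Dict[str, List[str]] = {}
    for record in records:
        issues: List[str] = []
        if (today - record.created).days > max_age_days:
            issues.append("credential older than rotation window")
        if "*" in record.scopes:
            issues.append("wildcard scope")
        if not record.owner:
            issues.append("no accountable owner")
        if issues:
            findings[record.name] = issues
    return findings
```

Identities that pass every check are omitted from the result, so an empty dictionary means the inventory met the assumed thresholds.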
The scale of the problem is often underestimated. NHI Mgmt Group research shows that only 5.7% of organisations have full visibility into their service accounts, which means many serving environments operate with blind spots that make incident response slow and incomplete. The same guidance in the Ultimate Guide to NHIs also reinforces why model serving should be mapped to lifecycle control, rotation, and access review rather than treated as a one-time deployment task. In governance terms, the endpoint should be considered part of the production identity perimeter, and its access model should be compatible with NIST Cybersecurity Framework 2.0 outcomes for protected assets and monitored operations.
Organisations typically encounter the consequences of weak model serving controls only after a prompt injection, credential leak, or data exfiltration event, at which point securing the serving layer can no longer be deferred.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
The OWASP Agentic AI Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | | Agentic systems depend on governed serving endpoints that constrain tool use and runtime authority. |
| NIST CSF 2.0 | PR.AA-05 (PR.AC-4 in CSF 1.1) | Serving requires access control and least-privilege protection for production identities and endpoints. |
| NIST Zero Trust (SP 800-207) | SC-7 (boundary protection; a NIST SP 800-53 control) | Model serving fits Zero Trust segmentation and continuous verification at the runtime boundary. |
Restrict agent runtime permissions, log tool calls, and isolate serving from sensitive backends.
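As one way to picture restricted runtime permissions and logged tool calls, the sketch below wraps tool invocation in an allowlist check and records every decision. `ToolGate`, the tool names, and the audit tuple layout are hypothetical; real agent frameworks expose their own hooks for this.

```python
import logging
from typing import Any, Callable, Dict, FrozenSet, List, Tuple

logger = logging.getLogger("serving.tools")


class ToolGate:
    """Allow only listed tools and keep an audit trail of every attempt."""

    def __init__(self, tools: Dict[str, Callable[..., Any]],
                 allowed: FrozenSet[str]) -> None:
        self._tools = tools
        self._allowed = allowed
        self.audit: List[Tuple[str, str, str]] = []

    def call(self, identity: str, tool: str, **kwargs: Any) -> Any:
        # Deny anything that is not both registered and explicitly allowed.
        if tool not in self._allowed or tool not in self._tools:
            self._record(identity, tool, "denied")
            raise PermissionError(f"{identity} may not call {tool}")
        self._record(identity, tool, "allowed")
        return self._tools[tool](**kwargs)

    def _record(self, identity: str, tool: str, decision: str) -> None:
        # Keep an in-memory trail and emit a structured log line.
        self.audit.append((identity, tool, decision))
        logger.info("tool_call identity=%s tool=%s decision=%s",
                    identity, tool, decision)
```

Denied attempts are logged as well as allowed ones, since a run of denials from one identity is often the earliest sign of a misused credential or an injected instruction.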
Reviewed and updated by the NHIMG editorial team on June 6, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org