What do teams get wrong about open-source LLM gateways?

They often assume that API compatibility implies security completeness. In practice, open-source gateways may still require separate work for authentication, policy enforcement, logging, and credential management. If those controls are not designed in, the organisation gets abstraction without reliable governance.

Why This Matters for Security Teams

Open-source LLM gateways are often adopted as a fast path to standardise access to models, but compatibility is not the same as control. A gateway can proxy requests and normalise APIs while still leaving authentication, policy enforcement, audit logging, and secret handling to separate components. That split is where teams lose governance, especially when multiple applications, service accounts, and model providers are involved.

The security problem is not the gateway itself, but the assumption that a routing layer automatically becomes a control plane. Current guidance from the OWASP Top 10 for Agentic Applications 2026 and the NIST AI Risk Management Framework makes that distinction explicit: runtime visibility, access control, and accountability must be designed, not implied. NHI Management Group's AI Agents: The New Attack Surface report shows why this matters operationally: 80% of organisations report that AI agents have already acted beyond their intended scope. In practice, many security teams discover gateway gaps only after secrets, prompts, or downstream API access have been exposed, rather than through intentional control validation.

How It Works in Practice

A secure gateway deployment needs to behave like a policy enforcement point, not just an API translator. That means the gateway should validate identity, inspect request context, log decisions, and enforce allowlists or denial rules before traffic reaches the model or downstream tools. For AI workloads, this is especially important because the same application may call multiple models, chain tools, and invoke external APIs in ways that are hard to predict in advance.
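The enforcement-point behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the API of any particular gateway: the `Request` type, the `ALLOWED_MODELS` and `DENIED_TOOLS` tables, and the caller and model names are all hypothetical, and the caller identity is assumed to have been verified before it reaches this code.

```python
import json
import logging
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

logger = logging.getLogger("gateway.audit")

# Hypothetical policy data; a real deployment would load this from a policy engine.
ALLOWED_MODELS = {
    "svc-reporting": {"model-small"},
    "agent-support": {"model-small", "model-large"},
}
DENIED_TOOLS = {"shell_exec"}


@dataclass(frozen=True)
class Request:
    caller_id: Optional[str]      # assumed to be verified upstream; None if absent
    model: str
    tools: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def decide(req: Request) -> Decision:
    """Validate identity, then check the model allowlist and tool denial rules."""
    if req.caller_id is None or req.caller_id not in ALLOWED_MODELS:
        return Decision(False, "unknown_caller")
    if req.model not in ALLOWED_MODELS[req.caller_id]:
        return Decision(False, "model_not_allowed")
    blocked = sorted(set(req.tools) & DENIED_TOOLS)
    if blocked:
        return Decision(False, "tool_denied:" + ",".join(blocked))
    return Decision(True, "ok")


def enforce(req: Request, forward: Callable[[Request], str]) -> str:
    """Log every decision, and forward only requests that were allowed."""
    decision = decide(req)
    logger.info(json.dumps({
        "caller": req.caller_id,
        "model": req.model,
        "allowed": decision.allowed,
        "reason": decision.reason,
    }))
    if not decision.allowed:
        raise PermissionError(decision.reason)
    return forward(req)
```

The point of the sketch is ordering: the decision is made and logged before `forward` is ever called, so a denied request never reaches the model or its tools.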

In practice, teams should separate four layers:

  • Authentication of the caller, whether user, service, or agent workload identity.
  • Authorisation based on request context, not only static roles.
  • Secrets management for model keys, backend tokens, and tool credentials.
  • Telemetry that records prompts, policy decisions, and sensitive action attempts.

That design aligns with current patterns discussed in the CSA MAESTRO agentic AI threat modeling framework and the OWASP NHI Top 10, both of which emphasise that agentic and AI-adjacent systems need runtime controls around identity, authorisation, and tool use. An open-source gateway can support this pattern, but it rarely provides it out of the box. Teams still need to wire in policy engines, rotate secrets, and ensure the gateway does not become a single point of credential sprawl. The LLMjacking threat analysis is a reminder that exposed credentials are actively exploited, often within minutes of being discovered. These controls tend to break down in multi-tenant environments, where each tenant expects custom policies but the gateway is configured with shared backend credentials and inconsistent audit paths.
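One way to avoid the shared-credential failure mode is to resolve backend keys per tenant and refuse to fall back to a common key. The sketch below assumes, purely for illustration, that each tenant's key is exposed to the gateway as an environment variable named `LLM_KEY_<TENANT>`; that naming scheme, the `backend_key_for` and `audit_record` helpers, and the record fields are hypothetical, and a real deployment would more likely read from a secrets manager.

```python
import os
import re
from datetime import datetime, timezone
from typing import Mapping


class MissingTenantCredential(LookupError):
    """Raised when a tenant has no dedicated backend key."""


def backend_key_for(tenant_id: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the tenant's own key; never fall back to a shared one."""
    if not re.fullmatch(r"[a-z0-9-]+", tenant_id):
        raise ValueError("invalid tenant id")
    var = "LLM_KEY_" + tenant_id.upper().replace("-", "_")
    key = env.get(var)
    if not key:
        # Fail closed: a missing key is an error, not a cue to use a default.
        raise MissingTenantCredential(var)
    return key


def audit_record(tenant_id: str, caller_id: str, action: str, allowed: bool) -> dict:
    """Build one audit entry with the same shape for every tenant."""
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "tenant": tenant_id,
        "caller": caller_id,
        "action": action,
        "allowed": allowed,
    }
```

Using one record shape for every tenant keeps the audit path consistent even when the policies themselves differ.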

Common Variations and Edge Cases

Tighter gateway control often increases integration overhead, requiring organisations to balance fast model access against governance friction. That tradeoff becomes visible when teams run local models, hybrid clouds, or multiple open-source gateways across product lines. Best practice is evolving, but there is no universal standard for how much policy should live in the gateway versus adjacent identity, logging, and secrets systems.

Some teams use the gateway only for routing and rate limiting, while others expect it to enforce content policy, prompt filtering, tenant isolation, and outbound tool restrictions. The latter can work, but only when the gateway is paired with workload identity, just-in-time credentials, and a real-time policy engine. For agentic or tool-using systems, static allowlists are often too blunt because the request path changes as the agent reasons and acts. That is why guidance from the NIST AI Risk Management Framework and the Ultimate Guide to NHIs points toward short-lived credentials and auditable workload identity rather than long-lived shared keys.
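To make the contrast with long-lived shared keys concrete, here is a minimal sketch of a short-lived, workload-bound token built only from the Python standard library. It is an illustration under stated assumptions, not a recommended token format: the `mint_token` and `verify_token` names, the five-minute default lifetime, and the payload layout are hypothetical, and production systems would normally rely on the workload identity tokens issued by their platform or identity provider.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def mint_token(secret: bytes, workload: str, ttl_seconds: int = 300,
               now: Optional[float] = None) -> str:
    """Issue a signed token that names one workload and expires quickly."""
    issued = time.time() if now is None else now
    payload = json.dumps({"sub": workload, "exp": int(issued) + ttl_seconds})
    body = base64.urlsafe_b64encode(payload.encode()).decode()
    sig = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    return body + "." + sig


def verify_token(secret: bytes, token: str,
                 now: Optional[float] = None) -> Optional[str]:
    """Return the workload name if the token is authentic and unexpired, else None."""
    current = time.time() if now is None else now
    try:
        body, sig = token.split(".", 1)
    except ValueError:
        return None
    expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison; the payload is parsed only after the signature checks out.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(body))
    if current >= claims["exp"]:
        return None
    return claims["sub"]
```

Because every token names a workload and carries its own expiry, a leaked token is useful only briefly and can be traced to a single caller in the audit log.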

The main edge case is legacy software that cannot pass identity context cleanly through the gateway, or environments where the gateway sits outside the trust boundary and cannot make authoritative decisions. In those settings, the gateway is useful, but it is not the control plane. Teams that treat it as one usually inherit abstraction without enforcement.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | A1 | Open-source gateways fail when auth and policy are assumed, not enforced.
CSA MAESTRO | T1 | MAESTRO addresses agentic threat paths across identity, tools, and policy.
NIST AI RMF |  | AIRMF covers governance, accountability, and runtime risk controls for AI systems.

Assign ownership, monitor decisions, and validate that gateway controls are operating as intended.