LiteLLM vs Pomerium: access control gaps in AI gateway stacks

By NHI Mgmt Group Editorial Team
Published 2025-08-19
Domain: Agentic AI & NHIs
Source: Pomerium

TL;DR: LiteLLM simplifies multi-model routing, while Pomerium adds identity-aware access control, audit logging, and context-based policy enforcement around LLM gateways and MCP-connected services, according to Pomerium. The real issue is not model abstraction but whether AI access is governed with the same identity controls as other production services.

At a glance

What this is: This comparison separates LLM routing from identity-aware access control and shows why those are different layers of the AI stack.

Why it matters: IAM, PAM, and NHI teams need to control who can reach AI gateways, what context they arrive with, and how those sessions are logged and reviewed.

By the numbers:

When AWS credentials are exposed publicly, attackers attempt access within an average of 17 minutes and as quickly as 9 minutes in some cases.

👉 Read Pomerium's comparison of LiteLLM and identity-aware access control

Context

AI gateway stacks are now part of the identity control plane, not just an application integration layer. When LLM access is brokered through shared endpoints, the real governance question becomes whether authentication, authorisation, and audit are enforced at the point of use rather than assumed elsewhere.

LiteLLM and Pomerium sit at different layers of that problem. One normalises model access across providers, while the other applies identity-aware policy to the people and systems reaching those services, including MCP-connected workflows and other HTTP-based AI endpoints.

Key questions

Q: How should security teams govern AI gateway access in enterprise environments?

A: Security teams should govern AI gateway access with contextual authorisation, short-lived credentials, and full session logging. The gateway may simplify model integration, but identity policy must decide who can use it, from where, and under which conditions before the request reaches downstream models or tools.

Q: Why do LLM gateways create more governance risk than a normal API proxy?

A: LLM gateways can route traffic to multiple models, data sources, and tool chains through a single session path. That makes them part of the identity control plane, so a generic proxy model is too weak unless it also evaluates identity context and logs privileged actions.

Q: What do security teams get wrong about securing MCP-connected AI workflows?

A: Teams often focus on the model while ignoring the delegated access path to internal services. MCP-connected workflows can reach sensitive systems through AI-mediated sessions, so they need the same access review, credential scope, and audit discipline as other privileged integrations.

Q: What is the difference between an LLM gateway and identity-aware access control?

A: An LLM gateway normalises how applications reach models, while identity-aware access control decides whether the request should be allowed at all. They solve different problems, and treating them as interchangeable leaves policy enforcement too close to the application layer.

Technical breakdown

LLM gateway routing versus access control

An LLM gateway abstracts model providers behind a single API surface, which helps applications route requests, switch vendors, and collect usage data without rewriting code. That solves integration complexity, but it does not by itself decide who may reach the gateway, from which device, at what time, or under what identity assurance. Access control belongs in a separate enforcement layer that can evaluate context before a request is allowed through. In identity terms, routing and authorisation are related but not interchangeable. The former chooses where traffic goes. The latter determines whether the request is allowed through at all.

Practical implication: Separate model abstraction from policy enforcement so that gateway convenience does not become an access-control blind spot.
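
The separation described above can be sketched in a few lines. This is a minimal illustration, not LiteLLM's or Pomerium's API: the Request type, the ALLOWED_GROUPS and UPSTREAMS tables, the model names, and the URLs are all hypothetical. The point is only that the trust decision and the routing decision live in different functions.

```python
from dataclasses import dataclass

# All names below are hypothetical and exist only for illustration.
@dataclass(frozen=True)
class Request:
    user: str
    groups: frozenset
    model: str

# Enforcement layer: which groups may reach which model.
ALLOWED_GROUPS = {
    "gpt-large": {"ml-eng"},
    "small-local": {"ml-eng", "analysts"},
}

# Routing layer: where traffic for each model is sent.
UPSTREAMS = {
    "gpt-large": "https://provider-a.example/v1",
    "small-local": "http://10.0.0.5:8000/v1",
}

def authorise(req: Request) -> bool:
    """Trust decision: is this caller allowed to use this model at all?"""
    allowed = ALLOWED_GROUPS.get(req.model, set())
    return bool(req.groups & allowed)

def route(req: Request) -> str:
    """Routing decision: pick an upstream. Makes no trust decision."""
    return UPSTREAMS[req.model]

def handle(req: Request) -> str:
    # Authorisation runs first; routing is never consulted for denied requests.
    if not authorise(req):
        return "403 denied"
    return "forward to " + route(req)
```

In a real stack the two layers would typically be separate components rather than two functions in one process; the sketch only shows that routing code never needs to know why a request was allowed.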

Identity-aware policy at the AI edge

Identity-aware access proxies enforce policy using signals such as user identity, device posture, time, and group membership. That matters for AI gateways because LLM requests can expose data, trigger tools, or reach internal services that should not be reachable through a generic reverse proxy. A policy engine at the edge can log full session context, attach identity metadata, and reject requests that fail contextual checks. This is especially important when users reach AI services through browser sessions, internal portals, or MCP-connected tools, where the access path is often more complex than a simple API call.

Practical implication: Use contextual authorisation for AI access paths that can reach sensitive data or operational systems.
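
As a rough sketch of what a contextual check might look like, the function below evaluates the four signals named above (identity, group membership, device posture, and time). The Context fields, the default group name, and the permitted hours are assumptions chosen for the example, not values taken from Pomerium's policy language.

```python
from dataclasses import dataclass
from datetime import time

# Hypothetical context object; field names are illustrative only.
@dataclass(frozen=True)
class Context:
    user: str
    groups: frozenset
    device_compliant: bool
    local_time: time

def evaluate(ctx, required_group="ai-users", start=time(8, 0), end=time(18, 0)):
    """Return (allowed, reasons). Every failed check is kept for the audit log."""
    reasons = []
    if required_group not in ctx.groups:
        reasons.append("missing group " + required_group)
    if not ctx.device_compliant:
        reasons.append("device posture check failed")
    if not (start <= ctx.local_time <= end):
        reasons.append("outside permitted hours")
    return (not reasons, reasons)
```

Collecting every failed reason, rather than stopping at the first, is a deliberate choice here: it gives reviewers the full context of a denial.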

Why MCP-connected services raise the bar

Model Context Protocol extends an AI workflow beyond plain prompt-response traffic by connecting models to tools and data sources. Once MCP servers are in the path, the access problem is no longer just model usage. It becomes delegated access to internal systems through an AI-mediated workflow. That increases the need for short-lived credentials, auditability, and policy enforcement that can distinguish a legitimate user session from an overbroad or unmanaged integration. Without that boundary, the same gateway that simplifies access can also simplify misuse.

Practical implication: Treat MCP-connected AI workflows as privileged integrations and govern them with the same care as other high-risk service paths.
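
To make "short-lived, scoped credentials" concrete, here is a minimal sketch using only the Python standard library. It is an assumption-laden illustration: the token format, the SECRET value, and the scope string are invented for the example, and a production system would more likely rely on an established token standard and a managed signing key.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-only-secret"  # assumption: real deployments use a managed key

def mint(subject, scope, ttl_seconds=300, now=None):
    """Issue a signed token that names one scope and expires after ttl_seconds."""
    now = time.time() if now is None else now
    claims = {"sub": subject, "scope": scope, "exp": now + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + sig

def verify(token, required_scope, now=None):
    """Accept only well-formed, correctly signed, unexpired tokens for the scope."""
    now = time.time() if now is None else now
    try:
        body, sig = token.rsplit(".", 1)
        payload = base64.urlsafe_b64decode(body.encode())
    except ValueError:  # malformed token or invalid base64
        return False
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    claims = json.loads(payload)
    return claims["exp"] > now and claims["scope"] == required_scope
```

A token minted for one MCP tool scope is rejected for any other scope and stops working once its lifetime elapses, which is the property that distinguishes it from a standing API key.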

NHI Mgmt Group analysis

LLM routing and identity enforcement are different control problems. A gateway that normalises model APIs reduces developer friction, but it does not answer the governance question of who is allowed to use the gateway under what conditions. Pomerium's framing exposes a common architecture mistake: organisations treat model aggregation as if it were access governance. The result is policy drift between the application layer and the identity layer. Practitioners should stop assuming that an LLM endpoint is secure because it is convenient to integrate.

AI access policies need to follow the session, not just the account. Context-aware controls matter because AI use is situational, not static. A developer, a service team, and an analyst may each be authorised differently depending on device, network, time, and target system. That makes the control model closer to privileged access management (PAM) and zero-trust architecture (ZTA) than to a simple API key gate. The governance lesson is that AI gateways become part of the access fabric, so identity context must be enforced at request time, not only at provisioning time.

AI gateway access control gap: the common assumption is that model access can be governed like any other internal API, with a single credential and a static allow list. That assumption fails when the gateway can reach multiple models, internal data sources, and MCP servers through the same session path. The implication is that practitioners must rethink where authorisation actually happens and how much trust they place in abstraction layers.

Short-lived credentials and full-context logging are now baseline expectations for AI infrastructure. Pomerium's article points in the right direction by emphasising short-lived credentials, dynamic policy, and audit trails. Those controls are not nice-to-haves once LLM requests can access sensitive data or trigger downstream actions. The broader identity governance signal is that AI access must be reviewable, attributable, and time-bounded in the same way other privileged workflows are.

This category is converging on a zero-trust pattern for AI services. The practical future is not a single universal gateway, but a stack in which model abstraction, identity enforcement, and auditability are deliberately separated. That will force IAM, NHI, and security architecture teams to decide which layer owns the trust decision and which layer only moves traffic. Practitioners should map that responsibility before AI usage spreads further into production workflows.

From our research:
98% of companies plan to deploy even more AI agents within the next 12 months, despite documented rogue behaviour in 80% of current deployments, according to the AI Agents: The New Attack Surface report.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation, according to the AI Agents: The New Attack Surface report.
For a wider view of the control problem, see OWASP NHI Top 10 for the agentic risk categories practitioners are now mapping into policy.

What this signals

AI gateway governance is becoming a control-plane problem, not a tooling preference. As more AI systems sit behind shared access layers, teams will need to decide which layer owns the trust decision and which layer only brokers traffic. The practical shift is toward separated policy, audit, and routing responsibilities, with identity signals enforced at request time rather than assumed from the application tier.

With 96% of technology professionals already identifying AI agents as a growing security threat, the governance baseline has moved. That makes identity-aware access controls, session context, and reviewable audit data the minimum for production AI paths, especially where tools can reach internal systems or MCP-connected services.

Gateway abstraction without governance will not scale. Organisations that centralise model access but leave authorisation fragmented will accumulate hidden privilege paths across teams and workflows. The next step is to align AI access with existing zero-trust and privileged-access patterns, then extend those controls to the model, the tool, and the session together.

For practitioners

Separate model abstraction from authorisation. Place the LLM gateway and the access enforcement layer under different control objectives so that routing, authentication, and policy evaluation can be managed independently.
Require contextual checks on every AI session. Use identity, device posture, time, and group membership to decide whether a request can reach the gateway or an MCP-connected service.
Treat MCP-connected tools as privileged integrations. Review every AI workflow that can reach internal systems, then apply short-lived credentials and logged approvals where tool access crosses trust boundaries.
Design audit trails for review, not just retention. Capture full identity and session metadata so investigators can reconstruct who accessed what, through which gateway, and under which policy decision.
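
The audit point above can be illustrated with a small sketch that emits one structured record per access decision. The field names are assumptions made for the example; the idea is simply that each record answers who accessed what, through which gateway, and under which policy decision.

```python
import json
from datetime import datetime, timezone

def audit_record(user, device_id, gateway, target, policy_id, allowed, reasons, when=None):
    """Build one JSON log line describing a single access decision."""
    when = when or datetime.now(timezone.utc)
    record = {
        "timestamp": when.isoformat(),
        "user": user,              # who
        "device_id": device_id,    # from which device
        "gateway": gateway,        # through which gateway
        "target": target,          # what was accessed (model or MCP tool)
        "policy_id": policy_id,    # which policy produced the decision
        "decision": "allow" if allowed else "deny",
        "reasons": list(reasons),  # why, for later review
    }
    return json.dumps(record, sort_keys=True)
```

Writing the policy identifier and the denial reasons into the record is what makes the trail reviewable rather than merely retained.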

Key takeaways

LLM routing and identity enforcement solve different problems, and conflating them leaves AI access under-governed.
Shared AI gateways increase the need for contextual policy, short-lived credentials, and complete audit trails.
MCP-connected workflows should be treated as privileged integrations because they can bridge model access into internal systems.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST Zero Trust (SP 800-207) sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		The article discusses AI gateways, tool access, and agent-adjacent policy boundaries.
OWASP Non-Human Identity Top 10	NHI-03	Short-lived credentials and access scope are central to securing non-human AI access paths.
NIST Zero Trust (SP 800-207)	PR.AC-4 (a NIST Cybersecurity Framework v1.1 subcategory)	Context-aware access decisions align directly with zero-trust authorization principles.

Review gateway credentials and session scope under NHI-03, then replace standing secrets where possible.
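
One way to start the credential review is to flag anything that looks like a standing secret. The sketch below assumes a hypothetical inventory format (dictionaries with name, issued_at, and an optional expires_at) and an arbitrary one-day lifetime threshold. Neither comes from the source or from any framework, so both should be adjusted to local policy.

```python
from datetime import datetime, timedelta, timezone

def standing_secrets(credentials, max_lifetime=timedelta(days=1)):
    """Return names of credentials with no expiry or a lifetime above max_lifetime."""
    flagged = []
    for cred in credentials:
        expires = cred.get("expires_at")
        # No expiry at all, or an issued-to-expiry window that is too long.
        if expires is None or expires - cred["issued_at"] > max_lifetime:
            flagged.append(cred["name"])
    return flagged
```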

Key terms

LLM Gateway: An LLM gateway is a control layer that normalises access to multiple model providers through a single API surface. It simplifies routing, failover, and usage tracking, but it does not automatically provide identity assurance or contextual authorisation for the requests that pass through it.
Identity-Aware Access Proxy: An identity-aware access proxy sits in front of a service and evaluates who is asking, from what device, and under which policy before allowing access. In AI environments, it can protect model endpoints and connected tools by enforcing context-based decisions at request time.
MCP-Connected Workflow: An MCP-connected workflow is an AI-mediated path that uses the Model Context Protocol to reach tools or data sources beyond the model itself. That expands the governance problem from prompt handling to delegated access, because the request can now touch internal systems through a session path.
Contextual Authorisation: Contextual authorisation is the practice of making access decisions using identity and runtime signals such as device posture, location, time, and session risk. It is especially important where AI services can reach data or actions that should not be available through static API credentials alone.


This post draws on content published by Pomerium: LiteLLM vs. Pomerium: What's the Difference and Which One Do You Need? Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-08-19.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org