AI gateway benchmarks show why API governance now matters for agents

By NHI Mgmt Group Editorial Team
Published 2025-07-07
Domain: Agentic AI & NHIs
Source: Kong

TL;DR: APIs are now the connective tissue for GenAI and agentic workflows. Kong reports that its AI Gateway outperformed Portkey and LiteLLM on throughput and latency in a controlled EKS test. Performance matters, but identity governance and policy enforcement remain the deciding factors as AI usage moves into production.

At a glance

What this is: Kong’s benchmark of AI gateway performance across Kong, Portkey, and LiteLLM. The key takeaway is that gateway throughput and latency are now tightly coupled to production AI governance.

Why it matters: For IAM teams, the message is that AI gateways are becoming an identity control point for NHI, agentic AI, and API traffic, not just a routing layer.

By the numbers:

Kong Konnect Data Planes showed a performance increase of over 200% when compared to Portkey, and over 800% against LiteLLM.
At the same 12-CPU allocation, Kong had 65% lower latency than Portkey and 86% lower latency than LiteLLM.
The mock backend sustained 29,005.51 RPS with a P95 of 24.07 ms and a P99 of 30.35 ms in the baseline run.

👉 Read Kong's AI gateway benchmark comparing Kong, Portkey, and LiteLLM

Context

AI gateways sit between applications, models, tools, and downstream services, so their role is increasingly about governance as much as traffic handling. In an agentic environment, the gateway becomes part of the identity plane because it can enforce policy, shape access, and meter usage across AI consumers and MCP-connected services.

This benchmark compares Kong AI Gateway with Portkey and LiteLLM under a controlled AWS setup. The useful question for practitioners is not which gateway is faster in isolation, but which control plane can support production AI workloads without creating new blind spots in access, observability, and cost governance.

For teams already treating APIs as critical infrastructure, the shift is obvious: AI gateways are becoming a practical enforcement point for workload identity, token-based access, and usage controls. The benchmark is therefore a signal about operating model maturity, not just software performance.

Key questions

Q: How should security teams govern AI gateway traffic in production?

A: Security teams should govern AI gateway traffic by treating the gateway as a runtime control point for identity, quota, observability, and routing. That means enforcing policy before requests reach models or tools, validating that policy stays inline under load, and mapping each AI consumer to an accountable identity and usage boundary.

Q: When does an AI gateway become a governance control rather than just a proxy?

A: An AI gateway becomes a governance control when it consistently enforces authentication, usage limits, and visibility across model, agent, and MCP traffic. If those controls can be bypassed for speed or convenience, the gateway is only a transport layer and does not meaningfully reduce identity or access risk.

Q: What do teams get wrong about performance testing AI gateways?

A: Teams often test only latency and throughput and ignore whether policy remains enforceable at scale. That misses the point. A gateway that is fast but easy to bypass creates governance drift, while a gateway that is slower but authoritative may still be the better security choice if it stays inline.

Q: How can organisations reduce shadow AI when they add gateway controls?

A: Organisations reduce shadow AI by making the approved gateway path easier to use than bypass routes. That requires reliable performance, clear policy, and good developer experience, because users and agents will route around controls that are slow, opaque, or hard to integrate.

Technical breakdown

AI gateway runtime under load

An AI gateway proxies requests between consumers and model backends, but in production it also becomes a policy enforcement and observability layer. Under load, latency, throughput, and resource contention determine whether the gateway can remain inline for real traffic rather than collapsing into an administrative bottleneck. The benchmark isolates this runtime behaviour by using a mocked LLM, fixed CPU ceilings, and identical request patterns, which makes the gateway mechanics visible instead of confounded by model variability.

Practical implication: teams should test whether their AI gateway can remain an always-on control point at real enterprise load, not just in lab conditions.
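As a rough illustration of the measurements a benchmark like this reports, the sketch below derives throughput and nearest-rank P95/P99 latency from a list of request latencies, and computes how much lower one latency figure is than another. The function names and sample values are hypothetical and are not taken from Kong's test harness.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def throughput_rps(request_count, duration_seconds):
    """Completed requests per second over the measured window."""
    return request_count / duration_seconds

def percent_lower(baseline, candidate):
    """How much lower candidate is than baseline, as a percentage of baseline."""
    return (baseline - candidate) * 100 / baseline

# Hypothetical per-request latencies in milliseconds.
latencies_ms = [12.0, 15.0, 11.0, 30.0, 14.0, 13.0, 22.0, 16.0, 18.0, 25.0]

print("P95 ms:", percentile(latencies_ms, 95))
print("P99 ms:", percentile(latencies_ms, 99))
print("RPS:", throughput_rps(request_count=1000, duration_seconds=4))
print("Latency reduction %:", percent_lower(baseline=100.0, candidate=35.0))
```

Running the same calculation against each gateway under identical CPU limits and request patterns is what makes figures such as "65% lower latency" comparable.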

Policy enforcement for LLM, agent, and MCP traffic

The article frames the gateway as a universal API for securing traffic to LLMs, AI agents, and MCP servers. That matters because the control surface expands from simple prompt forwarding to identity-aware enforcement, where authentication, token quotas, and routing decisions affect both security and cost. In practice, the gateway is where organisations can attach policy to AI consumers before requests reach models or tools, which makes it a candidate control point for AI governance rather than a pure transport layer.

Practical implication: align gateway policy design with identity and access controls so AI consumers are governed before they reach downstream tools.
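To make the idea of attaching policy to AI consumers concrete, here is a minimal sketch of an inline check that authenticates a consumer, restricts which models it may call, and enforces a token quota before a request is forwarded. The Consumer and Gateway classes, the key and model names, and the quota figures are all hypothetical; this is not Kong's API or plugin model.

```python
from dataclasses import dataclass

@dataclass
class Consumer:
    """A hypothetical AI consumer identity with its usage boundary."""
    name: str
    allowed_models: set
    token_quota: int
    tokens_used: int = 0

class Gateway:
    """Toy policy layer: every request is checked before it would be proxied."""

    def __init__(self, consumers_by_key):
        self._consumers = consumers_by_key  # API key -> Consumer

    def authorize(self, api_key, model, requested_tokens):
        consumer = self._consumers.get(api_key)
        if consumer is None:
            return (False, "unauthenticated")
        if model not in consumer.allowed_models:
            return (False, "model_not_allowed")
        if consumer.tokens_used + requested_tokens > consumer.token_quota:
            return (False, "quota_exceeded")
        # Only count usage for requests that are actually allowed through.
        consumer.tokens_used += requested_tokens
        return (True, "allowed")

gateway = Gateway({"key-1": Consumer("billing-agent", {"model-a"}, token_quota=100)})
print(gateway.authorize("key-1", "model-a", 60))
print(gateway.authorize("key-1", "model-a", 60))
```

The point of the sketch is ordering: identity is resolved first, then entitlement, then quota, so every denial can be attributed to an accountable consumer.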

Why gateway performance is a governance issue

Gateway performance and governance are now linked because slow or fragile inline controls often get bypassed in favour of convenience. If the gateway cannot handle production throughput, teams tend to decentralise model access, which weakens both auditability and policy consistency. The benchmark’s point is not that speed alone is the objective, but that sufficiently fast gateways are what make centralised governance operationally viable at scale.

Practical implication: treat performance testing as part of governance design, because weak runtime characteristics can push teams toward shadow AI patterns.
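One way to watch for that drift is to measure how much AI traffic skips the gateway. The sketch below assumes a hypothetical egress log of (team, destination host) pairs and a hypothetical gateway hostname; real log formats and hostnames will differ.

```python
from collections import Counter

def bypass_rate_by_team(records, gateway_host):
    """Share of each team's outbound AI calls that did not go through the gateway.

    records: iterable of (team, destination_host) tuples from an egress log.
    """
    total = Counter()
    bypassed = Counter()
    for team, host in records:
        total[team] += 1
        if host != gateway_host:
            bypassed[team] += 1
    return {team: bypassed[team] / total[team] for team in total}

# Hypothetical log entries.
egress_log = [
    ("payments", "ai-gateway.internal"),
    ("payments", "api.provider.example"),
    ("search", "ai-gateway.internal"),
]
print(bypass_rate_by_team(egress_log, gateway_host="ai-gateway.internal"))
```

A rising rate for a team after a latency regression would be an early sign that performance is pushing users toward shadow AI paths.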

Threat narrative

Attacker objective: Bypass centralised governance by pushing AI usage into less visible or less controlled paths.

Entry: users and agents submit traffic through the gateway layer, which becomes the first enforcement point for model and tool access.
Escalation: once traffic volume rises, gaps in policy enforcement, observability, or routing discipline can let uncontrolled AI usage spread across the environment.
Impact: the organisation either keeps AI workloads governable at scale or shifts into fragmented, harder-to-audit access patterns that weaken control consistency.

Moltbook AI agent keys breach: the Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach: attackers used stolen AWS access keys to hijack Anthropic models on Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

AI gateway performance is now an identity governance issue, not just an infrastructure metric. When AI gateways become a production access layer for models, agents, and MCP-connected services, the gateway influences who or what can act, how often, and under what policy. That pushes gateway selection into the same decision space as IAM, PAM, and workload governance. Practitioners should evaluate AI gateways as control points, not just proxies.

The real control value of an AI gateway is policy consistency at runtime. The article shows that token quotas, authentication, and observability are part of the same operating model as traffic handling. If those controls are too slow or too fragile, teams are likely to decentralise access and lose enforcement consistency. The implication is that governance design fails when the control plane cannot stay inline under enterprise load.

Identity blast radius is the right concept for framing AI gateway design. As AI adoption grows, the question becomes how much access a single gateway path can expose if policy is weak or bypassed. That blast radius includes model usage, downstream tool access, and cost exposure, not just API throughput. Practitioners should think about the gateway as the boundary that defines how far AI identity risk can travel.
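As a simplified model of that idea, blast radius can be treated as the set of resources reachable from a credential through the access paths it opens. The sketch below walks a hypothetical access graph breadth-first; the node names and edges are invented for illustration and do not describe any real deployment.

```python
from collections import deque

def blast_radius(access_graph, start):
    """Everything reachable from `start` by following access edges (excluding start itself)."""
    reachable = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in access_graph.get(node, ()):
            if target != start and target not in reachable:
                reachable.add(target)
                queue.append(target)
    return reachable

# Hypothetical graph: which identity or service can reach which resource.
access_graph = {
    "agent-key": ["gateway-route"],
    "gateway-route": ["model-a", "crm-tool"],
    "crm-tool": ["customer-db"],
    "other-key": ["model-b"],
}
print(sorted(blast_radius(access_graph, "agent-key")))
```

Tightening gateway policy corresponds to removing edges from this graph, which shrinks the reachable set for any single credential.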

APIs are the connective tissue of agentic workflows, so AI governance now spans more than model access. The article correctly points to multi-modal and agentic use cases, where requests move through tools, services, and model calls in sequence. That means identity controls must track the whole path, not only the first hop. Practitioners should re-evaluate whether their current governance model can see and constrain agent-to-service interactions end to end.

Benchmark results only matter if they preserve the governance path that production AI needs. Faster gateways are useful because they make centralised policy feasible, but the field should resist treating performance as the goal. The goal is controlled scale. Practitioners should use performance benchmarks to decide whether a gateway can support enforceable AI governance without pushing teams back to shadow access patterns.

From our research:
92% agree governing AI agents is critical to enterprise security, yet only 44% have implemented any policies to do so, according to the AI Agents: The New Attack Surface report.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
For a broader control framework, see OWASP Agentic AI Top 10 for the runtime risks that gateway policy needs to address.

What this signals

AI gateway selection will increasingly determine whether AI governance stays centralised or fragments into exceptions. The practical signal for teams is that runtime performance, policy depth, and visibility now need to be assessed together. If the gateway cannot stay inline, developers and platform teams will drift toward unmanaged routes, which is where governance starts to fail.

With 80% of organisations already reporting AI agents acting beyond intended scope in the SailPoint survey, the benchmark should be read as a control-plane question, not a product race. The issue is whether your gateway can keep model, tool, and consumer access inside a governable boundary. Teams should plan for that boundary to become an audit requirement, not an optional enhancement.

Identity blast radius: the boundary between acceptable AI access and uncontrolled AI consumption will be defined by the gateway path, not by policy documents alone. As agentic workflows expand, practitioners should align gateway enforcement with identity lifecycle, approval boundaries, and downstream service entitlements. That makes the AI gateway part of the identity programme, not a separate AI project.

For practitioners

Test gateway control-plane performance under production-like load: Validate that the AI gateway can sustain enterprise traffic while keeping authentication, token enforcement, and observability inline. Use fixed request patterns, constrained compute, and realistic consumer volumes so you can see whether policy stays operational at scale.
Map AI gateway policy to identity controls: Define how consumer identity, quota policy, and routing rules interact before AI requests reach models or MCP services. If the gateway cannot express those controls consistently, it will not function as a governance layer.
Measure whether centralised AI access is still enforceable: Track the points where teams bypass the gateway because it is too slow, too hard to use, or too limited in policy depth. Those workarounds are early warning signs of shadow AI and fragmented governance.
Review AI traffic paths end to end: Identify where requests move from users or agents into tools, services, and model endpoints, then decide which checkpoints need policy enforcement. The gateway should cover the full path, not only the first request hop.
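The end-to-end review in the last item can be approximated by comparing observed request hops against the hops where policy is actually enforced. The sketch below is a minimal version under that assumption; the hop names and the set of enforced checkpoints are hypothetical.

```python
def unenforced_hops(trace, enforced_checkpoints):
    """Return the hops in a request trace that pass through no policy checkpoint.

    trace: ordered list of (source, destination) hops observed for one workflow.
    enforced_checkpoints: set of (source, destination) hops where policy is applied.
    """
    return [hop for hop in trace if hop not in enforced_checkpoints]

# Hypothetical agent workflow: one model call via the gateway, one direct tool call.
trace = [
    ("agent", "gateway"),
    ("gateway", "model-a"),
    ("agent", "crm-tool"),
    ("crm-tool", "customer-db"),
]
enforced = {("agent", "gateway"), ("gateway", "model-a")}
print(unenforced_hops(trace, enforced))
```

Any hop this returns is a candidate for routing through the gateway or for a separate enforcement point.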

Key takeaways

AI gateways are becoming governance infrastructure because they sit on the access path for models, agents, and tools.
The benchmark shows that performance and control are linked, since weak runtime behaviour encourages teams to bypass central policy.
Practitioners should judge AI gateways by whether they can keep identity, quota, and observability enforceable under real load.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Agentic AI Top 10 addresses the attack and risk surface, while NIST AI RMF and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		AI gateways mediate agentic traffic and policy enforcement.
NIST AI RMF		AI governance requires runtime accountability and monitoring.
NIST Zero Trust (SP 800-207)	PR.AC-4 (NIST CSF 1.1 subcategory)	Gateway policy supports least-privilege access to AI services.

Use agentic AI controls to define policy boundaries for model, tool, and MCP access.

Key terms

AI Gateway: An AI gateway is an enforcement and mediation layer between users, agents, applications, and model providers. It can authenticate consumers, apply quotas, route requests, and observe usage, which makes it a governance control point rather than a simple proxy when deployed in production.
Agentic Workflow: An agentic workflow is a sequence of actions where AI systems choose tools, issue requests, and progress tasks with limited human intervention. In identity terms, the workflow creates multiple access decisions that must be governed across model, tool, and service boundaries.
Identity Blast Radius: Identity blast radius is the scope of systems, data, and actions that can be affected when a credential, policy path, or enforcement point is abused. In AI environments, it includes model access, downstream tools, and usage cost, so the control boundary matters as much as the credential itself.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity lifecycle are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or NHI governance in your organisation, it is worth exploring.

This post draws on content published by Kong: AI Gateway Benchmark: Kong AI Gateway, Portkey, and LiteLLM. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-07-07.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org