TL;DR: APIs are now the connective tissue for GenAI and agentic workflows. Kong reports that its AI Gateway outperformed Portkey and LiteLLM on throughput and latency in a controlled EKS benchmark. Performance matters, but identity governance and policy enforcement remain the deciding factors as AI usage moves into production.
NHIMG editorial — based on content published by Kong: AI Gateway Benchmark: Kong AI Gateway, Portkey, and LiteLLM
By the numbers:
- Kong Konnect Data Planes showed a performance increase of over 200% when compared to Portkey, and over 800% against LiteLLM.
- At the same 12-CPU allocation, Kong had 65% lower latency than Portkey and 86% lower latency than LiteLLM.
- WireMock sustained 29,005.51 RPS with a P95 of 24.07ms and a P99 of 30.35ms in the baseline run.
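To make the percentages above easier to compare, the short sketch below converts them into multiples of a baseline. It assumes "performance increase of over X%" means a relative increase in throughput and "X% lower latency" means a relative reduction; the function names are illustrative and do not come from Kong's article.

```python
def throughput_multiple(pct_increase: float) -> float:
    """Convert 'X% higher throughput' into a multiple of the baseline throughput."""
    return 1 + pct_increase / 100


def latency_fraction(pct_lower: float) -> float:
    """Convert 'X% lower latency' into the fraction of baseline latency that remains."""
    return 1 - pct_lower / 100


# Figures quoted in the bullets above; the interpretation is an assumption.
for name, pct in [("Portkey", 200), ("LiteLLM", 800)]:
    print(f"over {pct}% vs {name}: more than {throughput_multiple(pct):.1f}x the throughput")

for name, pct in [("Portkey", 65), ("LiteLLM", 86)]:
    print(f"{pct}% lower than {name}: about {latency_fraction(pct):.2f}x the latency")
```

Under that reading, "over 200%" corresponds to more than 3x the throughput and "over 800%" to more than 9x, while 65% and 86% lower latency correspond to roughly 0.35x and 0.14x of the compared gateway's latency.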
Questions worth separating out
Q: How should security teams govern AI gateway traffic in production?
A: Security teams should govern AI gateway traffic by treating the gateway as a runtime control point for identity, quota, observability, and routing.
Q: When does an AI gateway become a governance control rather than just a proxy?
A: An AI gateway becomes a governance control when it consistently enforces authentication, usage limits, and visibility across model, agent, and MCP traffic.
Q: What do teams get wrong about performance testing AI gateways?
A: Teams often test only latency and throughput and ignore whether policy remains enforceable at scale.
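One way to act on that last answer is to report policy enforcement next to latency percentiles in the same load-test summary. The sketch below is a minimal illustration, assuming each recorded request carries two hypothetical flags (`authenticated`, `quota_checked`) set by the test harness; it is not taken from Kong's benchmark tooling.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    latency_ms: float
    authenticated: bool  # hypothetical flag: the gateway verified the caller
    quota_checked: bool  # hypothetical flag: the quota policy was evaluated


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Report tail latency together with the share of requests where policy ran."""
    latencies = [s.latency_ms for s in samples]
    enforced = sum(1 for s in samples if s.authenticated and s.quota_checked)
    return {
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "policy_enforced_ratio": enforced / len(samples),
    }
```

A run that looks healthy on P95 and P99 but shows an enforcement ratio below 1.0 would indicate that some requests passed through without the expected checks.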
Practitioner guidance
- Test gateway control-plane performance under production-like load: Validate that the AI gateway can sustain enterprise traffic while keeping authentication, token enforcement, and observability inline.
- Map AI gateway policy to identity controls: Define how consumer identity, quota policy, and routing rules interact before AI requests reach models or MCP services.
- Measure whether centralised AI access is still enforceable: Track the points where teams bypass the gateway because it is too slow, too hard to use, or too limited in policy depth.
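The interaction between consumer identity, quota policy, and routing rules described in the guidance above can be sketched as a single decision function. This is a simplified illustration under stated assumptions: the `Consumer` and `Decision` types, the API-key lookup, and the token-based quota are all hypothetical and do not represent Kong's, Portkey's, or LiteLLM's actual configuration model.

```python
from dataclasses import dataclass


@dataclass
class Consumer:
    name: str
    token_quota: int          # tokens allowed in the current window (assumed unit)
    allowed_routes: set[str]  # routes this identity may call
    tokens_used: int = 0


@dataclass
class Decision:
    allowed: bool
    reason: str


def evaluate(consumers: dict[str, Consumer], api_key: str, route: str, tokens: int) -> Decision:
    """Check identity first, then routing, then quota, before forwarding a request."""
    consumer = consumers.get(api_key)
    if consumer is None:
        return Decision(False, "unauthenticated")
    if route not in consumer.allowed_routes:
        return Decision(False, "route not permitted")
    if consumer.tokens_used + tokens > consumer.token_quota:
        return Decision(False, "quota exceeded")
    consumer.tokens_used += tokens  # record usage only for allowed requests
    return Decision(True, "ok")
```

Ordering the checks this way means an unauthenticated caller never consumes quota, and every denial carries a reason that can be logged for visibility.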
What's in the full article
Kong's full blog covers the operational detail this post intentionally leaves for the source:
- Exact benchmark architecture, including the AWS EKS layout and load-generation setup used for the tests
- Per-gateway deployment details for Kong, Portkey, and LiteLLM, including the resource settings used in each run
- Latency, throughput, and CPU charts that show how each gateway behaved under the same 12 CPU ceiling
- The article's own interpretation of why the Kong runtime stayed stable when the other gateways did not
👉 Read Kong's AI gateway benchmark comparing Kong, Portkey, and LiteLLM →