AI gateway governance for GenAI applications needs stronger control

By NHI Mgmt Group Editorial Team
Published 2025-10-06
Domain: Best Practices
Source: Kong

TL;DR: GenAI applications can move from operational chaos to controlled routing when the API layer centralises governance, secret handling, rate limits, observability, and prompt-injection defences, according to Kong. The real lesson is that GenAI governance fails when teams treat the model layer as the only risk boundary; access control and telemetry around the application flow matter just as much.

At a glance

What this is: An analysis of how an AI gateway changes GenAI application governance by adding control, visibility, and security at the API layer.

Why it matters: IAM, PAM, and NHI teams increasingly need to govern model access, secret use, and request flow as part of the same control plane.

Context

AI gateway governance is the practice of controlling how GenAI applications reach models, tools, and data. In this case, the central problem is that the application may be functionally correct while still exposing secrets, lacking auditability, and allowing prompt-level abuse through the request path.

For IAM and NHI programmes, the key issue is not just model access. It is the control of API keys, routing, quotas, observability, and policy enforcement around the workload that invokes the model. That is where governance either exists or fails.

Kong’s post is a practical example of a broader pattern: teams are pushing GenAI into production before they have a stable control layer around it. That pattern is typical of early AI application rollouts.

Key questions

Q: How should security teams govern GenAI applications that rely on external model APIs?

A: They should put policy, logging, rate limiting, and secret handling in front of the model rather than inside individual applications. A gateway-style control layer helps standardise enforcement, reduce secret sprawl, and create audit trails for every request. That approach is more reliable than letting each team build its own ad hoc controls.

Q: Why do exposed API keys create such a large risk in GenAI workloads?

A: Because the key is the workload’s identity, so compromise can grant direct access to model APIs and related data flows. In GenAI systems, that can lead to cost abuse, data exposure, or unauthorised prompt manipulation. The risk increases when the same secret is reused across environments or embedded in application code.

Q: What do security teams get wrong about prompt injection in production AI apps?

A: They often treat prompt injection as a content moderation issue when it is really a request-control issue. If untrusted input can change model behaviour, tool selection, or retrieval results, the application has an execution integrity problem. Defences need to include input controls, policy gates, and telemetry, not just prompt wording filters.

Q: Who should be accountable for AI gateway governance in an enterprise?

A: Accountability should sit with the teams that own identity, platform policy, and operational risk together, not with model developers alone. When a gateway controls secrets, routing, and usage, it becomes part of the governance stack. That means IAM, security architecture, and platform engineering need shared oversight.

Technical breakdown

AI gateway control planes and model access routing

An AI gateway sits between the application and external model providers, acting as a policy enforcement layer for requests, responses, and usage. In this architecture, the gateway can apply routing rules, rate limits, quotas, caching, and observability before traffic reaches the model endpoint. That matters because GenAI workloads are not just API calls. They are composite flows that may include prompts, retrieval, tools, and response shaping. Without a gateway, each app team tends to implement its own controls inconsistently, which fragments governance and makes auditability weak.

Practical implication: place model access behind a governed control plane so policy, logging, and request shaping are enforced centrally.
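The central enforcement described above can be sketched in a few lines. This is a minimal illustration, not Kong's implementation: the Gateway class, the route table, the token-bucket parameters, and the upstream URL are all hypothetical names chosen for the example. It shows one place where routing, per-client rate limiting, and audit logging happen before any request would be forwarded.

```python
import time


class Gateway:
    """Hypothetical in-process stand-in for an AI gateway policy layer."""

    def __init__(self, routes, capacity=2, refill_per_sec=1.0, clock=time.monotonic):
        self.routes = routes                  # model alias -> upstream URL (assumed)
        self.capacity = capacity              # burst size per client
        self.refill_per_sec = refill_per_sec  # sustained requests per second
        self.clock = clock                    # injectable so tests can control time
        self.buckets = {}                     # client_id -> (tokens, last_seen)
        self.audit_log = []                   # one entry per decision

    def _allow(self, client_id):
        # Token bucket: refill by elapsed time, then spend one token if available.
        now = self.clock()
        tokens, last = self.buckets.get(client_id, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self.buckets[client_id] = (tokens, now)
        return allowed

    def handle(self, client_id, model, prompt):
        # Routing check first, then rate limit; every outcome is audited.
        if model not in self.routes:
            decision = "rejected_unknown_model"
        elif not self._allow(client_id):
            decision = "rate_limited"
        else:
            decision = "forwarded"
        self.audit_log.append(
            {"client": client_id, "model": model, "decision": decision}
        )
        upstream = self.routes[model] if decision == "forwarded" else None
        return decision, upstream
```

Because every application calls the same handle method, the limit and the audit trail are identical for all of them, which is the point of moving controls out of individual services.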

Prompt injection, secrets exposure, and API key risk

Prompt injection is a request manipulation problem, not just a content problem. If an attacker can influence the prompt flow, they may redirect behaviour, exfiltrate instructions, or induce unsafe tool use. Kong's post also points to exposed API keys, which make the application an NHI governance issue because the keys are the identity the workload uses to authenticate. Secrets stored in code or loosely managed backends expand blast radius when compromise occurs. The technical risk is the combination of user input, model trust, and credential reuse in one flow.

Practical implication: treat prompts and API keys as separate control domains, with secret isolation and input controls both enforced.
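One way to picture the two control domains is a gateway-side step that authenticates the calling application, inspects the prompt, and only then attaches the provider key. The sketch below is an assumption-laden illustration: the environment variable MODEL_PROVIDER_API_KEY, the allowed_apps mapping, and the deny-patterns are hypothetical, and regex matching is a deliberately naive stand-in for real prompt-injection defences.

```python
import os
import re

# Hypothetical deny-patterns; a production defence needs far more than regexes.
SUSPICIOUS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"reveal (your|the) system prompt", re.I),
]


def inspect_prompt(prompt):
    """Return the first matching deny-pattern, or None when the prompt passes."""
    for pattern in SUSPICIOUS:
        if pattern.search(prompt):
            return pattern.pattern
    return None


def build_upstream_request(app_token, prompt, allowed_apps, env=os.environ):
    """Authenticate the app, inspect the prompt, then attach the provider key.

    The application only ever holds app_token; the provider key is read by
    the gateway from its own environment (a stand-in for a vault lookup).
    """
    if app_token not in allowed_apps:
        return {"status": "denied", "reason": "unknown_app"}
    if inspect_prompt(prompt) is not None:
        return {"status": "blocked", "reason": "prompt_policy"}
    provider_key = env.get("MODEL_PROVIDER_API_KEY")
    if not provider_key:
        return {"status": "error", "reason": "provider_key_unavailable"}
    return {
        "status": "forward",
        "app": allowed_apps[app_token],
        "headers": {"Authorization": f"Bearer {provider_key}"},
        "body": {"prompt": prompt},
    }
```

The design choice worth noting is that the input check and the credential lookup fail independently: a blocked prompt never touches the key, and a missing key never lets an uninspected prompt through.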

Observability for prompt flows, token usage, and latency

Observability is the difference between operating an AI workload and guessing at its behaviour. In GenAI systems, telemetry must cover prompt flow, token consumption, response latency, and policy hits such as rate limiting or blocked injections. Those signals reveal cost anomalies, misuse patterns, and unexpected application behaviour that standard API logs often miss. This is especially important when the same gateway also performs optimisation functions like caching or semantic routing, because the control layer itself becomes part of the operational risk surface.

Practical implication: instrument prompt and token telemetry as first-class governance data, not as optional application diagnostics.
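As a rough sketch of what treating telemetry as governance data could look like, the example below records token counts, latency, and the policy decision for each request, then summarises them per client and flags clients whose token usage is far above the median. The RequestRecord fields, the decision labels, and the 3x outlier factor are illustrative assumptions, not a schema taken from Kong's post.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class RequestRecord:
    client: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float
    policy_decision: str  # e.g. "forwarded", "rate_limited", "blocked_injection"


def summarise(records):
    """Aggregate per-client request count, total tokens, policy hits, and median latency."""
    acc = defaultdict(lambda: {"requests": 0, "tokens": 0, "policy_hits": 0, "latencies": []})
    for r in records:
        s = acc[r.client]
        s["requests"] += 1
        s["tokens"] += r.prompt_tokens + r.completion_tokens
        s["latencies"].append(r.latency_ms)
        if r.policy_decision != "forwarded":
            s["policy_hits"] += 1
    return {
        client: {
            "requests": s["requests"],
            "tokens": s["tokens"],
            "policy_hits": s["policy_hits"],
            "median_latency_ms": median(s["latencies"]),
        }
        for client, s in acc.items()
    }


def token_outliers(summary, factor=3.0):
    """Return clients whose total tokens exceed factor x the median client total."""
    baseline = median(s["tokens"] for s in summary.values())
    return sorted(c for c, s in summary.items() if s["tokens"] > factor * baseline)
```

Keeping policy hits in the same record as tokens and latency means a cost spike and a burst of blocked requests from the same client show up side by side rather than in separate tools.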

Threat narrative

Attacker objective: The attacker seeks to manipulate or abuse the GenAI application’s trusted execution path so they can extract value, increase cost, or redirect behaviour.

Entry begins when a GenAI application exposes API keys and accepts untrusted prompt traffic through a public-facing request path.
Escalation occurs when prompt injection or credential misuse lets the attacker influence model behaviour, routing decisions, or downstream tool usage.
Impact is realised through cost amplification, data exposure, degraded response integrity, or operational disruption across the AI workflow.

Shai Hulud campaign: npm malware exposed secrets on GitHub.
reviewdog/action-setup GitHub Action supply chain attack exposed secrets.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

AI gateway governance is becoming the control point for GenAI applications, not a convenience layer. Once teams move from demos to production, routing, quotas, telemetry, and secret handling become governance functions rather than engineering extras. The article reflects that shift clearly: the application remained usable only after the control plane was tightened around it. Practitioners should treat the AI gateway as part of identity and access governance for the workload.

Prompt injection and exposed API keys belong in the same governance conversation. Prompt abuse is not just a content safety issue, and leaked keys are not just a secrets-management issue. Together they show that GenAI applications create a coupled risk surface where untrusted input and workload identity collide. The implication is that teams cannot separate application security from NHI control when the same request path carries both prompts and credentials.

Prompt trust assumptions were designed for stable request flows, not for adaptive AI workloads. Those assumptions fail when the application can reshape prompts, route dynamically, and alter token consumption in real time. The implication is that governance models built around predictable API behaviour no longer describe the actual system.

Identity blast radius is now defined by the gateway layer as much as by the model provider. If the gateway can centralise secrets, rate limits, logging, and policy, then it also centralises failure and enforcement. That makes the control plane a high-value governance boundary for NHI and AI workloads alike. Practitioners should evaluate whether their current architecture concentrates too much trust in a single AI ingress point.

Observability is no longer optional once GenAI enters production workflows. The article’s emphasis on prompt flows, token usage, and latency reflects a broader market reality: teams need evidence of behaviour, not just uptime. Without that telemetry, cost overruns and policy violations remain invisible until they become operational incidents. Practitioners should make request-level auditability a baseline requirement.

From our research:
92% of respondents agree that governing AI agents is critical to enterprise security, yet only 44% have implemented any policies to do so, according to the AI Agents: The New Attack Surface report.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
For the wider control-plane context, see OWASP Agentic AI Top 10 for the current risk model around tool use, routing, and agent behaviour.

What this signals

AI gateway governance will increasingly sit between application security and identity governance. Teams that still treat GenAI as a front-end feature will miss the fact that model access, secret handling, and telemetry are becoming policy enforcement problems. The practical shift is toward one control layer that can express request-level rules, audit behaviour, and support incident response across both application and identity stacks.

Prompt-level abuse creates an identity problem as much as an application problem. When a GenAI workload authenticates with long-lived credentials, the blast radius extends beyond the model call itself. That is why NHI governance, secret isolation, and usage telemetry need to be designed together rather than managed in separate silos.

Enterprise teams should expect more scrutiny of the AI ingress layer as agentic systems spread. The next governance gap will not be whether a model can answer a request, but whether the organisation can prove who or what authorised that request, what data it touched, and whether the control layer stopped misuse. For that reason, teams should align gateway policy with identity lifecycle and workload access review now.

For practitioners

Separate model access from application secrets: Move API keys and other secrets into a dedicated vault and keep them out of application code, build artefacts, and client-visible configuration. Review which services can authenticate to the model provider and remove any shared keys that widen blast radius.
Enforce policy at the AI ingress layer: Apply rate limiting, quotas, request filtering, and prompt inspection before traffic reaches the model. Use the gateway as the enforcement point so application teams do not implement inconsistent controls in each service.
Instrument prompt and token telemetry: Capture prompt flow, token counts, latency, and policy decisions in the same operational view so security and platform teams can detect abnormal usage patterns. Treat those signals as governance evidence, not just performance data.
Review GenAI architectures for hidden trust chains: Map every place where the application trusts user input, cached context, embedded retrieval data, or delegated tool access. Remove implicit trust where a single request can influence multiple downstream systems.
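The review of shared keys mentioned in the first item can be approached with a simple inventory check. The sketch below assumes a hypothetical inventory of (service, environment, secret) tuples exported from wherever credentials are currently held; it compares truncated SHA-256 fingerprints rather than raw values so the report itself does not leak secrets.

```python
import hashlib
from collections import defaultdict


def fingerprint(secret):
    """Short SHA-256 fingerprint so secrets can be compared without printing them."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()[:12]


def find_shared_keys(inventory):
    """Return {fingerprint: [(service, environment), ...]} for secrets used more than once.

    inventory is an iterable of (service, environment, secret) tuples
    (a hypothetical export format assumed for this example).
    """
    users = defaultdict(list)
    for service, environment, secret in inventory:
        users[fingerprint(secret)].append((service, environment))
    return {fp: sorted(who) for fp, who in users.items() if len(who) > 1}
```

Any fingerprint in the result marks a key whose compromise would affect every listed service and environment at once, which makes it a candidate for rotation into separate, per-workload credentials.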

Key takeaways

GenAI governance fails when organisations treat the model as the only control boundary and ignore the request path, secrets, and telemetry around it.
The article shows that production AI problems are usually control-plane problems first, especially when exposed keys, prompt injection, and limited visibility all appear together.
Security teams should move policy enforcement, secret isolation, and request observability into the AI gateway layer so governance can keep pace with production use.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		Prompt injection and tool-routing risk map directly to agentic application controls.
OWASP Non-Human Identity Top 10	NHI-01	Exposed API keys and vault handling are classic non-human identity concerns.
NIST CSF 2.0	PR.AA-01	Identity and access management is central to governing model-facing service credentials.

Review gateway policies against agentic request-path abuse and block unsafe tool or prompt flows.

Key terms

AI gateway: A control layer that sits between an application and one or more model providers to enforce policy, routing, logging, and usage limits. In practice, it becomes the place where GenAI traffic is authorised, measured, and constrained before the request reaches the model.
Prompt injection: A manipulation technique where attacker-controlled text changes the behaviour of an AI system by influencing its prompt or context. The issue is not just harmful output. It can also drive unintended tool use, data exposure, or policy bypass when the application trusts unverified input.
Workload identity: The identity used by a non-human system to authenticate to other services, such as an API key, token, or certificate. For GenAI workloads, this identity determines who can call the model provider and therefore how far compromise can spread across the environment.
Request-level observability: Telemetry that captures how each request moves through a control layer, including prompt flow, token usage, latency, and policy decisions. This gives security and platform teams evidence of behaviour rather than relying on coarse application logs that hide AI-specific risk.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity lifecycle are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building or maturing an IAM programme, it is worth exploring.

This post draws on content published by Kong: From Chaos to Control: How Kong AI Gateway Streamlined My GenAI Application. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-10-06.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org