AI microservices at scale expose the limits of existing security controls

By NHI Mgmt Group Editorial Team | Published 2026-04-02 | Domain: Agentic AI & NHIs | Source: Kong

TL;DR: AI-integrated microservices multiply trust boundaries, identities, policy decisions, and attack paths, while prompt injection remains the top LLM application risk and MCP servers extend internal tool exposure across more services, according to Kong. The practical lesson is that zero trust, centralized policy, and workload identity governance must extend to AI traffic rather than treating it as a separate class of control.

At a glance

What this is: This is Kong’s analysis of five security practices for AI microservices at scale, centered on how AI widens trust boundaries, exposes MCP and RAG paths, and makes existing API controls incomplete.

Why it matters: It matters because IAM, IGA, PAM, NHI, and platform teams now have to govern AI traffic, service identities, and tool access through one control plane rather than as separate domains.

By the numbers:

The microservices architecture market has grown from $6.27 billion in 2024 to $7.4 billion in 2025 at a compound annual growth rate of 17.9%, with projections reaching $15.64 billion by 2029.
OWASP's 2025 Top 10 for LLM Applications ranks prompt injection as the number one critical vulnerability.
According to IBM's 2025 Cost of a Data Breach Report, the global average cost of a data breach fell 9% to $4.44 million, while US-specific costs rose 9% to a record $10.22 million.

👉 Read Kong's full analysis of securing AI microservices at scale

Context

AI microservices combine the familiar complexity of distributed systems with a new layer of runtime decision-making, which is why traditional API security starts to miss important trust boundaries. In this model, one user prompt can trigger gateway checks, LLM calls, retrieval from vector stores, and tool execution through MCP servers, so identity, policy, and observability all have to travel with the request.

For identity teams, the core issue is not simply more traffic. It is that AI workloads behave like non-human actors that need scoped credentials, continuous verification, and shared governance across APIs, service accounts, tokens, and tool endpoints. Kong's framing is useful because it treats AI as an extension of microservices security, not as a separate security island.

The article also reinforces a broader practitioner pattern: when AI is added to service meshes, gateways, and RAG pipelines without central policy, the organisation inherits more blind spots than it removes. That is typical of early AI adoption, but it is no longer sustainable at scale.

Key questions

Q: How should security teams govern AI microservices that mix APIs, models, and tool access?

A: Security teams should govern AI microservices as one identity and policy problem, not as separate API, ML, and platform issues. That means binding each service and AI workload to unique credentials, enforcing consistent authorisation at every hop, and centralising audit and rate limits so the full request path remains visible.

Q: Why do AI microservices increase the risk of lateral movement and data exposure?

A: AI microservices increase risk because one request can traverse many identities, retrieval sources, and tools before returning a response. If any step is over-permissioned, the AI path can become a shortcut for data access or unintended action, which expands the blast radius of a single compromised workflow.

Q: What do security teams get wrong about prompt injection in production AI systems?

A: They often treat prompt injection as a content problem instead of an access problem. In production, the issue is that malicious instructions can flow through trusted retrieval content or user input and influence tool use, so the control gap sits inside the decision path, not just at the perimeter.
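Treating prompt injection as an access problem can be sketched as a tool dispatcher that authorises every model-proposed call against the scopes of the authenticated caller, never against what the model asks for. This is a minimal sketch under stated assumptions: the `TOOL_REQUIRED_SCOPE` table, the `Principal` type, and the scope names are hypothetical illustrations, not part of Kong's guidance or any specific framework.

```python
from dataclasses import dataclass, field

# Hypothetical mapping of tool names to the scope a caller must already hold.
TOOL_REQUIRED_SCOPE = {
    "search_docs": "docs:read",
    "create_ticket": "tickets:write",
    "export_customer_table": "customers:export",
}


@dataclass(frozen=True)
class Principal:
    """The authenticated identity the request runs on behalf of."""
    subject: str
    scopes: frozenset = field(default_factory=frozenset)


class ToolDenied(Exception):
    pass


def dispatch_tool(principal: Principal, tool_name: str, args: dict) -> str:
    """Authorise a model-proposed tool call against the caller's granted scopes.

    The model output only *proposes* a call. The decision depends solely on the
    principal's scopes, so instructions injected through a prompt or a retrieved
    document cannot widen what this request is allowed to do.
    """
    required = TOOL_REQUIRED_SCOPE.get(tool_name)
    if required is None:
        raise ToolDenied(f"unknown tool: {tool_name}")
    if required not in principal.scopes:
        raise ToolDenied(f"{principal.subject} lacks {required} for {tool_name}")
    # A real system would invoke the tool here; the sketch just reports success.
    return f"executed {tool_name} for {principal.subject}"
```

If a poisoned document convinces the model to request `export_customer_table` for a user who holds only `docs:read`, the call fails at the dispatcher regardless of how persuasive the injected text was.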

Q: How do organisations know whether their AI and API controls are actually working?

A: They know controls are working when they can trace every AI request from prompt to retrieval to model output to downstream action, with consistent identity, policy, and logging across each step. If a security team cannot reconstruct the request path, governance is incomplete.
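The traceability test above can be sketched as a log-correlation check: group events from every layer by a shared trace ID, order them, and flag missing stages or a change in the principal the request acts for. This is a minimal sketch assuming each layer logs a hypothetical event record with `trace_id`, `ts`, `stage`, and `on_behalf_of` fields, and that the trace ID is propagated between hops (for example via the W3C Trace Context `traceparent` header). The four expected stages are an assumption; real flows vary.

```python
from collections import defaultdict

# Stages a request is expected to pass through (an assumption; real flows vary).
EXPECTED_STAGES = ("prompt", "retrieval", "model_output", "action")


def reconstruct(events):
    """Group log events by trace_id and order each trace by timestamp."""
    traces = defaultdict(list)
    for event in events:
        traces[event["trace_id"]].append(event)
    return {tid: sorted(evts, key=lambda e: e["ts"]) for tid, evts in traces.items()}


def missing_stages(trace):
    """Expected stages with no log event in this trace: gaps in the audit trail."""
    seen = {e["stage"] for e in trace}
    return [s for s in EXPECTED_STAGES if s not in seen]


def identity_breaks(trace):
    """Stages whose on_behalf_of differs from the principal that sent the prompt."""
    origin = trace[0]["on_behalf_of"]
    return [e["stage"] for e in trace if e["on_behalf_of"] != origin]
```

An empty result from both checks is the "we can reconstruct the request path" condition; any non-empty result points at the hop where identity or logging broke.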

Technical breakdown

Zero trust for AI microservices and service-to-service identity

Zero trust in AI microservices means every request must authenticate, authorise, and be continuously re-evaluated as it moves between gateway, model, retrieval, and tool layers. Kong's argument is that AI agents act like high-privilege machine callers, so IP-based trust and static network assumptions do not survive containerised, distributed execution. Mutual TLS, short-lived credentials, and per-service identity are the foundation, but the real control is consistency across every hop.

Practical implication: bind each AI workload to a unique cryptographic identity and remove any reliance on network location as proof of trust.

Centralised policy enforcement for API traffic and AI traffic

A unified control plane matters because AI traffic is still API traffic, just with more dynamic behaviour and more opportunities for policy drift. When platform, data, and ML teams each enforce their own rules, attackers look for the weakest endpoint and pivot through it. Kong's model groups authentication, rate limits, token budgets, and audit logging in one place so that human users, services, and AI agents inherit the same enforcement logic.

Practical implication: align API gateway policy, LLM controls, and service-to-service authorisation into one enforcement layer rather than three separate tools.
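A single enforcement layer can be sketched as one `enforce` call that every caller type passes through, combining authorisation, rate limiting, an LLM token budget, and audit logging. This is a minimal illustration, not Kong's implementation: the `POLICY` table and identity names are hypothetical, a fixed-window counter stands in for whatever rate limiter a gateway actually provides, and the token budget here never resets (a real one would reset per billing period).

```python
import time
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical policy shared by human, service, and agent callers.
POLICY = {
    "alice":         {"kind": "human",   "routes": {"/chat"},                  "llm_token_budget": 10_000},
    "svc-billing":   {"kind": "service", "routes": {"/invoices"},              "llm_token_budget": 0},
    "agent-support": {"kind": "agent",   "routes": {"/chat", "/tools/ticket"}, "llm_token_budget": 50_000},
}


@dataclass
class Gateway:
    max_requests_per_window: int = 5
    window_s: float = 60.0
    audit_log: list = field(default_factory=list)
    _windows: dict = field(default_factory=dict)      # identity -> (window_start, count)
    _tokens_used: dict = field(default_factory=dict)  # identity -> LLM tokens consumed

    def enforce(self, identity: str, route: str, llm_tokens: int = 0,
                now: Optional[float] = None) -> bool:
        """Apply the same checks to every caller and audit every decision."""
        now = time.time() if now is None else now
        allowed, reason = self._decide(identity, route, llm_tokens, now)
        self.audit_log.append({"ts": now, "identity": identity, "route": route,
                               "allowed": allowed, "reason": reason})
        return allowed

    def _decide(self, identity, route, llm_tokens, now):
        rule = POLICY.get(identity)
        if rule is None:
            return False, "unknown identity"
        if route not in rule["routes"]:
            return False, "route not permitted"
        start, count = self._windows.get(identity, (now, 0))
        if now - start >= self.window_s:  # start a fresh rate-limit window
            start, count = now, 0
        if count >= self.max_requests_per_window:
            return False, "rate limited"
        used = self._tokens_used.get(identity, 0)
        if used + llm_tokens > rule["llm_token_budget"]:
            return False, "token budget exceeded"
        # Only allowed requests count against the window and the budget.
        self._windows[identity] = (start, count + 1)
        self._tokens_used[identity] = used + llm_tokens
        return True, "ok"
```

The design point is that the `kind` field never changes which checks run: a human, a service, and an agent all traverse the same `_decide` path, so there is no weaker endpoint to pivot through.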

Securing RAG pipelines and MCP servers as identity boundaries

RAG and MCP expand the attack surface because they turn internal data stores and tools into runtime dependencies for AI systems. RAG retrieval paths can expose sensitive collections if document access is not enforced at the retrieval layer, while MCP servers can let agents chain tools and act without human oversight. The technical risk is not just exposure, but unauthorised delegation of capability to a caller that can discover and combine tools dynamically.
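Enforcing document access at the retrieval layer can be sketched as filtering candidates by the caller's entitlements before ranking, so restricted chunks never reach the model's context. This is a minimal sketch assuming each indexed chunk carries the ACL of its source document (copied at ingestion time); the `Chunk` type and the word-overlap `score` function are hypothetical stand-ins for a real vector store and similarity search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset  # ACL copied from the source document at ingestion


def score(query: str, text: str) -> int:
    """Toy relevance score (shared lowercase words); stands in for vector similarity."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query: str, caller_groups: set, index: list, k: int = 3) -> list:
    """Return up to k relevant chunks that the caller is entitled to read.

    Authorisation runs *before* ranking, so a restricted chunk is never placed
    in the prompt, however relevant it is to the query.
    """
    permitted = [c for c in index if c.allowed_groups & caller_groups]
    ranked = sorted(permitted, key=lambda c: score(query, c.text), reverse=True)
    return [c for c in ranked[:k] if score(query, c.text) > 0]
```

Filtering before the top-k cut also matters for quality: restricted documents cannot crowd permitted ones out of the result set.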

Practical implication: treat every RAG endpoint and MCP tool as a governed API with authentication, authorisation, rate limiting, and logging.
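Treating each MCP tool as a governed API can be sketched as a policy layer that every tool call passes through. This is a hypothetical wrapper, not part of any MCP SDK: the `ToolGateway` class, the per-agent `grants` table, the per-session call cap, and the `forbidden_after` chaining rule (for example, no outbound email after reading secrets in the same session) are all illustrative assumptions.

```python
import logging
from collections import defaultdict

log = logging.getLogger("tool-gateway")


class ToolGateway:
    """Hypothetical policy layer in front of MCP-exposed tools (not an MCP SDK API)."""

    def __init__(self, tools, grants, max_calls_per_session=5, forbidden_after=None):
        self.tools = tools                            # tool name -> callable
        self.grants = grants                          # agent identity -> granted tool names
        self.max_calls = max_calls_per_session
        self.forbidden_after = forbidden_after or {}  # tool -> tools it may not chain into
        self.history = defaultdict(list)              # session id -> tools already called

    def call(self, agent_id, session_id, tool, **kwargs):
        # Authentication: only known agent identities reach any tool.
        if agent_id not in self.grants:
            raise PermissionError("unknown agent identity")
        # Authorisation: the tool must be explicitly granted to this agent.
        if tool not in self.grants[agent_id]:
            log.warning("deny agent=%s tool=%s session=%s", agent_id, tool, session_id)
            raise PermissionError(f"{agent_id} is not granted {tool}")
        past = self.history[session_id]
        # Rate limiting: cap how many tool calls a single session can make.
        if len(past) >= self.max_calls:
            raise PermissionError("per-session tool call limit reached")
        # Chaining control: block risky sequences within one session.
        for prior in past:
            if tool in self.forbidden_after.get(prior, set()):
                log.warning("deny chain %s -> %s session=%s", prior, tool, session_id)
                raise PermissionError(f"{tool} may not follow {prior} in one session")
        past.append(tool)
        # Logging: every allowed call is recorded with identity and session.
        log.info("allow agent=%s tool=%s session=%s", agent_id, tool, session_id)
        return self.tools[tool](**kwargs)
```

In a real deployment `agent_id` would come from a verified credential rather than a plain argument, and the grant and chaining tables would live in the central policy store rather than in code.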

Threat narrative

Attacker objective: The attacker aims to turn a single AI interaction into broad internal access, data exposure, or unintended action execution across connected services.

Entry occurs when a user prompt, crafted input, or malicious source material reaches an AI workflow that spans gateway, model, retrieval, and tool layers.
Escalation happens when the model or agent is able to invoke multiple internal services, search vector stores, or chain MCP tools beyond the original intent of the request.
Impact follows when the AI-driven path accesses sensitive documents, triggers unintended actions, or leaks credentials and data through poorly governed retrieval and tool execution.

Moltbook AI agent keys breach: the Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach: attackers used stolen AWS access keys to hijack Anthropic models hosted on Amazon Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

AI microservices security is really identity governance stretched across more runtime decisions. The article makes clear that the control problem is no longer just traffic filtering. It is about who or what can call which service, under what conditions, and with what scope when the request is being mediated by models, retrieval layers, and tools. For practitioners, that means AI security inherits IAM, PAM, and NHI governance whether the programme is ready or not.

Prompt injection is the clearest sign that semantic attacks now sit inside the access boundary. Traditional API controls can validate syntax, tokens, and network paths, but they do not understand whether an AI response has been steered into exfiltration or unsafe tool use. That is why the article correctly places prompt injection at the centre of the AI microservices risk model. Practitioners should treat model input as a control surface, not just a data payload.

Identity blast radius is the right name for the core risk in AI microservices at scale. Each added service, tool, and retrieval path increases the number of identities that can be abused, chained, or over-scoped during a single request. The problem is not only credential count, but the compounding effect of delegated access across layers. For security teams, blast radius becomes a sharper design constraint than perimeter protection alone.
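One way to make blast radius measurable is to model the inventory as a directed graph of "can call, read, or assume" edges and compute everything transitively reachable from a starting identity. This is a minimal sketch: the node names and `EDGES` inventory are hypothetical, and a real graph would be built from IAM policies, service-account bindings, and MCP tool grants.

```python
from collections import deque

# Hypothetical inventory: each node lists what it can call, read, or assume.
EDGES = {
    "user:alice": ["agent:support"],
    "agent:support": ["svc:rag", "mcp:ticketing"],
    "svc:rag": ["store:kb-public", "store:kb-hr"],
    "mcp:ticketing": ["svc:crm"],
    "svc:crm": ["store:customers"],
}


def blast_radius(start: str, edges: dict) -> set:
    """All nodes transitively reachable from `start` (breadth-first, excluding start)."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}
```

With this inventory, a single prompt from `user:alice` can transitively reach seven nodes, including the HR and customer stores. Removing the `svc:rag -> store:kb-hr` edge is the kind of scoping change that shrinks the radius without touching the perimeter.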

Zero trust for AI only works when policy follows the request path end to end. Kong's guidance points to a broader pattern: fragmented controls across gateways, mesh layers, and LLM endpoints create enforcement gaps that attackers can exploit. The practitioner implication is that governance must be model-aware, service-aware, and tool-aware at the same time.

RAG and MCP governance are now part of core identity architecture, not adjacent architecture. Once AI systems can retrieve documents and invoke internal tools, they participate in the same privilege chain as service accounts and other NHIs. That collapses the old separation between application security and identity governance. The programme implication is that access reviews, policy enforcement, and auditability must extend to AI execution paths.

From our research:
80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%), according to the AI Agents: The New Attack Surface report.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
That visibility gap is why the OWASP NHI Top 10 matters for teams extending identity governance into agentic tool use.

What this signals

Identity blast radius: AI microservices are making the old separation between application security and identity governance less useful every quarter. Once prompts can trigger retrieval, tool use, and downstream service calls, the practical question becomes how far a single request can travel under one identity before controls fail. Teams that still govern these layers separately will keep finding policy gaps after the fact.

With 80% of organisations already reporting agent behaviour beyond intended scope in our research, the trajectory is clear: more AI features will mean more identity paths to govern, not fewer. The programme implication is that service accounts, token budgets, and tool permissions need to be reviewed together, especially where RAG and MCP sit in the same execution chain.

Practitioners should also look at the control model, not just the workload count. If your environment cannot produce a single trace across prompt, retrieval, model response, and action, then incident response and audit readiness will lag the deployment curve. That is where AI governance becomes an operational discipline rather than a policy statement.

For practitioners

Inventory AI-exposed identities and tool paths: Map every LLM endpoint, RAG collection, MCP server, and service account involved in AI request flows, then document which identities can access each step. Use that inventory to identify where authority is inherited rather than explicitly granted.
Enforce short-lived credentials for AI workloads: Issue unique, time-bound credentials for AI services and rotate them as aggressively as other high-risk machine identities. Remove any long-lived shared secrets from model, retrieval, or tool integrations.
Centralise policy for API and AI traffic together: Apply the same authentication, rate limiting, audit logging, and ABAC rules to human APIs, internal service calls, and AI tool requests. Keep one control plane so enforcement cannot drift between teams.
Treat RAG and MCP as governed access boundaries: Require authorisation at the retrieval layer for documents and at the tool layer for every MCP action. Deny unauthenticated retrieval paths and review which tools an agent can chain in a single session.
Measure AI request traceability end to end: Correlate prompt, retrieval, model output, and downstream action logs using a consistent identifier so you can reconstruct how a request moved through the stack. If you cannot trace the path, you cannot investigate abuse.
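The inventory and credential items in this list can be combined into a simple automated review: record each hop of an AI request flow with the identity it uses, how its authority was obtained, and its credential lifetime, then flag inherited authority and long-lived or shared secrets. This is a minimal sketch; the `FlowStep` schema, step names, and the one-hour `max_ttl_s` threshold are hypothetical choices, not values from Kong's article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlowStep:
    """One hop in an AI request flow, as recorded in a hypothetical inventory."""
    step: str                        # e.g. "llm-endpoint", "rag:kb-hr", "mcp:ticketing"
    identity: str                    # identity presented at this hop
    grant: str                       # "explicit" or "inherited"
    credential_ttl_s: Optional[int]  # None means the credential never expires
    shared: bool = False             # True if other workloads use the same secret


def findings(steps, max_ttl_s=3600):
    """Flag hops where authority is inherited or credentials are long-lived or shared."""
    issues = []
    for s in steps:
        if s.grant == "inherited":
            issues.append((s.step, "authority inherited, not explicitly granted"))
        if s.credential_ttl_s is None or s.credential_ttl_s > max_ttl_s:
            issues.append((s.step, "long-lived credential"))
        if s.shared:
            issues.append((s.step, "shared secret"))
    return issues
```

Running a check like this on every change to the flow inventory turns the one-off mapping exercise into a recurring control.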

Key takeaways

AI microservices turn identity governance into a runtime problem because one request can traverse many services, models, and tools before it completes.
Prompt injection, retrieval abuse, and MCP tool chaining are governance failures as much as technical attacks, because they operate inside the AI decision path.
The right control model is end-to-end policy, short-lived credentials, and full request traceability across APIs, RAG, and AI agents.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST Zero Trust (SP 800-207) sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A1	Prompt injection and tool misuse are central to the article's risk model.
OWASP Non-Human Identity Top 10	NHI7 (Long-Lived Secrets)	Short-lived credentials and rotation are core to securing AI workloads and tools.
NIST Zero Trust (SP 800-207)	Section 2.1 (Tenets of Zero Trust)	The post advocates per-request verification and continuous trust checks across services.

Inventory AI service identities and enforce rotation, scope limits, and revocation for every secret.

Key terms

AI Microservice: A microservice that participates in an AI workflow by calling models, retrieving context, or invoking tools. It behaves like a normal service from an architecture perspective, but its identity risk is higher because one request can cascade through multiple privileged systems in a single session.
RAG Pipeline: A retrieval-augmented generation pipeline combines search over internal data with model output generation. It expands the attack surface because the model's output can only be trusted if retrieval is authorised, filtered, and traceable all the way back to the source document.
MCP Server: A Model Context Protocol server exposes tools and capabilities to an AI agent at runtime. In governance terms, it is a privileged interface that must be treated like any other API endpoint, with authentication, authorisation, logging, and strict control over tool chaining.
Identity Blast Radius: The amount of damage one identity can cause if it is misused, over-scoped, or compromised. In AI microservices, blast radius grows when prompts, retrieval, and tool calls share the same request path without clear privilege boundaries or traceability.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or NHI governance in your organisation, it is worth exploring.

This post draws on content published by Kong: 5 Best Practices for Securing AI Microservices at Scale. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-04-02.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org