DIY MCP server infrastructure hides growing identity and governance costs

By NHI Mgmt Group Editorial Team | Published 2026-04-13 | Domain: Agentic AI & NHIs | Source: Kong

TL;DR: DIY MCP servers create hidden production costs in authentication, governance, observability, and maintenance as adoption scales, according to Kong, with over 16,000 MCP servers now in the wild and shadow AI already linked to breaches and higher incident costs. The core issue is that MCP makes agent-to-tool access easier, but it also expands identity, audit, and lifecycle burden faster than most teams expect.

At a glance

What this is: Kong’s analysis of why DIY MCP server infrastructure becomes expensive in production, with identity, governance, observability, and maintenance emerging as the main hidden costs.

Why it matters: MCP sits directly in the access path between AI agents and internal tools, so IAM, PAM, NHI, and lifecycle teams need to treat it as governed infrastructure rather than a prototype shortcut.

By the numbers:

One in five organizations reported a breach due to shadow AI, and only 37% have policies to manage AI or detect shadow AI.
For Fortune 500 companies, downtime costs average $500,000 to $1 million per hour.


Context

MCP, or Model Context Protocol, standardises how AI agents discover and invoke tools, but that convenience also pulls identity and governance concerns into the control path. In practice, the difficult part is not making an agent call a tool once. It is sustaining authentication, auditability, and ownership when dozens of tools, environments, and token types are involved.

That is why the DIY MCP debate is really an identity governance debate. Once an MCP server moves from demo to production, it becomes part of the enterprise access layer for non-human identities and increasingly for agentic workflows, which means lifecycle, audit, and privilege management all need a real control plane.

Key questions

Q: What breaks when MCP servers are built without central governance?

A: Without central governance, MCP servers multiply into an unmanaged access layer. Teams lose visibility into which tools exist, who owns them, and whether they should still be active. That creates shadow AI exposure, inconsistent auth, and stale endpoints that outlive the business need that created them.

Q: Why do MCP environments create more identity risk than standard API integrations?

A: MCP environments increase identity risk because they add tool discovery, delegated access, and multiple authentication paths on top of existing APIs. Every new tool expands the number of trust relationships and audit points. That makes lifecycle management, policy consistency, and observability harder to maintain than in a simpler API integration model.

Q: How do teams know if their MCP governance is actually working?

A: Governance is working when every MCP tool has a named owner, a clear approval record, a retirement path, and traceable authentication logs. If tools exist outside the registry, if logs cannot identify who accessed what, or if stale endpoints remain live, governance is failing even if the server appears functional.

Q: What is the difference between a prototype MCP server and production MCP infrastructure?

A: A prototype proves that an agent can call a tool. Production infrastructure proves that the call is secure, auditable, resilient, and owned over time. The difference is not just scale. It is whether the access path can survive token expiry, team turnover, policy changes, and incident response without becoming unmanageable.

Technical breakdown

Why MCP authentication complexity multiplies in production

A prototype can use a single token or API key, but production MCP usually has to broker multiple auth methods across tools, environments, tenants, and client types. That creates a matrix of OAuth flows, JWT validation, API keys, and downstream delegation that is easy to underestimate. The problem is not authentication as a concept. The problem is that every new tool multiplies the number of trust relationships, token refresh paths, and failure modes that must be governed consistently.

Practical implication: treat MCP authentication as a governed identity surface, not as per-service implementation detail.
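To make the multiplication concrete, the sketch below counts the combinations a team could have to govern. The tool, environment, and client names are hypothetical, and the count is an upper bound that assumes every combination is actually in use, which a real estate may not reach.

```python
from itertools import product

# Hypothetical inventory; these names are illustrative, not taken from Kong's analysis.
tools = ["crm_lookup", "ticket_search", "billing_export"]
environments = ["dev", "staging", "prod"]
auth_methods = ["oauth", "jwt", "api_key"]
client_types = ["ide_agent", "chat_agent"]


def auth_paths(tools, environments, auth_methods, client_types):
    """Enumerate every (tool, environment, auth method, client type) combination."""
    return list(product(tools, environments, auth_methods, client_types))


paths = auth_paths(tools, environments, auth_methods, client_types)
print(len(paths))  # 3 * 3 * 3 * 2 = 54

# Adding a single tool adds a full slice of the matrix, not one new path.
paths_after = auth_paths(tools + ["hr_directory"], environments, auth_methods, client_types)
print(len(paths_after) - len(paths))  # 3 * 3 * 2 = 18
```

Even in this small example, one extra tool adds up to 18 paths that each need token handling, refresh behaviour, and failure handling defined.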

How MCP governance breaks without a registry and ownership model

As teams create separate MCP servers for the same APIs, the environment shifts from convenience to sprawl. Without central registration, versioning, and ownership, no one can reliably say which tools exist, who approved them, or which ones should be retired. That is an identity lifecycle problem as much as a platform problem, because access persists long after business ownership changes. The result is shadow AI exposure, stale endpoints, and inconsistent policy enforcement across teams.

Practical implication: tie every MCP tool to ownership, approval, and decommissioning workflows before it enters production.
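One way to picture that gate is a registry record that must be complete before a tool is promoted. This is a minimal sketch under assumed field names (`ToolRecord`, `production_blockers`, and the individual fields are hypothetical), not a schema from Kong or the MCP specification.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ToolRecord:
    """Hypothetical registry entry for one MCP tool."""
    name: str
    business_owner: Optional[str]
    technical_owner: Optional[str]
    approved_by: Optional[str]
    review_date: Optional[date]
    retirement_trigger: Optional[str]


def production_blockers(record: ToolRecord, today: date) -> List[str]:
    """Return the reasons this tool should not enter (or stay in) production."""
    required = {
        "business owner": record.business_owner,
        "technical owner": record.technical_owner,
        "approval record": record.approved_by,
        "retirement trigger": record.retirement_trigger,
    }
    blockers = [f"missing {label}" for label, value in required.items() if not value]
    if record.review_date is None:
        blockers.append("missing review date")
    elif record.review_date < today:
        blockers.append("review date has passed")
    return blockers


record = ToolRecord("crm_lookup", "sales-ops", None, "security-review", date(2026, 1, 1), None)
print(production_blockers(record, today=date(2026, 4, 13)))
```

An empty list means the record is complete. Anything else is a concrete reason to hold the tool back or to trigger a review.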

Why observability is the difference between a tool call and an incident

MCP failures can sit in the agent prompt, the server, the downstream API, or the token itself. Without per-tool logs, latency metrics, authentication telemetry, and correlation IDs, teams cannot tell whether the issue is access, routing, or payload handling. In an agent-driven environment, that blind spot matters because failed tool calls are not just application errors. They can be evidence of token expiry, privilege mismatch, or policy drift that would otherwise remain invisible until a broader incident occurs.

Practical implication: instrument MCP with request, auth, and audit telemetry before usage scales beyond a small pilot.
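As a rough illustration of that telemetry, the wrapper below emits one structured event per tool call. It is a sketch only: `call_with_telemetry`, `AuthError`, and the event fields are hypothetical names, and a real deployment would send events to its existing logging or SIEM pipeline instead of `print`.

```python
import json
import time
import uuid


class AuthError(Exception):
    """Hypothetical error a tool raises when a token is rejected or expired."""


def call_with_telemetry(tool_name, tool_fn, payload, correlation_id=None, sink=print):
    """Invoke tool_fn(payload) and emit one JSON event describing the outcome."""
    event = {
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "tool": tool_name,
        "auth_outcome": "ok",
        "error_class": None,
    }
    start = time.perf_counter()
    try:
        return tool_fn(payload)
    except AuthError as exc:
        # Separate auth failures from other errors so token expiry is visible.
        event["auth_outcome"] = "rejected"
        event["error_class"] = type(exc).__name__
        raise
    except Exception as exc:
        event["error_class"] = type(exc).__name__
        raise
    finally:
        # Runs on success and failure, so every call leaves exactly one event.
        event["latency_ms"] = round((time.perf_counter() - start) * 1000, 2)
        sink(json.dumps(event))
```

Passing the same `correlation_id` through the agent, the MCP server, and the downstream API is what lets a failed call be placed at the right layer.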

Threat narrative

Attacker objective: The attacker or unauthorized actor gains durable, poorly monitored access to internal tools and data through the MCP path, often by exploiting stale credentials or unmanaged servers.

Entry begins when teams expose MCP servers to internal tools without a central trust boundary, which creates a broad access surface for AI agents and their tokens.
Escalation occurs when multiple servers, auth methods, and stale credentials accumulate, letting unmanaged access and shadow AI paths persist beyond their intended scope.
Impact follows when poor ownership and weak observability turn routine tool calls into undetected access drift, compliance gaps, and expensive incident response.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
Salesloft OAuth token breach — hackers stole OAuth tokens to access Salesforce data via Salesloft.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

MCP creates an identity control plane problem, not just an integration problem. Once AI agents can discover and invoke tools through a common protocol, the enterprise is no longer managing isolated scripts. It is governing a distributed access layer with multiple auth methods, ownership domains, and audit requirements. That means MCP belongs in the same governance conversation as API security, NHI lifecycle, and privileged access, not in a developer convenience bucket. Practitioners should treat the protocol as infrastructure that expands identity scope, not as glue code that sits outside policy.

The hidden cost of DIY MCP is lifecycle debt. Every server that stays live after a team changes, a tool deprecates, or a system is retired becomes an access liability. The article’s real insight is that build decisions accumulate into identity debt because no one owns the offboarding path. That pattern mirrors the failure mode behind many NHI issues: access outlives the business context that justified it. Practitioners should see MCP sprawl as a lifecycle governance failure, not a tooling inconvenience.

Shadow AI becomes materially worse when MCP makes tool discovery trivial. The moment agents can locate and use tools with limited friction, policy drift moves from theoretical to operational. Our view is that the category is converging on a named failure mode, the MCP governance gap: the absence of a central control plane for discovery, approval, ownership, and audit. That gap is what allows approved infrastructure and unapproved use to coexist in the same environment. Practitioners should assume tool discoverability without governance will outpace policy by default.

Security teams should stop separating agent governance from API governance. The article shows that authentication, observability, and scaling pressures are the same controls teams already struggle to operationalise across APIs. The difference is that AI agents can amplify those weaknesses by chaining tool calls faster and across more systems. In practice, this pulls IAM, PAM, NHI, and platform engineering into a single operating model. Practitioners should align the control plane before the tool estate fragments beyond recovery.

Enterprise readiness for MCP will be judged by auditability, not prototype success. A demo that routes one agent to one tool proves nothing about scale, recovery, or accountability. The production question is whether each invocation can be traced to an approved identity, a known owner, and a valid policy state. That is the standard governance teams will be held to as agentic access expands. Practitioners should measure MCP maturity by audit completeness and ownership clarity, not by how quickly a prototype ships.
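Audit completeness can be expressed as a simple ratio. The sketch below assumes each invocation record is a dictionary with hypothetical `identity`, `owner`, and `policy_state` fields. The field names and the metric itself are illustrative, not a standard.

```python
# Hypothetical fields every invocation record should carry to be traceable.
REQUIRED_FIELDS = ("identity", "owner", "policy_state")


def audit_completeness(invocations):
    """Return the share of invocation records that carry every required field."""
    if not invocations:
        return 0.0
    complete = sum(
        1 for record in invocations
        if all(record.get(field) for field in REQUIRED_FIELDS)
    )
    return complete / len(invocations)


sample = [
    {"identity": "agent-7", "owner": "sales-ops", "policy_state": "valid"},
    {"identity": "agent-7", "owner": None, "policy_state": "valid"},
    {"identity": "agent-9", "owner": "finance", "policy_state": "valid"},
    {"identity": None, "owner": "finance", "policy_state": "expired"},
]
print(audit_completeness(sample))  # 2 of 4 records are fully traceable: 0.5
```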

From our research:
One in five organizations reported a breach due to shadow AI, and only 37% have policies to manage AI or detect shadow AI, according to AI Agents: The New Attack Surface report.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
For a broader control-plane view, compare that with OWASP Agentic Applications Top 10 to frame tool discovery, privilege, and delegation risks together.

What this signals

With 98% of companies planning to deploy even more AI agents within the next 12 months, the MCP problem is moving from niche implementation detail to programme-level governance pressure. Teams that treat server sprawl as an engineering issue will miss the identity lifecycle debt that accumulates underneath it.

MCP governance gap: once tool discovery becomes easy, approval, ownership, and audit controls become the differentiator between controlled expansion and unmanaged access sprawl. Practitioners should align registry, lifecycle, and audit processes now, before the estate fragments across teams and environments.

The practical signal is that agent access will increasingly look like privileged infrastructure rather than experimental automation. That makes a resource such as the NHI Lifecycle Management Guide relevant for offboarding, ownership, and rotation thinking, while the OWASP Top 10 for Agentic Applications 2026 remains a useful external lens on agent misuse and tool abuse.

For practitioners

Map every MCP server to an owner and decommissioning path: Require a named business owner, technical owner, and retirement trigger for each tool. No server should move into production without a documented offboarding workflow and a review date tied to the system it depends on.
Centralise authentication patterns across tools and environments: Standardise OAuth, JWT, and API key handling through a common control plane so each MCP server does not invent its own trust model. Separate client authentication from downstream delegation and log both.
Instrument per-tool audit and failure telemetry: Capture request context, token outcome, error class, latency, and tool identity for every invocation. Route those logs into the same monitoring and SIEM workflows used for privileged access and API governance.
Treat shadow AI discovery as a control objective: Inventory all agent-facing tool endpoints, then compare them against approved registries and ownership records. Flag any MCP server that exists outside policy, even if it appears to be functioning normally.
Test scale, retry, and expiry behaviour before production cutover: Load test concurrent agent access, token expiration, and retry storms in staging. Validate how the system behaves when an identity token expires mid-session or a downstream tool returns inconsistent errors.
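The inventory comparison in the shadow AI step can be sketched as a set difference. The example assumes a discovery scan has already produced a set of live endpoint URLs, and that the registry maps each URL to a status. The URLs, the status values, and the `audit_endpoints` name are all hypothetical.

```python
def audit_endpoints(discovered, registry):
    """Compare live MCP endpoints against the approved registry.

    discovered: set of endpoint URLs found by a scan.
    registry: dict mapping endpoint URL to "approved" or "retired".
    """
    registered = set(registry)
    # Live but never registered: candidate shadow AI.
    shadow = sorted(discovered - registered)
    # Registered as retired but still answering: stale access.
    stale = sorted(url for url in discovered & registered if registry[url] == "retired")
    return {"shadow": shadow, "stale": stale}


discovered = {
    "https://mcp.example.internal/crm",
    "https://mcp.example.internal/old-billing",
    "https://mcp.example.internal/scratch",
}
registry = {
    "https://mcp.example.internal/crm": "approved",
    "https://mcp.example.internal/old-billing": "retired",
}
print(audit_endpoints(discovered, registry))
```

Both lists should normally be empty. A server in either one is worth flagging even if it is functioning normally.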

Key takeaways

DIY MCP servers create identity, governance, and maintenance debt that grows faster than most teams expect.
Shadow AI and poor observability turn tool access into a compliance and incident-response problem, not just a development choice.
The control plane for MCP needs ownership, auditability, and lifecycle discipline before agent use scales further.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A2	MCP tool discovery and delegated actions create agent misuse risk.
OWASP Non-Human Identity Top 10	NHI-03	MCP tokens, API keys, and service access need lifecycle governance.
NIST CSF 2.0	PR.AA-01	Authentication and auditability are central to MCP production readiness.

Map MCP identities to access governance and verify logs support incident response and compliance.

Key terms

MCP Server: An MCP server is the component that exposes tools and data sources to AI agents through Model Context Protocol. In production, it becomes part of the enterprise access layer, so its authentication, auditing, and ownership model matter as much as its API behaviour.
Shadow AI: Shadow AI is AI use that appears outside approved governance, inventory, or monitoring. In MCP environments, it often shows up as unregistered tools, unreviewed agent access, or unmanaged servers that can reach internal systems without clear ownership or audit trails.
Identity Control Plane: An identity control plane is the governance layer that centralises authentication, authorisation, lifecycle, and audit decisions across tools and workloads. For MCP, it determines whether agent-to-tool access is managed consistently or scattered across ad hoc implementations.
Lifecycle Debt: Lifecycle debt is the accumulation of access and ownership problems when tools, credentials, or integrations outlive the business context that created them. In MCP programmes, it appears when servers remain active after teams change, systems retire, or approvals are forgotten.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or NHI governance in your organisation, it is worth exploring.

This post draws on content published by Kong: Build vs Buy: The Hidden Costs of DIY MCP Server Infrastructure. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-04-13.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org