MCP security shows why AI tools need content controls

By NHI Mgmt Group Editorial TeamPublished 2025-08-28Domain: Agentic AI & NHIsSource: Lakera

TL;DR: MCP connects AI applications to tools and data, but Lakera’s guide shows that prompt injection, data leaks, and unsafe outputs can still pass through unless inputs and outputs are screened at the tool boundary. The governance gap is not access alone, but what the model is allowed to do with access once context becomes executable.

At a glance

What this is: This is an engineering guide on securing Model Context Protocol servers, with the core finding that input and output screening is needed because MCP exposes tool and resource paths to prompt injection and data leakage.

Why it matters: It matters to IAM, NHI, and AI governance teams because MCP shifts control from static access assignment to runtime tool use, which changes how privilege, validation, and data handling have to be governed.

👉 Read Lakera’s engineering guide on securing MCP servers with Guard

Context

Model Context Protocol, or MCP, is a bridge between AI applications and external tools, prompts, and resources. That bridge is useful, but it also turns ordinary model inputs into a security boundary that can be attacked through injection, hidden instructions, or unsafe context handling.

For identity and access teams, the important change is that MCP does not just expose a connection pattern. It creates a runtime authorization problem for non-human identities and AI systems, where the question is not only who can connect, but what the model can do with the data and tools it reaches once connected.

Key questions

Q: How should security teams govern MCP servers used by AI applications?

A: Security teams should govern MCP servers as delegated non-human identity pathways, not just as APIs. That means scoping which tools and resources are reachable, validating content at runtime, and assigning clear ownership for each integration. If the model can act on untrusted context, the control model must cover both identity permissions and model-facing content handling.

Q: Why do MCP-based AI systems increase prompt injection risk?

A: MCP-based AI systems increase prompt injection risk because they connect models directly to tools and external context that may contain hidden instructions. Once the model trusts that content, attackers can redirect behavior without breaking authentication. The risk rises when organizations assume access control alone is enough to protect model actions.

Q: What breaks when AI tool permissions are too broad in MCP environments?

A: When AI tool permissions are too broad, a model can turn a small content manipulation into a large operational action. That expands the blast radius from a single request to connected systems, data sources, or downstream workflows. Broad permissions also make it harder to distinguish legitimate use from abuse after the fact.

Q: How can teams reduce the impact of unsafe model output in MCP workflows?

A: Teams can reduce impact by combining output screening, narrow tool scoping, and reviewable ownership for each MCP integration. The goal is to stop unsafe content before it becomes a tool action or user-facing result. If the model has already crossed the content boundary, the next control should limit what it can still change.

Technical breakdown

MCP tools, prompts, and resources create three distinct trust boundaries

MCP server security is not a single control problem. Tools are executable functions, prompts are model instructions, and resources are structured context, so each surface can be abused differently. A tool call can trigger unsafe action, a prompt can carry hidden instruction content, and a resource can inject untrusted data into the model’s reasoning path. The article’s key point is that all three primitives need validation at the point where content enters or leaves the model workflow, because the model treats context as operational input rather than passive text.

Practical implication: classify tools, prompts, and resources separately and apply content validation to each boundary, not just to the API front door.

Prompt injection turns context into an execution path

The example in the article shows how a seemingly harmless summarization request can be manipulated by hidden content inside the input. That is the core MCP risk: the model is asked to interpret text, but the text contains instructions that redirect behavior. In practice, prompt injection is dangerous because it does not need to break authentication. It exploits trust in the content itself. For AI systems connected through MCP, the security problem is that model behavior can be steered by untrusted context after legitimate access has already been granted.

Practical implication: treat every retrieved or user-supplied text stream as potentially adversarial before it reaches an MCP-enabled model.

Real-time screening is a runtime control, not a governance substitute

Lakera’s approach uses a decorator that screens input and output before the tool returns data to the model or the user. That is a runtime control pattern, not a governance framework by itself. It reduces exposure from harmful content, but it does not remove the need for scope control, prompt design review, or permission boundaries around the underlying MCP server. The architectural lesson is that model-facing validation must sit alongside least privilege and resource scoping, because content safety and access control solve different parts of the same problem.

Practical implication: pair runtime content screening with explicit tool scoping and resource validation so that one layer does not carry the full security burden.

Threat narrative

Attacker objective: The attacker’s objective is to steer the AI system into revealing information, taking unsafe actions, or exfiltrating data through trusted MCP pathways.

Entry occurs when an attacker places hidden instructions or harmful content into text that an MCP-enabled model is later asked to process.
Credential or privilege abuse follows when the model is allowed to call tools or access resources based on that untrusted context instead of validated intent.
Impact occurs when the model returns manipulated output, leaks data, or performs an unsafe action on behalf of the user.

ASP.NET machine keys RCE attack — 3,000+ exposed ASP.NET machine keys enabled remote code execution.
DeepSeek breach — DeepSeek breach exposed 1M+ log lines and sensitive secret keys.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Control what the model does, not only what it can reach: MCP makes access control incomplete if governance stops at connection approval. The article shows that a model can be allowed to reach a tool or resource and still be unsafe because the content it processes can alter behavior at runtime. That means the effective security boundary has moved from identity issuance to content interpretation. Practitioners should treat model-facing execution paths as governed surfaces, not just authenticated integrations.

Prompt injection is a governance problem, not just an application bug: The failure here is not simply a vulnerable parser or a bad prompt template. The deeper issue is that MCP assumes context can be trusted once delivered to the model, and that assumption is too weak for modern AI workflows. When hidden instructions can redirect summarization, retrieval, or tool execution, the control gap is at the policy layer as much as the code layer. Teams need to recognize that model context is now part of the attack surface.

Runtime content screening is a useful control, but it only addresses one layer of the stack: Screening inputs and outputs can reduce harmful content exposure, yet it does not answer who may call which tool, which resources should ever be reachable, or how much authority the model should have in the first place. That makes MCP governance a layered problem across identity, policy, and content inspection. The practitioner conclusion is straightforward: one decorator cannot replace a complete access model.

Tool-level security for AI systems now belongs in the same conversation as NHI governance: MCP servers behave like non-human identities with delegated authority, and that means their tool permissions, resource boundaries, and content handling should be governed with the same seriousness as service accounts or workloads. OWASP-NHI and zero trust principles are relevant because the model’s runtime behavior can no longer be assumed to be stable or benign. Teams should align AI tool access with explicit identity and resource boundaries.

From our research:
53% of MCP servers expose credentials through hard-coded values in configuration files, according to The State of MCP Server Security 2025.
Another finding from the same research shows that only 18% of MCP server deployments implement any form of access scoping for tool permissions, which helps explain why tool governance remains weak.
For broader identity context, see NHI Lifecycle Management Guide for how provisioning, rotation, and offboarding change when non-human identities are tied to runtime tools.

What this signals

Model context is now part of the control plane: MCP changes the way teams should think about identity security because the security boundary is no longer limited to authentication and authorization. When tool use is driven by live model context, governance has to account for content, not just credentials. That is why NHI programmes should start mapping MCP integrations alongside service accounts and workload identities, using NIST Cybersecurity Framework 2.0 as the outer governance model.

A useful operational concept here is context-to-action drift: the gap between what a model was asked to do and what it can be induced to do through untrusted text. Once that drift exists, permission reviews become less reliable because the risky action is not visible until execution time. The programme signal is simple. If your team cannot describe which model inputs are trusted, the AI integration is already under-governed.

The wider market signal is that AI security for MCP will converge with workload identity and NHI practices, not sit beside them as a separate discipline. With 53% of MCP servers exposing credentials through hard-coded values in configuration files, per The State of MCP Server Security 2025, the first layer of defence is still disciplined identity and secret handling before any advanced content policy is useful.

For practitioners

Separate tool, prompt, and resource controls Apply distinct validation rules to MCP tools, prompts, and resources so that each primitive is checked at the point of use, not only at the perimeter.
Add content screening to model-facing paths Screen inbound and outbound text for prompt injection, unsafe instructions, and data leakage before the model can act on the content.
Scope MCP permissions narrowly Limit each MCP server to the minimum tools and resources needed for its task, and review those permissions as part of identity governance.
Treat AI toolchains as governed non-human identities Document ownership, approval, and review for every MCP-enabled integration so delegated model behavior has a clear control owner.

Key takeaways

MCP security fails when teams treat model connectivity as the whole problem instead of governing what the model can do with context.
Prompt injection and hard-coded credentials show that the operational risk sits at the boundary between identity, content, and tool execution.
The practical response is layered governance: narrow tool scope, runtime content screening, and clear ownership for every AI integration.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	LLM-03	MCP tool use and prompt injection map directly to agentic AI input abuse.
OWASP Non-Human Identity Top 10	NHI-03	Hard-coded credentials in MCP config files are a direct NHI secret exposure issue.
NIST CSF 2.0	PR.AC-4	Least-privilege access scoping applies to MCP tool permissions and resources.

Inventory MCP secrets, rotate exposed credentials, and remove secrets from configuration files.

Key terms

Model Context Protocol: An open protocol that lets AI applications connect to tools, prompts, and external resources in a standard way. In practice, it expands what a model can reach at runtime, which makes access scoping, content validation, and tool governance part of the security design rather than optional add-ons.
Prompt Injection: A technique where hidden or malicious instructions inside content influence a model’s behaviour. The model may treat the injected text as legitimate context, causing it to ignore user intent, expose data, or take unsafe actions. For MCP systems, the issue is especially serious because the content often arrives through trusted integrations.
Content Screening: A runtime control that inspects input or output before content is allowed to influence a model or leave the system. It is useful for blocking unsafe instructions, leaks, and harmful text, but it does not replace access scoping or identity governance. It is one layer in a broader control stack.
Context-to-action Drift: The gap between the task a model was meant to perform and the action it can be pushed toward through untrusted context. The term is useful in MCP and agentic AI governance because the risk is not only bad input, but the way that input can change execution after access has already been granted.

What's in the full article

Lakera's full engineering guide covers the operational detail this post intentionally leaves for the source:

The exact Python decorator pattern used to screen MCP input and output before a tool response is returned.
Code-level examples for securing MCP tools, prompts, and resources separately in a working server.
The latency and implementation trade-offs of adding a guard layer to model-facing content flows.
The source gist referenced by the author for practitioners who want to test the pattern directly.

👉 Lakera’s full post shows the decorator pattern, example code, and MCP coverage for tools, prompts, and resources.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or NHI governance in your organisation, it is worth exploring.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-08-28.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org