Subscribe to the Non-Human & AI Identity Journal
Home FAQ Agentic AI & Autonomous Identity Why do MCP-based AI systems increase prompt injection…
Agentic AI & Autonomous Identity

Why do MCP-based AI systems increase prompt injection risk?

← Back to all FAQ
By NHI Mgmt Group Editorial Team Updated July 5, 2026 Domain: Agentic AI & Autonomous Identity

MCP-based AI systems increase prompt injection risk because they connect models directly to tools and external context that may contain hidden instructions. Once the model trusts that content, attackers can redirect behavior without breaking authentication. The risk rises when organizations assume access control alone is enough to protect model actions.

Why This Matters for Security Teams

MCP changes the attack surface because it gives an AI model structured access to tools, data sources, and execution paths that can carry hostile instructions. That makes prompt injection more than a content-filtering problem. It becomes an authorisation and trust-boundary problem, especially when the model is allowed to interpret retrieved text as operational guidance. Current guidance suggests security teams should treat tool connectivity as an active control plane, not a passive integration layer.

That distinction matters because prompt injection often arrives through normal business content, such as tickets, documents, logs, or web pages, where malicious instructions are embedded to influence agent behaviour. NHI Management Group has documented how fragile identity and secret handling can become once systems scale across tools, and the same pattern appears in agentic workflows. The Top 10 NHI Issues and the OWASP Agentic AI Top 10 both reflect this shift from static identity to runtime control.

In practice, many security teams encounter prompt injection only after an agent has already chained tools, exposed secrets, or taken an unintended action rather than through intentional testing.

How It Works in Practice

MCP-based systems increase risk because the model does not simply read content, it can act on it. If the agent can query a knowledge base, call a ticketing system, or invoke shell-like tools, then hostile instructions inside one of those inputs may become part of the model’s decision process. The issue is not limited to text prompts. Any retrieved content, metadata, or tool output can be used as an instruction channel if the system fails to separate data from commands.

Practitioners should assume that prompt injection emerges at the junction of context ingestion, tool selection, and delegated authority. Stronger designs usually combine several controls:

  • limit which tools the agent can reach for a given task
  • separate untrusted retrieved content from system instructions
  • apply policy checks before tool execution, not after
  • log tool calls and context sources for later review
  • treat secrets as short-lived and task-scoped, not durable session state

This is why static allowlists alone are not enough. An agent may be authorised to use a tool, but that does not mean every instruction inside the tool’s output should be trusted. The NIST Cybersecurity Framework 2.0 remains useful for governance, but agentic environments need runtime evaluation as well. For implementation guidance, the Analysis of Claude Code Security shows how quickly code-oriented agents can inherit hidden instructions from ordinary workflow inputs.

Best practice is evolving toward intent-aware controls where the system checks what the agent is trying to do, whether the request fits the current task, and whether the target tool is appropriate at that moment. These controls tend to break down when agents have broad recursive tool access and can consume large volumes of untrusted context in a single run because the model may amplify one malicious instruction across multiple actions.

Common Variations and Edge Cases

Tighter MCP controls often increase operational overhead, requiring organisations to balance lower injection risk against slower agent workflows and more complex policy maintenance.

There is no universal standard for this yet. Some teams restrict MCP servers to read-only use cases, while others allow write actions only after human approval. In high-trust internal environments, that may be enough for now, but it does not remove the core problem if the agent still consumes adversary-controlled text. The real tradeoff is between productivity and the ability to prove that the agent acted on validated intent rather than on hidden instructions.

Two edge cases deserve special attention. First, retrieval-heavy systems can inherit prompt injection from stale documents, poisoned notes, or copied incident data. Second, multi-agent pipelines can spread a single malicious instruction across planner, executor, and reviewer roles, which makes attribution harder and containment slower. The Ultimate Guide to NHIs — Key Challenges and Risks is relevant here because the same governance gap appears whenever identity, secrets, and runtime permissions are treated as separate problems.

Security teams should also watch for environments where MCP is paired with long-lived tokens or broad service accounts. In those deployments, prompt injection becomes more damaging because a single successful instruction can trigger durable access, not just one isolated action.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.

FrameworkControl / ReferenceRelevance
OWASP Agentic AI Top 10A10Prompt injection is a core agentic application risk addressed by OWASP.
CSA MAESTROT1MAESTRO covers trust boundaries and tool-use risks in agent workflows.
NIST AI RMFAI RMF supports governance for harmful or manipulated model behaviour.

Use AI RMF to establish monitoring, accountability, and risk treatment for agent actions.

NHIMG Editorial Note
Reviewed and updated by the NHIMG editorial team on July 5, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org