NHI Forum
Read full article here: https://natoma.ai/blog/understanding-model-context-protocol-vulnerabilities-tool-poisoning-attacks/?utm_source=nhimg
The Model Context Protocol (MCP) is rapidly becoming the backbone of secure, scalable AI agent architectures — but its popularity also makes it a tempting target. Like any emerging technology, MCP brings new risks that need to be addressed early, before attackers can exploit them at scale.
This article kicks off a new series on MCP security. First up: Tool Poisoning Attacks (TPAs) — a novel vulnerability class that exploits how large language models (LLMs) interpret tool metadata.
How Tool Poisoning Works
Invariant Labs recently uncovered a new form of indirect prompt injection that hides malicious instructions in tool descriptions. Since LLMs “see” full tool metadata — not just the user-friendly name you’re shown — they can be manipulated to:
- Misuse legitimate tools — for example, calling “delete file” instead of “read file.”
- Prioritize unsafe tools — pushing the model toward weaker, attacker-controlled functions.
- Act on hidden commands — even if the tool is never explicitly invoked by the user.
Because MCP servers are often downloaded and run locally, a poisoned tool can silently execute malicious actions without users realizing they’ve been manipulated.
Risks and Real-World Impact
The danger goes beyond theoretical misuse. Tool Poisoning could allow attackers to:
- Trigger unauthorized system actions (data deletion, modification).
- Bypass security checks by hijacking prioritization logic.
- Launch cascading attacks by chaining multiple malicious tool calls.
The most concerning part? This is invisible to the user. The LLM is simply “doing its job” based on context — but the context has been tampered with.
Defense-in-Depth for MCP Environments
Protecting against TPAs requires layered security:
- Clear, unambiguous tool metadata – remove ambiguity that models can misinterpret.
- Context validation and runtime policy checks – ensure tool use aligns with intent before execution.
- Granular access controls – enforce least privilege at the tool level, not just at the app level.
- Anomaly detection – monitor for suspicious tool invocation patterns.
- Human oversight for high-impact actions – a final check for destructive or privileged operations.
The Bigger Picture: AI + NHI Security
Tool Poisoning highlights a new class of risk: machine-to-machine trust abuse. These tools, functions, and MCP servers are effectively non-human identities (NHIs) — they have permissions, lifecycles, and attack surfaces just like service accounts or API keys.
Securing them means applying the same Zero Trust principles we use for human users: discover them, govern them, monitor them, and decommission them when they’re no longer needed.
What’s Next
This is just the first in a series unpacking MCP risks. Next, we’ll explore Tool Hijacking — where attackers take over legitimate tools mid-execution — and how to design MCP-based systems that are resilient against adversarial manipulation.