What do teams get wrong about prompt injection in MCP environments?

Why This Matters for Security Teams

Prompt injection in MCP environments is not just a model-safety issue. MCP servers can turn manipulated text into tool calls, data access, or workflow execution, which means the real risk is unauthorized action. That is why NHIMG treats this as an identity and authorization problem as much as a content-filtering problem. The same design flaw shows up when teams assume the model can safely interpret instructions that were never meant to carry operational authority.

Current guidance from the OWASP Agentic AI Top 10 and NHIMG research on OWASP Agentic Applications Top 10 points to the same pattern: once an MCP integration can act on untrusted instructions, the attack surface shifts from prompt text to delegated authority. Teams often miss that the server, not the model, is what ultimately performs the risky action.

Astrix Security’s State of MCP Server Security 2025 found that only 18% of MCP server deployments implement any form of access scoping for tool permissions. That is the gap attackers exploit: the prompt becomes the delivery vehicle, but weak authorization is what makes the action possible. In practice, many security teams discover prompt injection only after an MCP server has already executed a tool call it should never have been allowed to make.

How It Works in Practice

The operational mistake is assuming prompt injection can be solved with filters, sanitization, or “safer” prompts alone. In MCP architectures, the server often sits between a model and sensitive systems, so the security question becomes: what identity initiated the request, what context justified it, and what tool scope was available at that moment? If those answers are not checked at runtime, malicious instructions can be treated like legitimate workflow input.

Secure implementations increasingly combine OWASP Agentic AI Top 10 guidance with workload identity and policy enforcement. That means the MCP server should not trust the prompt as an authority signal. Instead, it should validate the requester, bind the session to a narrow service identity, and evaluate tool permissions using policy-as-code at request time. This is where short-lived credentials matter: a task-scoped token is far safer than a standing secret that can be reused after a prompt injection lands.

Use workload identity for the MCP client or agent, not shared static secrets.

Scope tool permissions by task, tenant, and data domain.

Evaluate authorization at runtime, not only at deployment time.

Revise or revoke credentials when the task changes or completes.

Log the initiating identity, tool call, and policy decision together for investigation.

NHIMG’s analysis of Analysis of Claude Code Security reinforces a practical point: the more autonomous the workflow, the more dangerous it becomes to let free-form text drive privileged operations. These controls tend to break down when MCP servers are wired directly to high-trust systems because the server cannot reliably distinguish user intent from injected instructions once the session has broad standing access.

Common Variations and Edge Cases

Tighter prompt and tool controls often increase integration overhead, requiring organisations to balance safety against developer friction and runtime latency. That tradeoff becomes visible in fast-moving MCP environments where teams want broad tool access for experimentation but also expect production-grade containment. There is no universal standard for this yet, so current guidance suggests layering controls instead of relying on any single safeguard.

One common edge case is indirect prompt injection through retrieved content, tickets, or documents that the server later treats as instructions. Another is multi-step tool chaining, where each individual action looks harmless but the sequence produces privilege escalation or data exposure. In those cases, content moderation is insufficient because the harmful step is the authorization decision, not the language itself. Best practice is evolving toward explicit intent validation, least privilege, and separate trust boundaries between retrieval, reasoning, and execution.

The OWASP Top 10 for Agentic Applications 2026 is useful here because it frames prompt injection alongside broader agent failure modes, including tool misuse and unauthorized action. That broader framing matters in MCP deployments with multiple servers, because a weakness in one server can become a bridge into another system if identities, tokens, and policy boundaries are not isolated.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A2	Prompt injection is a core agentic abuse path that can trigger unsafe tool use.
CSA MAESTRO	M1	MAESTRO addresses agent autonomy, tool access, and execution safety in MCP flows.
NIST AI RMF		AI RMF helps govern trustworthy behavior, accountability, and risk controls for MCP agents.

Treat prompts as untrusted input and gate tool execution with explicit runtime authorization.

#1 Authority in NHI Education, Research and Advisory, empowering organizations to tackle the critical risks posed by Non-Human Identities (NHIs), including AI Agents.

What do teams get wrong about prompt injection in MCP environments?

Why This Matters for Security Teams

How It Works in Practice

Common Variations and Edge Cases

Standards & Framework Alignment

Related resources from NHI Mgmt Group