What do security teams get wrong about MCP and skill safety?

Why This Matters for Security Teams

Structured tool mediation is useful, but it is not a safety boundary by itself. MCP can constrain how an agent discovers or invokes tools, yet it does not neutralise a hostile skill that smuggles in shell commands, hidden instructions, or bundled scripts that execute outside the protocol path. That is why current guidance treats MCP as a governance layer, not a complete trust model. The real risk is that a skill can still shape what the agent does once execution authority exists.

This distinction shows up in the broader agentic AI threat landscape described by the OWASP Top 10 for Agentic Applications 2026, where tool misuse and indirect prompt control remain central concerns. NHIMG’s AI Agents: The New Attack Surface report shows why this matters operationally: 80% of organisations report that AI agents have already performed actions beyond their intended scope, including unauthorized system access, sensitive data sharing, and credential exposure. In practice, many security teams discover skill abuse only after an agent has executed the unwanted action, rather than through deliberate testing of the skill boundary.

How It Works in Practice

Safe MCP design starts with a narrow assumption: the protocol can help mediate approved tool calls, but the skill itself still needs to be treated like untrusted code with possible local execution paths. That means teams should review every skill for more than declared tool use. They should inspect embedded scripts, package dependencies, post-install hooks, file-system access, and any mechanism that can trigger shell execution or outbound network activity. The question is not just “What tools does MCP expose?” but “What can this skill do if it is malicious, compromised, or socially engineered by the prompt it receives?”
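As a rough illustration of that review step, the sketch below flags bundled scripts, install hooks, and text that hints at shell or network use inside a skill directory. The function names, the pattern list, and the file-suffix sets are all assumptions made for this example; they are heuristics, not a complete or authoritative scanner, and a clean result does not mean a skill is safe.

```python
import json
import re
from pathlib import Path, PurePosixPath

# Illustrative heuristics only (assumed for this sketch); real reviews need broader coverage.
RISKY_PATTERNS = {
    "shell-execution": re.compile(r"os\.system|subprocess|child_process|\bbash\b|\bsh -c\b"),
    "outbound-network": re.compile(r"\bcurl\b|\bwget\b|requests\.|urllib|https?://"),
}
INSTALL_HOOKS = ("preinstall", "install", "postinstall")
SCRIPT_SUFFIXES = {".sh", ".py", ".js", ".ts", ".ps1"}


def load_skill(skill_dir: Path) -> dict[str, str]:
    """Read every file in a skill directory into a {relative_path: text} mapping."""
    return {
        p.relative_to(skill_dir).as_posix(): p.read_text(encoding="utf-8", errors="replace")
        for p in skill_dir.rglob("*")
        if p.is_file()
    }


def scan_files(files: dict[str, str]) -> list[str]:
    """Return human-readable findings for the given skill files."""
    findings = []
    for rel, text in sorted(files.items()):
        name = PurePosixPath(rel)
        if name.suffix in SCRIPT_SUFFIXES:
            findings.append(f"{rel}: bundled script")
        if name.name == "package.json":
            try:
                manifest = json.loads(text)
            except ValueError:
                manifest = None
            scripts = manifest.get("scripts") if isinstance(manifest, dict) else None
            if isinstance(scripts, dict):
                for hook in INSTALL_HOOKS:
                    if hook in scripts:
                        findings.append(f"{rel}: install hook '{hook}'")
        # Instructions in Markdown can request execution too, so scan all text.
        for label, pattern in RISKY_PATTERNS.items():
            if pattern.search(text):
                findings.append(f"{rel}: {label}")
    return findings
```

A reviewer might call `scan_files(load_skill(Path("path/to/skill")))` and treat any finding as a prompt for manual inspection rather than as a verdict.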

Practically, that leads to layered controls. Skills should run with least privilege, inside sandboxed environments, with explicit filesystem and network restrictions. Tool permissions should be scoped per task, and secrets should never be broadly available to the runtime. When an agent does need privileged access, use just-in-time issuance and short-lived tokens rather than static credentials. For identity, treat the workload identity as the primitive, not a human-like account. Standards work such as the OWASP Agentic AI Top 10 supports this direction, while implementation guidance increasingly points toward runtime policy checks and ephemeral authorization.
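The just-in-time pattern can be sketched with a small in-memory broker. Everything here is hypothetical: `CredentialBroker`, `TaskCredential`, the scope strings, and the five-minute default lifetime are assumptions chosen for illustration, and a production system would delegate issuance to a real secrets manager or workload identity provider.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskCredential:
    token: str
    workload_id: str          # identity of the agent workload, not a human account
    scopes: frozenset[str]    # permissions granted for this one task
    expires_at: float


class CredentialBroker:
    """Hypothetical in-memory issuer of short-lived, task-scoped tokens."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._active: dict[str, TaskCredential] = {}

    def issue(self, workload_id: str, scopes: set[str]) -> TaskCredential:
        cred = TaskCredential(
            token=secrets.token_urlsafe(32),
            workload_id=workload_id,
            scopes=frozenset(scopes),
            expires_at=self._clock() + self._ttl,
        )
        self._active[cred.token] = cred
        return cred

    def authorize(self, token: str, scope: str) -> bool:
        cred = self._active.get(token)
        if cred is None:
            return False
        if self._clock() >= cred.expires_at:
            del self._active[token]  # expired tokens are dropped on first use
            return False
        return scope in cred.scopes

    def revoke(self, token: str) -> None:
        """Call when the task completes, even if the token has not expired."""
        self._active.pop(token, None)
```

The design choice worth noting is that expiry and revocation are both enforced at authorization time, so a token leaked by a skill loses its value quickly even if the task never reports completion.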

Review skills for hidden execution paths, not just declared MCP tools.

Block shell access unless the use case explicitly requires it and is monitored.

Issue short-lived secrets per task, then revoke them on completion.

Separate tool mediation from execution control and from data access.
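One way to picture the separation in the points above is a deny-by-default policy object that answers three independent questions: which tools may be called, whether shell execution is permitted, and which paths may be read. The `TaskPolicy` type and the `check_*` helpers are hypothetical names for this sketch, and the path check is lexical only, so it does not account for symlinks.

```python
import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class TaskPolicy:
    """Hypothetical per-task policy; every field defaults to 'deny'."""
    allowed_tools: frozenset[str] = frozenset()
    allow_shell: bool = False
    shell_monitored: bool = False
    readable_paths: tuple[str, ...] = ()


def check_tool_call(policy: TaskPolicy, tool: str) -> bool:
    """Tool mediation: only explicitly listed tools pass."""
    return tool in policy.allowed_tools


def check_shell(policy: TaskPolicy) -> bool:
    """Execution control: shell needs both an explicit grant and monitoring."""
    return policy.allow_shell and policy.shell_monitored


def check_read(policy: TaskPolicy, path: str) -> bool:
    """Data access: the normalised path must sit under an allowed root."""
    normalised = PurePosixPath(posixpath.normpath(path))
    return any(normalised.is_relative_to(root) for root in policy.readable_paths)
```

For example, `TaskPolicy(allowed_tools=frozenset({"search_docs"}), readable_paths=("/work/repo",))` permits one tool and one directory tree while leaving shell execution blocked.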

NHIMG’s The State of MCP Server Security 2025 notes that only 18% of MCP server deployments implement any form of access scoping for tool permissions, which helps explain why the protection offered by protocol-level controls is often overstated. These controls tend to break down on developer workstations and self-hosted agent runners, where local execution paths are easier to reach than the MCP boundary itself.

Common Variations and Edge Cases

Tighter skill controls often increase operational overhead, so organisations have to balance developer velocity against execution safety. That tradeoff is real, especially when skills are custom-built, loaded dynamically, or updated through package registries. Current guidance suggests treating third-party skills as supply-chain inputs, but there is not yet a universal standard for how deeply each skill should be inspected.

The hardest edge cases are the ones where MCP is present but irrelevant to the actual compromise path. A skill may call a local script, abuse a connector outside MCP, or persuade an agent to reveal data that the protocol never directly exposed. This is why “protocol compliant” should never be confused with “safe.” The safer operating model is to assume any skill may attempt direct execution unless the environment explicitly blocks it.

Security teams also need to distinguish between governance and containment. MCP can help with visibility and routing, but containment depends on runtime controls such as sandboxing, egress restrictions, policy-as-code, and ephemeral credentials. NHIMG’s Analysis of Claude Code Security is a useful reminder that code-aware agents need guardrails that extend beyond the protocol surface. The practical failure mode is an environment where tool mediation exists but local execution, secrets exposure, and network access remain unrestricted.
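The governance-versus-containment distinction can be expressed as a simple audit over a runner's configuration. The `RunnerConfig` fields and the wording of each gap are assumptions made for this sketch; a real audit would derive these flags from the actual sandbox, network, and secrets configuration rather than from self-declared booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunnerConfig:
    """Hypothetical summary of one agent runner's controls."""
    mcp_mediation: bool
    sandboxed: bool
    egress_restricted: bool
    policy_as_code: bool
    ephemeral_credentials: bool


def containment_gaps(cfg: RunnerConfig) -> list[str]:
    """List the containment controls that are missing, ignoring MCP itself."""
    checks = {
        "local execution is not sandboxed": cfg.sandboxed,
        "outbound network access is unrestricted": cfg.egress_restricted,
        "no policy-as-code enforcement": cfg.policy_as_code,
        "credentials are long-lived or static": cfg.ephemeral_credentials,
    }
    return [gap for gap, present in checks.items() if not present]


def mediation_only(cfg: RunnerConfig) -> bool:
    """True when MCP mediation exists but containment is incomplete."""
    return cfg.mcp_mediation and bool(containment_gaps(cfg))
```

A runner for which `mediation_only` returns `True` matches the failure mode described above: the protocol surface is governed, yet the paths around it are open.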

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A5	Skill abuse and hidden execution map to agent tool and prompt threats.
CSA MAESTRO	MAESTRO-3	Focuses on agent tool governance and execution boundaries.
NIST AI RMF	GOVERN	Runtime governance is needed when agents can act beyond intended scope.

Review every skill for indirect execution paths and restrict untrusted tool behavior at runtime.
