What breaks when MCP tools are treated as trusted by default?

Why This Matters for Security Teams

Trusted-by-default MCP design turns a protocol integration into a privilege boundary problem. When a tool is auto-accepted because it is registered, the real control question becomes whether the component is proven, scoped, and monitored before it can touch secrets or sensitive context. That is why current guidance around agentic systems increasingly treats tool trust as a runtime decision, not a setup-time assumption, as reflected in the OWASP Agentic AI Top 10.

For MCP specifically, the risk is not just malformed input. A poisoned server, a spoofed endpoint, or an overbroad connector can inherit access that was never intended for it, then shape outputs in ways that steer downstream actions. NHIMG has documented how MCP environments frequently fail at basic permission scoping, with only 18% of deployments implementing any form of access scoping for tool permissions in The State of MCP Server Security 2025. In practice, many security teams encounter unauthorized tool behaviour only after a workflow has already read secrets, modified records, or propagated bad data into production.

How It Works in Practice

The practical fix is to treat each MCP tool as an untrusted workload until it proves otherwise. That means separating registration from authorization, then enforcing policy at request time based on the tool’s identity, declared purpose, and current context. In agentic environments, static RBAC alone is too coarse because the same agent may invoke different tools with different risk profiles within the same session.
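The split between registration and authorization can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `ToolRecord`, `ToolRegistry`, and `authorize` are hypothetical names, and matching the task purpose by string equality stands in for whatever policy engine a real deployment would use.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolRecord:
    """Hypothetical record of what a tool declared when it was registered."""
    name: str
    declared_purpose: str
    allowed_actions: frozenset


@dataclass
class ToolRegistry:
    _tools: dict = field(default_factory=dict)

    def register(self, tool: ToolRecord) -> None:
        # Registration only records that the tool exists; it grants nothing.
        self._tools[tool.name] = tool

    def get(self, name: str):
        return self._tools.get(name)


def authorize(registry: ToolRegistry, tool_name: str, action: str, task_purpose: str):
    """Decide at invocation time; returns (allowed, reason)."""
    tool = registry.get(tool_name)
    if tool is None:
        return False, "unregistered tool"
    if action not in tool.allowed_actions:
        return False, "action outside declared scope"
    if task_purpose != tool.declared_purpose:
        # Assumption: exact match is a placeholder for a richer context check.
        return False, "purpose mismatch for this task"
    return True, "ok"
```

With this shape, the same registered tool can be allowed for one request and denied for the next within a single session, because the decision depends on the action and task context rather than on the fact of registration.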

Security teams should prioritize three controls: provenance, scope, and runtime verification. Provenance confirms who published the server and whether the endpoint matches an expected source. Scope limits what the tool can read, write, or disclose, ideally through just-in-time access, short-lived tokens, and explicit allowlists. Runtime verification continuously checks behavior, including unexpected data access, tool chaining, or attempts to request broader privileges than the task requires. That approach aligns with the intent of the OWASP Top 10 for Agentic Applications 2026 and is reinforced by NHIMG analysis in Analysis of Claude Code Security, where tool access and code execution require continuous scrutiny rather than one-time approval.
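As one possible shape for the provenance control, the sketch below pins an expected endpoint host and a SHA-256 digest of the server's manifest, and rejects anything that does not match. The server name, host, manifest contents, and the `PINNED` table are all hypothetical, and digest pinning is used here as a simple stand-in for full signature or attestation verification.

```python
import hashlib
import hmac
from urllib.parse import urlparse

# Hypothetical pin table: expected host and manifest digest per server.
PINNED = {
    "tickets-server": {
        "host": "mcp.internal.example",
        "sha256": hashlib.sha256(b'{"name": "tickets-server", "version": "1.0"}').hexdigest(),
    }
}


def verify_provenance(server_name: str, endpoint_url: str, manifest_bytes: bytes,
                      pinned: dict = PINNED) -> bool:
    """Return True only if the endpoint and manifest match the pinned values."""
    expected = pinned.get(server_name)
    if expected is None:
        return False  # unknown publishers are rejected, not trusted by default
    parsed = urlparse(endpoint_url)
    if parsed.scheme != "https" or parsed.hostname != expected["host"]:
        return False  # spoofed or unexpected endpoint
    digest = hashlib.sha256(manifest_bytes).hexdigest()
    # Constant-time comparison of the computed and pinned digests.
    return hmac.compare_digest(digest, expected["sha256"])
```

A check like this would run before first use and again whenever the manifest changes, so a server that was acceptable at onboarding does not stay trusted after its contents drift.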

Require signed or otherwise attestable tool provenance before first use.

Issue ephemeral credentials per task, not long-lived shared secrets.

Evaluate policy at invocation time, not only during onboarding.

Log tool inputs, outputs, and downstream side effects for review.
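The credential and logging items above could look roughly like the following. It is a sketch under assumptions: `TaskToken`, `issue_token`, `token_allows`, `invoke`, and the in-memory `audit_log` are hypothetical names, and a real platform would issue tokens from its identity provider and ship log entries to durable storage.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskToken:
    """Hypothetical short-lived credential bound to one tool and a scope set."""
    value: str
    tool: str
    scopes: frozenset
    expires_at: float


def issue_token(tool, scopes, ttl_seconds=60.0, now=None):
    issued = time.time() if now is None else now
    return TaskToken(secrets.token_urlsafe(32), tool, frozenset(scopes), issued + ttl_seconds)


def token_allows(token, tool, scope, now=None):
    current = time.time() if now is None else now
    return token.tool == tool and scope in token.scopes and current < token.expires_at


audit_log = []  # assumption: stand-in for an append-only audit store


def invoke(token, tool, scope, payload, handler, now=None):
    """Check the token at call time, run the handler, and log the outcome."""
    allowed = token_allows(token, tool, scope, now)
    result = handler(payload) if allowed else None
    audit_log.append({"tool": tool, "scope": scope, "allowed": allowed,
                      "input": payload, "output": result})
    if not allowed:
        raise PermissionError(f"{tool}: scope '{scope}' denied or token expired")
    return result
```

Denied calls are logged as well as allowed ones, since a pattern of rejected requests for broader scope is itself a signal worth reviewing.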

These controls tend to break down in multi-tenant agent platforms where connectors are reused across teams because trust decisions become inherited faster than they can be reviewed.

Common Variations and Edge Cases

Tighter tool governance often increases operational friction, so organisations must balance speed of integration against the blast radius of a compromised connector. That tradeoff is especially visible in developer productivity workflows, where teams want fast plug-in onboarding and may resist the extra verification steps required for high-risk tools.

There is no universal standard for MCP trust decisions yet, so best practice is evolving. Some environments will rely on gateway-level mediation, while others will prefer per-tool policy engines or identity-bound attestation. The important point is that “registered” should never mean “implicitly trusted.” If a tool can reach secrets, trigger actions, or alter context, it needs the same scrutiny as any privileged workload.

Edge cases appear when an MCP server is internally hosted but still built from third-party code, when an agent chains multiple tools across trust zones, or when observability is weak enough that misuse looks like normal task completion. In those environments, default trust creates a hidden escalation path, especially where the Astrix Security findings on secret exposure and access scoping gaps overlap with broad agent permissions. The safer pattern is to assume every tool can be manipulated until the platform proves otherwise.
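For the tool-chaining case, one simple check is to flag any step where output from a higher-trust tool is handed to a lower-trust one. The zone names, their ordering, and the `find_downgrade` helper below are illustrative assumptions; real platforms may define trust zones quite differently.

```python
# Hypothetical trust zones, ordered from least to most sensitive.
ZONE_RANK = {"public": 0, "internal": 1, "restricted": 2}


def find_downgrade(chain, tool_zones):
    """Return the first (source, sink) pair that moves data to a lower zone, or None.

    Tools missing from tool_zones are treated as "public", i.e. untrusted.
    """
    for src, dst in zip(chain, chain[1:]):
        src_rank = ZONE_RANK[tool_zones.get(src, "public")]
        dst_rank = ZONE_RANK[tool_zones.get(dst, "public")]
        if dst_rank < src_rank:
            return (src, dst)
    return None
```

A gateway could run a check of this kind before executing a planned chain and require explicit approval whenever it returns a pair, rather than letting the handoff pass as normal task completion.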

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A1	Covers tool trust failures and agent misuse through unsafe integrations.
CSA MAESTRO	T1	Addresses agent and tool trust boundaries in autonomous workflows.
NIST AI RMF		Supports governance for dynamic AI behaviour and runtime risk management.

Treat every MCP tool as untrusted until policy, provenance, and scope checks pass at runtime.
