Who is accountable when a malicious MCP tool exfiltrates data through an agent?

By NHI Mgmt Group Editorial Team Updated June 24, 2026 Domain: Governance, Ownership & Risk

Accountability sits with the team that allowed the agent to trust unreviewed tools and shared state in production. Security, platform and identity owners all need shared ownership, because the failure spans gateway governance, identity scoping, memory design and runtime monitoring. Frameworks such as OWASP Agentic AI Top 10 and NIST AI Risk Management Framework help structure that accountability.

Why This Matters for Security Teams

When an MCP tool exfiltrates data through an agent, the failure is not just a bad tool choice. It is a governance breakdown across the agent’s allowed toolset, the identity used to call those tools, and the trust placed in shared state or memory. Current guidance suggests teams should treat MCP like any other high-risk integration surface, not a convenience layer. That means reviewing tool provenance, scoping permissions tightly, and assuming the agent may chain actions in ways humans did not anticipate.

This matters because agentic systems do not behave like static service accounts. They act on goals, context, and tool responses, which makes post-incident blame less useful than pre-incident control design. The risk is amplified in environments that mix long-lived credentials, broad scopes, and unreviewed connectors. The NIST AI Risk Management Framework and the OWASP Agentic AI Top 10 both point toward shared accountability, runtime controls, and explicit ownership for agent behaviour.

NHIMG’s research on the OWASP NHI Top 10 reinforces the same pattern: identity and tool governance fail together. In practice, many security teams discover the exfiltration path only after an agent has already used a trusted tool to move data out, rather than during a deliberate design review.

How It Works in Practice

Accountability should be mapped across the full agent-to-tool chain. The product or platform team typically owns the agent design, the security team owns policy and monitoring, the identity team owns credential scope, and the data owner defines what may be exposed. No single function can carry the whole burden once an agent can invoke tools autonomously.

In practice, the right control model is layered:

  • Limit which MCP servers and tools the agent can discover, and require explicit approval for new integrations.
  • Use short-lived credentials and workload identity instead of shared static secrets, so tool access can be revoked quickly.
  • Apply request-time policy checks for each tool call, rather than relying only on pre-approved roles.
  • Log prompts, tool invocations, data accesses, and output destinations so investigations can reconstruct the path.
  • Separate memory, state, and retrieval domains so one compromised tool cannot silently broaden exposure.
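The layered controls above can be sketched as a single gate that every tool call passes through. This is a minimal illustration, not a reference implementation: the server and tool names, the destination hosts, the classification labels, and the `authorize` function are all hypothetical, and a real deployment would likely enforce this at an MCP gateway or policy engine rather than in application code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical catalogue of (server, tool) pairs that have passed review.
APPROVED_TOOLS = {("internal-crm", "lookup_account"), ("docs", "search")}

# Hypothetical egress policy: where each approved tool may send data.
ALLOWED_DESTINATIONS = {
    ("internal-crm", "lookup_account"): {"crm.internal.example"},
    ("docs", "search"): {"docs.internal.example"},
}


@dataclass
class ToolCall:
    agent_id: str
    server: str
    tool: str
    destination: str
    data_classification: str  # assumed labels: "public", "internal", "restricted"


@dataclass
class Decision:
    allowed: bool
    reason: str


# In-memory stand-in for an append-only audit sink.
AUDIT_LOG: list[dict] = []


def authorize(call: ToolCall) -> Decision:
    """Evaluate one tool call at request time and record the outcome."""
    key = (call.server, call.tool)
    if key not in APPROVED_TOOLS:
        decision = Decision(False, "tool not in approved catalogue")
    elif call.destination not in ALLOWED_DESTINATIONS.get(key, set()):
        decision = Decision(False, "destination not permitted for this tool")
    elif call.data_classification == "restricted":
        decision = Decision(False, "restricted data requires human approval")
    else:
        decision = Decision(True, "ok")

    # Log denied calls as well as allowed ones so an investigation
    # can reconstruct the full path the agent attempted.
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "agent_id": call.agent_id,
        "server": call.server,
        "tool": call.tool,
        "destination": call.destination,
        "allowed": decision.allowed,
        "reason": decision.reason,
    })
    return decision
```

The design choice worth noting is that the decision is made per call rather than per role, and that denials are logged alongside approvals.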

The operational point is that an agent should not inherit broad trust just because it is embedded in a workflow. The CSA MAESTRO agentic AI threat modeling framework is useful here because it forces teams to examine orchestration, tool invocation, and abuse paths as first-class risks. NHIMG’s AI Agents: The New Attack Surface report shows why this matters: 80% of organisations say their AI agents have already performed actions beyond intended scope, including inappropriate data sharing and revealing access credentials.

That is why static IAM alone is insufficient. A malicious or compromised MCP tool can be used in ways that were never anticipated at design time, especially when the agent is allowed to plan, retry, or chain tool actions across systems. These controls tend to break down in shared enterprise agent platforms where multiple teams reuse the same memory store and connector catalogue, because the blast radius becomes difficult to isolate.
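One way to keep the blast radius bounded on a shared platform is to partition memory into domains and require an explicit grant for each caller. The sketch below assumes a hypothetical `ScopedMemory` class and made-up caller and domain names; it illustrates the isolation idea only and is not drawn from any particular agent framework.

```python
class ScopedMemory:
    """Toy memory store in which each caller may only touch granted domains."""

    def __init__(self) -> None:
        self._domains: dict[str, dict[str, str]] = {}
        self._grants: set[tuple[str, str]] = set()

    def grant(self, caller: str, domain: str) -> None:
        # Grants are explicit; nothing is shared by default.
        self._grants.add((caller, domain))

    def _check(self, caller: str, domain: str) -> None:
        if (caller, domain) not in self._grants:
            raise PermissionError(f"{caller} has no grant for domain {domain}")

    def write(self, caller: str, domain: str, key: str, value: str) -> None:
        self._check(caller, domain)
        self._domains.setdefault(domain, {})[key] = value

    def read(self, caller: str, domain: str, key: str) -> str:
        self._check(caller, domain)
        return self._domains[domain][key]
```

With this shape, a compromised tool that holds a grant for one domain cannot read another team's state without a second, reviewable grant.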

Common Variations and Edge Cases

Tighter tool governance often increases operational friction, requiring organisations to balance developer speed against control assurance. That tradeoff is real, especially when teams want agents to move quickly across internal SaaS, data platforms, and code execution environments.

There is no universal standard for this yet, so current guidance suggests a few edge cases deserve special handling. Human-in-the-loop approval may still be appropriate for high-impact actions, but approval alone does not fix overbroad tool scopes. Likewise, vendor-managed MCP servers should still be treated as untrusted until their permissions, logging, and data handling are reviewed. A signed tool catalog is helpful, but signature trust does not remove the need for runtime policy.
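The point about signed catalogues can be made concrete with a small sketch in which a valid signature is necessary but not sufficient. The assumptions here are significant: HMAC with a shared key stands in for whatever signing scheme a real catalogue would use (more likely asymmetric signatures), and the entry fields, key, and `may_invoke` function are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical shared key, used only to keep the example self-contained.
CATALOG_KEY = b"demo-key-not-for-production"


def sign_entry(entry: dict) -> str:
    """Produce a signature over a canonical JSON encoding of the entry."""
    payload = json.dumps(entry, sort_keys=True).encode()
    return hmac.new(CATALOG_KEY, payload, hashlib.sha256).hexdigest()


def signature_valid(entry: dict, signature: str) -> bool:
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sign_entry(entry), signature)


def may_invoke(entry: dict, signature: str, requested_scope: str) -> bool:
    """Signature check first, then a runtime scope check on every call."""
    if not signature_valid(entry, signature):
        return False
    return requested_scope in entry["reviewed_scopes"]
```

A correctly signed vendor tool is still refused here when it asks for a scope that was never reviewed, which is the behaviour the paragraph above argues for.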

For incident response, accountability should be documented before an event occurs. That means naming who can disable a tool, who can revoke the agent’s credentials, and who owns the data classification decision when the agent touches sensitive records. NHIMG’s coverage of the Moltbook AI agent keys breach is a useful reminder that weak key hygiene turns an architectural issue into a disclosure event. These controls matter most where MCP tools are allowed to access shared production data, because the agent may exfiltrate through legitimate-looking requests that blend into normal traffic.
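Documenting ownership ahead of time can be checked mechanically. The sketch below assumes a hypothetical runbook structure keyed by tool, with three role fields matching the responsibilities named above; the tool names, team names, and `missing_owners` helper are illustrative only.

```python
# Roles that must be named for every tool before it reaches production.
REQUIRED_ROLES = (
    "tool_disable_owner",
    "credential_revoke_owner",
    "data_classification_owner",
)

# Hypothetical runbook; the second entry is deliberately incomplete.
RUNBOOK = {
    "internal-crm/lookup_account": {
        "tool_disable_owner": "platform-oncall",
        "credential_revoke_owner": "identity-oncall",
        "data_classification_owner": "crm-data-owner",
    },
    "docs/search": {
        "tool_disable_owner": "platform-oncall",
        "credential_revoke_owner": "",
    },
}


def missing_owners(runbook: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Return, per tool, the roles that are absent or left blank."""
    gaps: dict[str, list[str]] = {}
    for tool, owners in runbook.items():
        missing = [role for role in REQUIRED_ROLES if not owners.get(role)]
        if missing:
            gaps[tool] = missing
    return gaps
```

A check like this could run as part of the approval step for a new integration, so that a tool with no named owner for credential revocation never ships.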

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | A03 | Covers agent tool abuse and unsafe autonomy leading to data exfiltration.
CSA MAESTRO | M1 | Addresses agent orchestration and tool-chain risk across shared control planes.
NIST AI RMF | GOVERN | Defines accountability and oversight for AI systems that can act autonomously.

Restrict tool permissions, review trust boundaries, and monitor every agent tool call at runtime.

NHIMG Editorial Note
Reviewed and updated by the NHIMG editorial team on June 24, 2026.