Zero-click MCP exploits are expanding the agentic IDE attack surface

By NHI Mgmt Group Editorial TeamPublished 2025-09-05Domain: Breaches & IncidentsSource: Lakera

TL;DR: A silently shared document, a Google Docs MCP integration, and allow-listed code execution can turn an agentic IDE into a zero-click path to remote code execution, credential theft, and persistence, according to Lakera; the core issue is not a patchable bug but trust assumptions that break when external content is allowed to drive agent actions.

At a glance

What this is: This is a research post showing how zero-click prompt injection can abuse MCP-connected agentic IDEs to execute malicious code and exfiltrate secrets.

Why it matters: It matters because security teams must treat agentic IDE integrations as an identity and execution boundary, not just a productivity feature, across NHI, autonomous, and human workflows.

👉 Read Lakera's research on zero-click MCP exploits and agentic IDE abuse

Context

Zero-click attacks against agentic IDEs happen when external content is allowed to influence tool use and code execution without a user click. In this case, the governance gap is not around model quality, but around what an AI coding assistant is trusted to fetch, interpret, and run once it is connected to MCP tools and allow-listed interpreters.

For identity teams, the issue sits at the intersection of NHI permissions, developer workstation trust, and agentic runtime behaviour. The same access decisions that once applied to a human developer session now apply to an assistant that can chain retrieval, tool use, and execution inside one workflow. That changes how privilege, approvals, and containment need to be designed.

The article’s starting point is typical of modern agentic development environments: productivity features are adopted first, then security assumptions are inherited from older tooling models. That makes the attack path broadly relevant, not an edge case.

Key questions

Q: How should security teams secure agentic IDEs that can fetch external documents and run code?

A: Treat retrieval, interpretation, and execution as separate trust steps. External content should be scanned before it reaches the model, and the model should not be able to move from a document to code execution without a deliberate approval boundary. The safest posture is to assume any connected document source can be attacker-controlled.

Q: Why do MCP-connected AI assistants increase the risk of credential theft?

A: Because they can combine document access, code execution, and local environment visibility in one session. If the assistant reaches secrets, tokens, or keys on the developer machine, an attacker can inherit those permissions through a single malicious interaction rather than a traditional malware chain.

Q: What breaks when allow-listed interpreters are available to AI coding assistants?

A: The allow-list becomes a hidden execution bridge. Once an interpreter is trusted automatically, any prompt injection or malicious retrieval that convinces the agent to use that interpreter can result in code running without a meaningful human checkpoint.

Q: Who is accountable when an AI assistant turns a document into remote code execution?

A: Accountability sits with the team that granted the assistant retrieval and execution privileges without enough containment. The relevant controls are governance over MCP integrations, command approval, and the secrets exposed in the developer environment, because those choices determine the blast radius.

Technical breakdown

How MCP expands the agentic IDE attack surface

Model Context Protocol gives an LLM a standard way to reach external tools and data sources. In an IDE, that means document retrieval, search, and action execution can all happen inside the same conversational loop. The security issue is that context brought in through MCP is often treated as ordinary source material, even when it is attacker-controlled. Once a malicious document enters the agent’s working context, the model may treat embedded instructions as operationally relevant. This is not a traditional exploit of the protocol itself. It is a trust boundary failure created by letting external content participate in the same decision flow as code generation and tool invocation.

Practical implication: treat every MCP-fed input as untrusted and separate retrieval trust from execution trust.

Why allow-listed interpreters become an execution bridge

Allow-lists are meant to reduce approval fatigue by letting trusted commands run without repeated confirmation. In practice, if Python or another interpreter is allow-listed, the agent can execute arbitrary code through a command path that looks routine to the user. That makes the allow-list itself a privilege boundary, not just a usability setting. The problem compounds when the code being run is fetched dynamically from a remote location, because the user sees a familiar toolchain while the actual payload remains hidden. The result is a delegated execution path that bypasses human review at the exact point where review matters most.

Practical implication: restrict interpreter allow-lists to the smallest possible set and require approval for remote script execution.

Why persistence and exfiltration are downstream identity failures

Once malicious code executes in a developer environment, the attacker is no longer just abusing prompt injection. They are harvesting secrets, tokens, SSH keys, and cloud credentials that inherit the developer’s access context. That turns a local compromise into a broader identity compromise across code repositories, cloud control planes, and internal services. The presence of a reverse shell or staged payload matters because it creates repeatable access, not just one-time theft. In identity terms, the initial weakness is the inability to distinguish normal developer action from adversary-driven execution once an agent has been granted both retrieval and execution authority.

Practical implication: assume workstation compromise can become identity compromise and design blast-radius limits accordingly.

Threat narrative

Attacker objective: The attacker aims to convert a normal document retrieval workflow into persistent remote access and credential theft across the victim’s broader enterprise environment.

Entry occurs when an attacker silently shares a malicious document into a Google Docs environment that is connected to an agentic IDE through MCP.
Credential access and execution follow when the agent retrieves the document, follows embedded instructions, and runs an allow-listed Python payload fetched from a public gist.
Impact arrives as the payload harvests secrets, establishes persistence, and creates a pivot point into cloud accounts, source code, and internal systems.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
Reviewdog GitHub Action supply chain attack — reviewdog/action-setup GitHub Action supply chain attack exposed secrets.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Zero-click MCP abuse is a governance failure, not a tooling bug. The article shows that the attack works through intended functionality, which means the control gap sits in how organisations govern external content, agent trust, and execution rights. Once an AI assistant can both retrieve untrusted material and act on it, the security model must treat that path as an identity boundary. Practitioners should stop assuming “safe” integrations remain safe when chained into agentic workflows.

Agentic IDEs create an identity blast radius that conventional developer controls do not model. The attacker is not only targeting the workstation. They are using the workstation to inherit cloud tokens, repo access, and authentication material already present in the developer’s session. That collapses the distinction between endpoint compromise and enterprise identity compromise. Practitioners need to view agentic development environments as high-value identity hubs, not just productivity endpoints.

External content trust was designed for human-paced review, not agent-paced execution. That assumption fails when an assistant can ingest a document, infer relevance, and execute code in one continuous session. The implication is not simply “add more scanning.” It is that the programme must rethink where human approval still exists in the chain and where it has already been displaced by delegated runtime action.

Prompt injection becomes materially worse when it can trigger real execution authority. In this scenario, the malicious text is only dangerous because it can steer a privileged workflow toward remote code execution and persistence. That makes the governance problem broader than content safety. Practitioners should align MCP governance, interpreter controls, and identity containment as one control plane, not three separate issues.

Identity blast radius is the right named concept for this pattern. A single malicious document can expand from content manipulation into secret harvesting, persistence, and enterprise pivoting because the agent inherits too much trust from the user session. That is a structural lesson for IAM, PAM, and workload identity teams. The practitioner conclusion is simple: reduce the amount of reusable trust any agentic workflow can inherit in one session.

From our research:
80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%), according to AI Agents: The New Attack Surface report.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation, according to AI Agents: The New Attack Surface report.
That same research shows governance is lagging deployment, so practitioners should pair agent monitoring with lifecycle controls before access expands further.

What this signals

Agentic IDE governance will increasingly sit inside identity programmes, not only application security. Once assistants can retrieve documents and execute code, the real question becomes how much trust a session can inherit before it becomes unsafe. Teams that already manage secrets, workload identity, and privileged access will be better placed to define those limits.

The practical shift is toward containment rather than confidence. Security leaders should expect more emphasis on tool approval, secret minimisation, and workflow segmentation as agentic development adoption widens. The teams that map these pathways now will be able to spot where a local assistant becomes a broader enterprise identity bridge.

For practitioners

Harden the MCP trust boundary Classify every MCP-fed source as untrusted until it passes automated screening for prompt injection, unsafe instructions, and hidden payload references. Treat document retrieval and code execution as separate trust decisions.
Remove broad interpreter allow-lists Do not allow a default path from agent instruction to arbitrary Python execution. Require explicit approval for remote code, and keep interpreter permissions narrow enough to prevent a single payload from becoming a reusable execution bridge.
Limit the secrets available inside agent sessions Reduce the number of cloud credentials, SSH keys, and tokens exposed in developer environments so a compromised assistant session cannot inherit full workstation trust. Segment local development secrets from production access wherever possible.
Restrict silent inbound file sharing Review document-sharing settings so attackers cannot flood users with searchable files that become retrievable through integrated assistants. Limit inbound sharing to trusted domains and monitor for abnormal document ingestion patterns.

Key takeaways

Zero-click MCP abuse shows that agentic IDEs can turn ordinary document retrieval into a remote execution path when trust boundaries are weak.
The evidence points to a full identity impact, not just a workstation issue, because harvested secrets can open cloud, code, and authentication pathways.
Practitioners should separate retrieval from execution, narrow interpreter permissions, and reduce the credentials exposed to agent sessions.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		Covers prompt injection and tool misuse in agentic IDE workflows.
NIST CSF 2.0	PR.AC-4	Addresses access control and session trust for developer and agent workflows.
OWASP Non-Human Identity Top 10	NHI-03	Relevant because exposed secrets and reused interpreter privileges drive the breach path.

Map agent tool use to OWASP Agentic AI risks and block untrusted content before execution.

Key terms

Mcp: Model Context Protocol is a standard way for an AI system to connect to external tools and data sources. In practice, it can let an agent retrieve documents, inspect context, and trigger actions inside the same workflow, which makes the protocol a trust boundary that needs explicit governance.
Agentic ide: An agentic IDE is a development environment where an AI assistant can fetch context, suggest code, and execute actions with limited or no human intervention. That creates productivity gains, but it also collapses the distance between harmless assistance and privileged execution if controls are weak.
Prompt injection: Prompt injection is the use of hidden or malicious instructions inside content that an AI system processes. The goal is to steer the model into taking actions the user did not intend, especially when the system is trusted to read documents, summarise text, or call tools on behalf of the user.
Identity blast radius: Identity blast radius is the amount of access and downstream systems an attacker can reach after compromising one identity, session, or agent workflow. In agentic environments, the radius can expand quickly if the assistant inherits cloud tokens, local secrets, and execution rights in one chain.

What's in the full article

Lakera's full research covers the operational detail this post intentionally leaves for the source:

Step-by-step exploit walkthrough showing how the malicious Google Doc is retrieved and turned into execution
The specific Python payload behaviour used to harvest secrets and establish persistence
MITRE ATT&CK mapping that ties the chain to initial access, execution, persistence, and exfiltration
Defensive examples for hardening MCP integrations, allow-lists, and Google Workspace sharing settings

👉 Lakera's full post covers the exploit chain, payload design, and defensive hardening steps.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or governance in your organisation, it is worth exploring.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-09-05.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org