Claude Code prompt injection exposes hidden backdoor risks

By NHI Mgmt Group Editorial TeamPublished 2026-05-27Domain: Agentic AI & NHIsSource: Lasso Security

TL;DR: Claude Code’s ability to read code, fetch web pages, run commands, and call MCP tools creates a large prompt injection attack surface when permissions are skipped, according to Lasso Security. The real issue is not model capability but the collapse of trust boundaries around untrusted content and trusted execution.

At a glance

What this is: This is a Lasso Security analysis of indirect prompt injection in Claude Code, showing how untrusted content can steer an AI assistant into unsafe actions when elevated permissions and MCP access are combined.

Why it matters: It matters because security teams now have to govern AI agent behaviour, tool access, and trust boundaries together, not as separate problems from NHI, IAM, and application security.

By the numbers:

The defender includes 50+ regex patterns across all four injection categories.

👉 Read Lasso Security's analysis of prompt injection in Claude Code

Context

Indirect prompt injection is a trust-boundary failure, not just a model quality problem. In Claude Code, the assistant can consume repository files, web pages, issue comments, and MCP outputs as if they were legitimate instructions, which means a malicious string in untrusted content can influence trusted execution.

This is the kind of governance problem that sits between application security, NHI controls, and emerging agentic AI oversight. Once an assistant is allowed to read, decide, and act across tools with reduced human gating, teams are no longer managing a simple automation chain, they are managing an identity that can be manipulated through its context.

Key questions

Q: How should security teams govern AI assistants that can read untrusted content and execute tools?

A: They should govern the assistant as an action-capable identity, not as a passive chatbot. That means limiting approval-free execution, scanning tool output before it enters context, and mapping every connected source as a trust boundary. The key control question is whether untrusted content can influence privileged actions without a human checkpoint.

Q: Why do AI assistants with MCP access create a larger governance problem than standalone prompts?

A: MCP turns content ingestion into delegated access across multiple systems, so a poisoned source can influence both reasoning and action. The risk is not only bad instructions, but bad instructions moving through authenticated tool paths. Teams need to manage connector scope, data exposure, and action rights together.

Q: What do security teams get wrong about prompt injection defenses?

A: They often assume model resistance is enough, when the real weak point is the context pipeline. If malicious text can enter the model after a tool call, the assistant may act on it before the user notices. Detection helps, but governance has to reduce the amount of hostile content that reaches decision time.

Q: Who is accountable when an AI assistant executes an unsafe command after reading hostile content?

A: Accountability sits with the programme that granted the assistant its permissions, tool scopes, and deployment controls. The model did not self-authorise. Security, platform, and application owners all share responsibility for ensuring the assistant cannot convert untrusted input into privileged output without oversight.

Technical breakdown

How indirect prompt injection abuses trusted context

Indirect prompt injection works by hiding malicious instructions inside content the model is expected to process, such as README files, web pages, issue trackers, or MCP responses. The model cannot reliably distinguish user intent from embedded instructions when both arrive in the same context window. That matters most when the assistant has permission to execute shell commands or interact with connected systems, because the malicious instruction is no longer just text. It becomes an action trigger inside a trusted workflow.

Practical implication: treat every inbound tool result as hostile input until a guardrail has scanned it.

Why MCP expands the attack surface for AI agents

Model Context Protocol connections turn a chatbot into a multi-system operator, which is exactly why they become high-value trust boundaries. Each connected service, whether documentation, chat, code hosting, or data access, can carry content that the assistant will read and potentially act on. In this pattern, the AI is not just consuming information, it is traversing delegated access paths. That means compromise can begin through ordinary content and end in tool misuse, data exposure, or command execution.

Practical implication: inventory every MCP connection as a separate trust boundary with its own data and action scope.

What runtime detection changes in AI governance

A PostToolUse hook shifts defence from static policy to runtime inspection. Instead of relying on the model to spot suspicious content, the hook scans tool output after retrieval and before the model continues reasoning over it. Lasso Security’s approach uses pattern matching and severity flags to warn the model without blocking everything, which is a practical choice when false positives are unavoidable. The architectural point is that AI governance needs an interception layer between untrusted content and action-capable context.

Practical implication: add runtime inspection between tool output and model context instead of depending on prompt hygiene alone.

Threat narrative

Attacker objective: The attacker wants to turn a helpful coding assistant into an execution path that advances their instructions across code, data, or connected tools.

Entry occurs when an attacker plants malicious instructions in a repository file, web page, ticket, or MCP response that the assistant is likely to read.
Escalation follows when the assistant processes that content inside a privileged session and treats the embedded instruction as valid guidance for the next action.
Impact occurs when the assistant executes commands, moves data, or manipulates connected systems before a human reviews the chain of decisions.

Emerald Whale breach — exposed Git config files led to 15K secrets stolen and 10K repo compromises.
DeepSeek breach — DeepSeek breach exposed 1M+ log lines and sensitive secret keys.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Prompt injection is an identity problem because the actor making decisions is no longer the human at the keyboard. The moment Claude Code can read untrusted content and act on it with reduced approval gates, the security boundary shifts from user intent to delegated execution. That makes trust, authorisation, and context integrity part of the identity control plane, not just prompt engineering. Practitioners should treat this as AI identity governance, not an isolated application bug.

Runtime authorisation assumptions fail when model context becomes attacker-controlled. Least privilege was designed for actors whose intent is known at provisioning time and whose requests are externally initiated. That assumption breaks when the assistant selects actions from instructions embedded mid-session in files, web pages, or MCP output. The implication is that governance models built around stable request patterns no longer describe how this actor behaves.

MCP trust boundaries now define the real blast radius of AI-assisted development. A connected assistant inherits every upstream content source and downstream action path as part of its operating perimeter. That makes exposed documentation, issue trackers, and connector outputs part of the identity risk surface. Security teams need to map these dependencies as a chain of delegated trust, not as isolated integrations.

Warn-and-continue controls are a pragmatic control pattern, but they do not solve authority leakage. Pattern detection can surface suspicious content at runtime and reduce silent compromise, yet it still leaves the assistant free to reason over hostile input. That is enough to raise operator awareness, but not enough to close the governance gap. Practitioners should view detection as a containment layer, not as proof that the trust model is sound.

Named concept: context-window trust debt. The more content sources an AI assistant is allowed to absorb, the more unreviewed instructions accumulate inside the same decision space. That debt grows when teams skip human checkpoints and expand tool access at the same time. The practical conclusion is that context expansion and action expansion must be governed together.

From our research:
85% of organisations lack full visibility into third-party vendors connected via OAuth apps, according to The State of Non-Human Identity Security.
Only 1.5 out of 10 organisations are highly confident in their ability to secure NHIs, compared to nearly 1 in 4 for securing human identities.
Context sprawl is the next control problem to solve, as Ultimate Guide to NHIs , Key Challenges and Risks shows how visibility gaps and over-privilege compound exposure.

What this signals

Context-window trust debt: As AI assistants absorb more files, web pages, and connector output into a single decision space, security teams inherit more hidden instruction paths than their current review model can observe. With 1 in 4 organisations already investing in dedicated NHI security capabilities, according to The State of Non-Human Identity Security, the market signal is clear: AI governance is becoming an identity programme, not just an application feature.

Teams should expect runtime inspection, connector scoping, and action gating to become standard requirements for any assistant that can read and act across systems. The practical question is no longer whether prompt injection exists, but how far the organisation is willing to let untrusted content travel before it reaches a privileged action.

For practitioners

Classify every tool output as untrusted input Scan repository files, web fetches, issue text, and MCP responses before they enter the model context, and treat suspicious language as a security event rather than a prompt quality issue.
Reduce approval-free execution paths Avoid broad use of danger-skip permissions, and scope elevated access so the assistant cannot freely chain content ingestion into command execution across unrelated systems.
Map MCP connections as separate trust boundaries Document which services can supply content, which can trigger actions, and which can reach sensitive data, then review those paths as if they were distinct identities with delegated access.
Add runtime warning controls before action execution Place inspection hooks between tool output and the model’s next decision so suspicious content generates a visible warning before commands, edits, or data calls are completed.

Key takeaways

Indirect prompt injection turns untrusted content into a control-plane issue when an AI assistant can read, decide, and act across tools.
The core failure is not model weakness alone, but the combination of skipped approvals, broad connectors, and context that cannot distinguish instruction from data.
Security teams should govern AI assistants like delegated identities, with runtime inspection, tighter tool scope, and fewer approval-free execution paths.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A1	Prompt injection is a core agentic AI threat pattern in this article.
OWASP Non-Human Identity Top 10	NHI-02	The post focuses on delegated machine access and trust boundary abuse.
NIST CSF 2.0	PR.AC-4	The issue is over-broad access and weak privilege scoping across connected tools.

Restrict tool trust, validate inputs, and harden context boundaries before agents can act.

Key terms

Indirect Prompt Injection: A technique where malicious instructions are hidden inside content an AI system is expected to process, such as files, web pages, tickets, or tool output. The model reads the content as if it were normal input, which can cause it to follow attacker-controlled instructions during a privileged session.
Mcp Trust Boundary: The security boundary created by a Model Context Protocol connection between an AI assistant and an external system. Each connection can supply data, trigger actions, or both, so it must be governed like a delegated access path rather than a simple integration.
Posttooluse Hook: A runtime control that runs after a tool returns output but before the AI system continues reasoning over it. It is useful for scanning untrusted content, warning on suspicious patterns, and reducing the chance that hostile input is treated as trusted instruction.
Context-Window Trust Debt: The growing governance burden created when an AI assistant accumulates more unreviewed content inside the same decision space. As more sources are allowed into context, the chance that hidden instructions or misleading authority claims will influence action increases.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or governance in your organisation, it is worth exploring.

This post draws on content published by Lasso Security: The Hidden Backdoor in Claude Code: Why Its Power Is Also Its Greatest Vulnerability. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-05-27.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org