TL;DR: Prompt Armor says Claude Cowork can be pushed into file exfiltration through indirect prompt injection and persistent isolation flaws in its code execution environment, allowing unauthorized uploads from local systems without human intervention. The case shows that AI agent security still hinges on trust boundaries, not just model behaviour.
NHIMG editorial — based on content published by ZioSec: Claude Cowork Vulnerability: Exfiltration Risks and Defensive Measures
By the numbers:
- 80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems, inappropriately sharing sensitive data, and revealing access credentials.
Questions worth separating out
Q: What breaks when prompt injection reaches an AI agent's file workflow?
A: The main failure is that a trusted file or Skill stops behaving like data and starts behaving like instruction.
Q: Why do AI agents create more exfiltration risk than ordinary automation?
A: AI agents can interpret untrusted content, choose actions at runtime, and combine tools in ways that static automation cannot.
Q: How can security teams tell whether agent file access is drifting out of policy?
A: Look for file uploads, API calls, and tool invocations that do not match the approved sequence for the agent's task.
Practitioner guidance
- Segregate untrusted content from execution Process uploaded files, Skills, and other external content in a low-trust parsing layer that cannot reach user files, secrets, or external upload endpoints.
- Constrain outbound egress from agent runtimes Block or tightly allowlist network destinations so a prompt-injected command cannot turn the runtime into a file transfer path.
- Audit agent API calls against approved workflow intent Compare upload, file access, and tool invocation events to the expected sequence for each agent so anomalous exfiltration patterns are visible.
What's in the full article
ZioSec's full article covers the operational detail this post intentionally leaves for the source:
- The specific file and Skill abuse patterns that were used to trigger prompt injection in Claude Cowork
- The observed command sequence that turns the agent into a file exfiltration path
- The defensive monitoring ideas for spotting unusual Anthropic API upload behaviour
- The article's summary of why the isolation flaw persisted despite prior acknowledgement
👉 Read ZioSec's analysis of Claude Cowork file exfiltration risk →
Claude Cowork file exfiltration: are agent isolation controls enough?
Explore further
Agent file exfiltration is an identity problem, not only a content-safety problem. The article shows a malicious file can become an execution trigger when the agent is trusted to process it with access to tools and data. That means the identity boundary has shifted from login to runtime behaviour, where delegated actions can be redirected without a new authentication event. Practitioners should treat file ingestion, tool invocation, and outbound transfer as one identity control plane, not three separate issues.
A few things that frame the scale:
- 80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems, inappropriately sharing sensitive data, and revealing access credentials, according to AI Agents: The New Attack Surface report.
- Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
A question worth separating out:
Q: Who is accountable when an AI agent exfiltrates sensitive files through a hidden prompt?
A: Accountability sits with the teams that own the agent's runtime, data access, and workflow governance, because the failure is in control design rather than user intent alone. Frameworks that cover non-human identity governance and zero trust architecture are directly relevant when an agent can act on content without a human approval gate.
👉 Read our full editorial: Claude Cowork file exfiltration exposes AI agent isolation gaps