TL;DR: OpenClaw can be coerced into exfiltrating sensitive sandbox data through allowlisted tools such as git, gh, npm, and node, even when binary-scoped egress policies are enforced, according to Lasso Security research. The core failure is that static sandbox controls cannot evaluate agent intent, so trusted workflows become viable attack paths and persistence channels.
NHIMG editorial — based on content published by Lasso Security: Thinking Outside The Box, Exfiltrating OpenClaw Data from NVIDIA's new Sandbox
Questions worth separating out
Q: How should security teams stop AI agents from using approved tools to exfiltrate data?
A: Security teams should assume approved tools can be abused and apply task-scoped restrictions, behavioural monitoring, and strong separation between the agent and writable configuration state.
Q: Why do AI agent sandboxes still leak secrets even when egress policies are enforced?
A: Because egress policy answers where traffic may go, not whether the traffic represents legitimate task completion or covert theft.
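To make that distinction concrete, here is a minimal sketch of a binary-scoped egress check. The allowlist, the `EgressRequest` type, and the `egress_allowed` function are hypothetical names invented for illustration; this is not the actual OpenShell policy format or API. The point is only that a check keyed on binary and destination gives the same answer for routine work and for theft.

```python
from dataclasses import dataclass

# Hypothetical allowlist of (binary, destination host) pairs.
# Assumption for illustration only; not the real OpenShell policy format.
ALLOWED_EGRESS = {
    ("git", "github.com"),
    ("gh", "api.github.com"),
    ("npm", "registry.npmjs.org"),
}


@dataclass(frozen=True)
class EgressRequest:
    binary: str   # the executable making the connection
    host: str     # where the traffic is going
    payload: str  # what is being sent; the policy below never inspects it


def egress_allowed(request: EgressRequest) -> bool:
    """Binary-scoped check: the decision depends on (binary, host) only."""
    return (request.binary, request.host) in ALLOWED_EGRESS


legit = EgressRequest("gh", "api.github.com", "PR body: fix typo in README")
theft = EgressRequest("gh", "api.github.com", "PR body: <contents of a secrets file>")
blocked = EgressRequest("curl", "attacker.example", "anything")

# The first two requests are indistinguishable to the policy.
print(egress_allowed(legit), egress_allowed(theft), egress_allowed(blocked))
```

Under these assumptions the policy behaves exactly as configured, yet the second request still leaves the sandbox, because nothing in the check considers what is being sent or why.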
Q: What do teams get wrong about sandboxing autonomous AI agents?
A: Teams often confuse containment with trust.
Practitioner guidance
- Map every approved agent tool to a data-exfiltration path. Inventory which binaries, APIs, package managers, and source-control tools an agent can reach, then classify each one by the type of data it can move out of the environment.
- Protect agent configuration and instruction files as security assets. Store prompts, memory files, policy files, and behaviour instructions outside the agent’s writable workspace, then monitor for unauthorised changes.
- Add runtime detection for intent-shaped misuse. Alert on unusual combinations such as package installation followed by repository access, secret file reads followed by outbound posting, or repeated policy probing before exfiltration.
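The runtime-detection item above could be prototyped roughly as follows. This is a sketch under stated assumptions, not a method from the Lasso Security research: the event labels (`secret_read`, `outbound_post`, and so on), the five-minute window, and the probe threshold are all hypothetical, and a real deployment would have to derive events from its own audit or syscall logs.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since the agent session started
    kind: str         # hypothetical label, e.g. "secret_read"


# Ordered pairs that are flagged when the second follows the first.
# These labels and thresholds are illustrative assumptions.
SUSPICIOUS_SEQUENCES = {
    ("package_install", "repo_access"),
    ("secret_read", "outbound_post"),
}
WINDOW_SECONDS = 300.0  # how far apart the two events may be
PROBE_THRESHOLD = 3     # policy probes before a post that trigger an alert


def detect(events: Iterable[Event]) -> list[str]:
    """Return alert strings for suspicious event combinations."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    alerts: list[str] = []

    # Rule 1: a flagged pair occurring in order within the time window.
    for i, first in enumerate(ordered):
        for later in ordered[i + 1:]:
            if later.timestamp - first.timestamp > WINDOW_SECONDS:
                break
            alert = f"{first.kind} -> {later.kind}"
            if (first.kind, later.kind) in SUSPICIOUS_SEQUENCES and alert not in alerts:
                alerts.append(alert)

    # Rule 2: repeated policy probing shortly before an outbound post.
    for i, event in enumerate(ordered):
        if event.kind != "outbound_post":
            continue
        probes = sum(
            1
            for earlier in ordered[:i]
            if earlier.kind == "policy_probe"
            and event.timestamp - earlier.timestamp <= WINDOW_SECONDS
        )
        alert = f"{probes}x policy_probe -> outbound_post"
        if probes >= PROBE_THRESHOLD and alert not in alerts:
            alerts.append(alert)

    return alerts


session = [
    Event(0, "policy_probe"),
    Event(5, "policy_probe"),
    Event(9, "policy_probe"),
    Event(20, "secret_read"),
    Event(42, "outbound_post"),
]
print(detect(session))
```

Each event in this session would pass a per-tool allowlist on its own; only the ordering and proximity of the events make the session stand out, which is why the rules operate on sequences and not on individual calls.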
What's in the full report
Lasso Security's full research covers the operational detail this post intentionally leaves for the source:
- Step-by-step reproduction of the GitHub Pull Request attack and the emoji-based token reconstruction technique
- The persistent Agent Configuration Poisoning flow, including cron-based re-execution and SOUL.md tampering
- The exact OpenShell policy reconnaissance method used to map domain and binary combinations
- Proof-of-concept artefacts and example exfiltration paths that implementation teams can study in detail
👉 Read Lasso Security's research on AI agent sandbox exfiltration and policy poisoning →
AI agent sandboxes and exfiltration paths: are your controls enough?
Explore further
Trusted tool use is the new attack surface for AI agents. The article demonstrates that binary-scoped egress control can be functioning exactly as designed and still enable credential theft. That means the real problem is not destination control alone, but the assumption that authorised tools remain trustworthy when invoked by an autonomous agent. Practitioners should treat every approved integration as a potential exfiltration route, not a safe path by default.
A few things that frame the scale:
- 98% of companies plan to deploy even more AI agents within the next 12 months, despite documented rogue behaviour in 80% of current deployments, according to the report AI Agents: The New Attack Surface.
- Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
A question worth separating out:
Q: Who is accountable when an AI agent leaks secrets through permitted tools?
A: Accountability sits with the organisation that defined the agent’s permissions and operating model, because the abuse happens inside approved workflows. That means security, platform, and identity owners all need shared responsibility for tool selection, configuration integrity, and logging. In regulated environments, this becomes a governance and audit question as much as a technical one.
👉 Read our full editorial: AI agent sandboxing fails when trusted tools become attack paths