TL;DR: OpenClaw can be driven from indirect prompt injection into command execution, persistent heartbeat backdoors, and plaintext secret exfiltration when untrusted content reaches its tool layer and system prompt, according to HiddenLayer’s analysis. The bigger lesson is that autonomy without hard execution boundaries turns assistant behavior into an access-control problem, not just a model-safety issue.
NHIMG editorial — based on content published by HiddenLayer: Exploring the Security Risks of AI Assistants like OpenClaw
By the numbers:
- Only 5.7% of organisations have full visibility into their service accounts.
- 96% of organisations store secrets outside of secrets managers in vulnerable locations including code, config files, and CI/CD tools.
Questions worth separating out
A: The separation between input handling and execution breaks down.
Q: Why do autonomous assistants create more risk than ordinary automation for IAM and NHI teams?
A: Because the actor is making runtime decisions about what to do next, not just following a fixed workflow.
Q: What do security teams get wrong about prompt injection in agentic systems?
A: They often treat prompt injection as a model quality issue instead of an execution control issue.
Practitioner guidance
- Move tool authorization outside the model Require a separate policy decision before any shell command, file write, or external request is executed.
- Make prompt content immutable at runtime Prevent the assistant from writing to files that are later ingested into the system prompt or skill configuration.
- Isolate secrets from assistant-accessible storage Keep API keys and messaging tokens out of plaintext environment files that the assistant or its shell access can read.
What's in the full report
HiddenLayer's full research covers the operational detail this post intentionally leaves for the source:
- The exact indirect prompt injection sequence used to steer OpenClaw into executing attacker-controlled commands.
- The HEARTBEAT.md persistence mechanism and how it lets malicious instructions survive across new sessions.
- The security architecture failures around control sequences, guardrails, and approval-free tool execution.
- The plaintext secret exfiltration path and why the assistant's local runtime makes the blast radius larger.
👉 Read HiddenLayer’s analysis of OpenClaw’s autonomous assistant security risks →
OpenClaw and autonomous assistants: where governance breaks down?
Explore further
Autonomous assistants collapse the boundary between application logic and identity authority. OpenClaw is not only a model safety problem. It is a delegation problem in which the system itself is allowed to decide when to act, what to run, and what to persist. Once those decisions happen inside the same runtime that sees untrusted content, the assistant is behaving like an identity with execution authority, not a passive interface. Practitioners should treat that as a governance boundary, not a UI concern.
A few things that frame the scale:
- Only 5.7% of organisations have full visibility into their service accounts, according to the Ultimate Guide to NHIs.
- Another finding from the Ultimate Guide to NHIs , Lifecycle Processes for Managing NHIs shows that only 20% have formal processes for offboarding and revoking API keys, which is directly relevant when assistant access becomes persistent.
A question worth separating out:
Q: Who is accountable when an autonomous assistant exfiltrates secrets or runs destructive commands?
A: Accountability sits with the team that granted the assistant its tool access, data access, and execution paths. For governed environments, that responsibility also extends to the controls that failed to separate instruction content from runtime authority. If the agent can act without a control gate, the governance gap is structural.
👉 Read our full editorial: OpenClaw shows how agent autonomy becomes system exposure