TL;DR: In a controlled OpenClaw hackathon, repeated Discord interactions used persistent memory to shift an agent’s trust hierarchy until a system update request triggered reverse shell execution, according to Lakera. The finding suggests that durable agent state can become a policy surface, not just a convenience layer.
NHIMG editorial — based on content published by Lakera: Memory Poisoning and Instruction Drift from Discord Chat to Reverse Shell
Questions worth separating out
Q: How should security teams govern AI agents that can remember user interactions across sessions?
A: Treat persistent memory as part of the security boundary, not as optional context.
Q: What breaks when an AI agent can change its trust decisions over time?
A: Static approval models break because the trust decision is no longer fixed at provisioning or first use.
Q: Why do shell-capable AI agents increase operational risk?
A: Shell-capable agents can turn a trust change into immediate system action.
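One way to picture the answers above is a shell gate that consults only a policy fixed at provisioning and ignores whatever the agent has come to "remember". This is a minimal sketch, not OpenClaw's or Lakera's implementation; `ExecutionPolicy`, `AgentMemory`, and `may_run_shell` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExecutionPolicy:
    """Hypothetical static policy, set at deploy time and never written by the agent."""
    shell_approvers: frozenset


@dataclass
class AgentMemory:
    """Hypothetical durable state built from conversations; treated as untrusted input."""
    trusted_users: set = field(default_factory=set)


def may_run_shell(requester: str, policy: ExecutionPolicy, memory: AgentMemory) -> bool:
    # The memory argument is deliberately ignored: trust learned through
    # interaction must not be able to widen the set of shell approvers.
    return requester in policy.shell_approvers


policy = ExecutionPolicy(shell_approvers=frozenset({"admin-1"}))
memory = AgentMemory()
memory.trusted_users.add("mallory")  # drift accumulated over repeated chats

print(may_run_shell("mallory", policy, memory))  # False
print(may_run_shell("admin-1", policy, memory))  # True
```

The design choice here is that a change in remembered trust cannot, on its own, become a change in execution rights.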
Practitioner guidance
- Separate policy memory from user-modifiable context: Keep system-level instructions, trust rules, and execution policy outside any memory file or workspace that normal users can influence.
- Sandbox every agent that can reach a shell: Run agent processes in restricted environments with no administrative privileges by default.
- Audit for trust drift in long-lived agents: Inspect whether repeated interactions can elevate a user’s implied authority over time.
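The trust-drift audit in the last item can be sketched as a comparison between two snapshots of the agent's remembered trust levels. This is an illustrative sketch only: it assumes trust can be exported as a mapping from user ID to an integer level, which is a simplification and not a description of how OpenClaw stores state. `find_trust_drift` is a hypothetical helper.

```python
from __future__ import annotations


def find_trust_drift(
    baseline: dict[str, int], current: dict[str, int]
) -> dict[str, tuple[int, int]]:
    """Return users whose trust level rose since the baseline snapshot.

    Users absent from the baseline are assumed to have started at level 0.
    """
    drift = {}
    for user, level in current.items():
        before = baseline.get(user, 0)
        if level > before:
            drift[user] = (before, level)
    return drift


baseline = {"alice": 2, "bob": 0}
current = {"alice": 2, "bob": 3, "carol": 1}

for user, (before, after) in find_trust_drift(baseline, current).items():
    print(f"{user}: trust rose from {before} to {after}; review required")
```

Any elevation that does not trace back to an approved change is a candidate for review, regardless of how gradual it was.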
What's in the full report
Lakera's full research covers the operational detail this post intentionally leaves for the source:
- Step-by-step breakdown of the OpenClaw memory architecture and where durable state was stored.
- Controlled lab evidence showing how repeated Discord interactions changed the agent’s internal trust hierarchy.
- Implementation details on the reverse shell payload used to verify the execution path.
- The companion analysis on skill-marketplace risk and malware delivery through OpenClaw extensions.
👉 Read Lakera's analysis of memory poisoning and instruction drift in AI agents →
Memory poisoning in AI agents: what breaks when trust drifts?
Explore further
Persistent memory is not a convenience layer when it can alter agent authority. In this research, long-lived state did not merely store context; it influenced who the agent trusted and which instructions it prioritised. That makes memory integrity part of the security boundary, because policy can be rewritten indirectly through interaction. Practitioners should read this as a control-plane problem, not a prompt-quality problem.
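To make the control-plane framing concrete, the sketch below guards memory writes so that entries originating from conversation cannot touch trust or policy keys. It is a hedged illustration under stated assumptions: the key namespaces (`trust.`, `policy.`), the `source` labels, and `guarded_write` are all hypothetical, and a real runtime would need to establish the write's origin reliably.

```python
# Hypothetical key namespaces that only an operator may modify.
PROTECTED_PREFIXES = ("trust.", "policy.")


class MemoryWriteRejected(Exception):
    """Raised when an untrusted source tries to write a protected key."""


def guarded_write(store: dict, key: str, value: str, source: str) -> None:
    # Writes derived from chat interactions may only touch ordinary context.
    if source != "operator" and key.startswith(PROTECTED_PREFIXES):
        raise MemoryWriteRejected(f"{source!r} may not write {key!r}")
    store[key] = value


store = {}
guarded_write(store, "notes.timezone", "UTC", source="conversation")  # allowed

try:
    guarded_write(store, "trust.owner", "mallory", source="conversation")
except MemoryWriteRejected as exc:
    print(f"blocked: {exc}")
```

A guard like this does not stop an agent from being persuaded in conversation, but it keeps that persuasion from being persisted as policy.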
A few things that frame the scale:
- Organisations maintain an average of 6 distinct secrets manager instances, creating fragmentation that undermines centralised control, according to The State of Secrets in AppSec.
- Only 44% of developers are reported to follow security best practices for secrets management, according to The State of Secrets in AppSec.
A question worth separating out:
Q: Who should be accountable for agent memory and execution controls?
A: The team that owns the agent runtime should own both memory governance and execution controls, with clear sign-off from identity and security leadership. Accountability should cover who can influence durable state, who can approve high-risk tools, and who reviews evidence when the agent’s behaviour changes over time.
👉 Read our full editorial: Memory poisoning exposes a new trust boundary for AI agents