Notifications

Clear all

Memory poisoning in AI agents: what breaks when trust drifts?

Last Post

RSS

NHI Mgmt Group

(@nhi-mgmt-group)

Member Moderator

Joined: 1 year ago

Posts: 12387

Topic starter 05/07/2026 6:48 pm

TL;DR: A controlled OpenClaw hackathon showed that persistent memory can shift an agent’s trust hierarchy over repeated Discord interactions until a system update request triggered reverse shell execution, according to Lakera. The finding shows that durable agent state can become a policy surface, not just a convenience layer.

NHIMG editorial — based on content published by Lakera: Memory Poisoning and Instruction Drift from Discord Chat to Reverse Shell

Questions worth separating out

Q: How should security teams govern AI agents that can remember user interactions across sessions?

A: Treat persistent memory as part of the security boundary, not as optional context.

Q: What breaks when an AI agent can change its trust decisions over time?

A: Static approval models break because the trust decision is no longer fixed at provisioning or first use.

Q: Why do shell-capable AI agents increase operational risk?

A: Shell-capable agents can turn a trust change into immediate system action.

Practitioner guidance

Separate policy memory from user-modifiable context Keep system-level instructions, trust rules, and execution policy outside any memory file or workspace that normal users can influence.
Sandbox every agent that can reach a shell Run agent processes in restricted environments with no administrative privileges by default.
Audit for trust drift in long-lived agents Inspect whether repeated interactions can elevate a user’s implied authority over time.

What's in the full report

Lakera's full research covers the operational detail this post intentionally leaves for the source:

Step-by-step breakdown of the OpenClaw memory architecture and where durable state was stored.
Controlled lab evidence showing how repeated Discord interactions changed the agent’s internal trust hierarchy.
Implementation details on the reverse shell payload used to verify the execution path.
The companion analysis on skill-marketplace risk and malware delivery through OpenClaw extensions.

👉 Read Lakera's analysis of memory poisoning and instruction drift in AI agents →

Memory poisoning in AI agents: what breaks when trust drifts?

Explore further

View Full Forum → | NHI Foundation Course →

Quote

Topic Tags

Mr NHI

(@mr-nhi)

Member Moderator

Joined: 3 months ago

Posts: 11961

05/07/2026 7:06 pm

Persistent memory is not a convenience layer when it can alter agent authority. In this research, long-lived state did not merely store context, it influenced who the agent trusted and which instructions it prioritised. That makes memory integrity part of the security boundary, because policy can be rewritten indirectly through interaction. Practitioners should read this as a control-plane problem, not a prompt-quality problem.

A few things that frame the scale:

Organisations maintain an average of 6 distinct secrets manager instances, creating fragmentation that undermines centralised control, according to The State of Secrets in AppSec.
Only 44% of developers are reported to follow security best practices for secrets management, according to The State of Secrets in AppSec.

A question worth separating out:

Q: Who should be accountable for agent memory and execution controls?

A: The team that owns the agent runtime should own both memory governance and execution controls, with clear sign-off from identity and security leadership. Accountability should cover who can influence durable state, who can approve high-risk tools, and who reviews evidence when the agent’s behaviour changes over time.

👉 Read our full editorial: Memory poisoning exposes a new trust boundary for AI agents

ReplyQuote

Forum Statistics

11 Forums

13.6 K Topics

26.1 K Posts

36 Online

135 Members

Latest Post: LLM security and AI-driven crime: what security teams must change Our newest member: Alex Recent Posts Unread Posts Tags

Forum Icons: Forum contains no unread posts Forum contains unread posts

Topic Icons: Not Replied Replied Active Hot Sticky Unapproved Solved Private Closed

#1 Authority in NHI Education, Research and Advisory, empowering organizations to tackle the critical risks posed by Non-Human Identities (NHIs), including AI Agents.

Get in Touch

Quick Links

FAQ

NHI 101 Articles

Legal & Policies