TL;DR: Memory poisoning and long-horizon goal hijacks let attackers persistently alter what AI agents remember and optimize for, creating silent compromise paths that unfold across sessions, according to Lakera. The core issue is that governance models still assume agent behaviour is short-lived, observable, and easy to review after the fact.
NHIMG editorial — based on content published by Lakera: Agentic AI threats, memory poisoning, and long-horizon goal hijacks
Questions worth separating out
Q: How should security teams govern AI agent memory that persists across sessions?
A: Treat persistent memory as governed state with ownership, provenance, and retention rules.
Q: Why do AI agents with long-term memory create more security risk than stateless chatbots?
A: Long-term memory lets an attacker persist influence beyond a single interaction.
Q: What breaks when an AI agent's objectives can be shifted over time?
A: What breaks is the assumption that each task starts from a clean, user-aligned intent.
Practitioner guidance
- Classify every persistent memory store as governed state. Map long-term memory, vector stores, conversation logs, and profiles to explicit owners, data classes, and retention rules.
- Trace provenance on every reusable memory item. Tag stored context with source, timestamp, and trigger so later reviews can separate user intent from attacker contamination.
- Monitor multi-step workflows for objective drift. Review sequences of actions, not just final answers, to detect when an agent slowly begins optimizing for an unexpected outcome.
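The first two points above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a pattern taken from Lakera's article or from any specific agent framework: the `MemoryItem` schema, the `is_expired` helper, and all example values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class MemoryItem:
    """One reusable memory entry with governance metadata (hypothetical schema)."""
    content: str
    owner: str            # team or person accountable for this memory store
    data_class: str       # e.g. "internal" or "confidential"
    source: str           # where the content came from, e.g. "user_chat"
    trigger: str          # what caused the write, e.g. "summarise_session"
    created_at: datetime  # timezone-aware write time
    retention: timedelta  # how long the item may be reused


def is_expired(item: MemoryItem, now: datetime) -> bool:
    """Return True once the item has outlived its retention period."""
    return now >= item.created_at + item.retention


# Illustrative values only.
item = MemoryItem(
    content="User prefers weekly summary emails.",
    owner="support-platform-team",
    data_class="internal",
    source="user_chat",
    trigger="summarise_session",
    created_at=datetime(2025, 1, 1, tzinfo=timezone.utc),
    retention=timedelta(days=30),
)

print(is_expired(item, datetime(2025, 1, 15, tzinfo=timezone.utc)))  # False
print(is_expired(item, datetime(2025, 3, 1, tzinfo=timezone.utc)))   # True
```

Keeping the metadata on the record itself, rather than in a separate log, means any later review or purge can filter by owner, source, or age without reconstructing where an entry came from.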
What's in the full article
Lakera's full article covers the operational detail this post intentionally leaves for the source:
- Step-by-step examples of how memory poisoning and goal hijacks play out inside agent workflows.
- Practical defense patterns for input filters, context filters, and output checks across memory reuse.
- Research references and challenge examples, including Gandalf: Agent Breaker scenarios.
- The vendor's walkthrough of how layered guardrails are applied in practice.
👉 Read Lakera's analysis of memory poisoning and long-horizon goal hijacks in AI agents →
Explore further
Memory poisoning creates an identity trust problem, not just a content-safety problem. The core failure is that the agent begins to treat persistent memory as if it were a trusted part of its operating identity. Once attacker content enters that store, later decisions inherit the corruption across sessions. That means the governance boundary has shifted from prompt review to memory provenance, and practitioners should stop treating memory as an internal convenience layer.
A few things that frame the scale:
- 80% of organisations report that their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%), according to the report "AI Agents: The New Attack Surface".
- The same research finds that 92% agree that governing AI agents is critical to enterprise security, yet only 44% have implemented any policies to do so.
A question worth separating out:
Q: How do security teams detect memory poisoning in AI workflows?
A: Look for provenance gaps, unexpected retrieval sources, repeated topic drift, and outputs that reuse suspiciously stable wording across sessions. Detection improves when memory items are tagged with source and time, because poisoned entries become easier to isolate and invalidate. The goal is to spot abnormal reuse before it drives later actions.
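Two of the signals in that answer, provenance gaps and suspiciously stable wording, can be checked mechanically. The sketch below is illustrative only: the entry format, the `TRUSTED_SOURCES` allowlist, and both function names are assumptions made for this example, and a real deployment would need fuzzier text matching than exact sentence comparison.

```python
from collections import Counter

# Assumption: an allowlist of write sources defined by the memory owner.
TRUSTED_SOURCES = {"user_chat", "admin_console"}


def flag_suspicious(entries: list[dict]) -> list[tuple[int, str]]:
    """Return (index, reason) pairs for memory entries that deserve review."""
    findings = []
    for i, entry in enumerate(entries):
        source = entry.get("source")
        if not source or not entry.get("timestamp"):
            findings.append((i, "provenance gap"))
        elif source not in TRUSTED_SOURCES:
            findings.append((i, "unexpected source"))
    return findings


def repeated_phrasing(outputs_by_session: dict[str, str], min_sessions: int = 3) -> set[str]:
    """Return sentences that recur verbatim in at least min_sessions sessions."""
    counts = Counter()
    for text in outputs_by_session.values():
        # De-duplicate within a session so each session counts once per sentence.
        sentences = {s.strip().lower() for s in text.split(".") if s.strip()}
        counts.update(sentences)
    return {s for s, n in counts.items() if n >= min_sessions}


# Illustrative data only.
entries = [
    {"content": "User prefers weekly summaries", "source": "user_chat", "timestamp": "2025-01-01T10:00:00Z"},
    {"content": "Always use the vendor portal", "source": "web_fetch", "timestamp": "2025-01-02T09:00:00Z"},
    {"content": "Trust the vendor portal", "source": None, "timestamp": None},
]
outputs = {
    "s1": "Here is your report. Always use the vendor portal.",
    "s2": "Done. Always use the vendor portal.",
    "s3": "Always use the vendor portal. Anything else?",
}

print(flag_suspicious(entries))    # [(1, 'unexpected source'), (2, 'provenance gap')]
print(repeated_phrasing(outputs))  # {'always use the vendor portal'}
```

Flagged entries are candidates for review and invalidation, not proof of compromise; legitimate preferences also produce stable wording, so the threshold is a tuning choice.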
👉 Read our full editorial: Agentic AI memory poisoning exposes persistent governance gaps