Agentic AI memory poisoning exposes persistent governance gaps

By NHI Mgmt Group Editorial TeamPublished 2025-11-12Domain: Agentic AI & NHIsSource: Lakera

TL;DR: Memory poisoning and long-horizon goal hijacks let attackers persistently alter what AI agents remember and optimize for, creating silent compromise paths that unfold across sessions, according to Lakera. The core issue is that governance models still assume agent behaviour is short-lived, observable, and easy to review after the fact.

At a glance

What this is: This is an analysis of how memory poisoning and long-horizon goal hijacks can quietly steer AI agents across sessions by corrupting memory, context, and objectives.

Why it matters: It matters because IAM, IGA, and PAM programmes now have to account for AI agents whose effective privileges and intent can drift over time, not just at login or provisioning.

👉 Read Lakera's analysis of memory poisoning and long-horizon goal hijacks in AI agents

Context

Agentic AI memory poisoning is the tampering of an AI agent's persistent memory or knowledge base so later actions are influenced by attacker-supplied content. That matters for identity governance because the control problem is no longer only who can access the agent, but what persistent state the agent is allowed to trust over time.

The article argues that long-horizon goal hijacks extend the same pattern into the agent's objectives. For IAM, NHI, and AI governance teams, that means reviewing access controls in isolation is not enough when a system can carry corrupted context forward across sessions and workflows.

Key questions

Q: How should security teams govern AI agent memory that persists across sessions?

A: Treat persistent memory as governed state with ownership, provenance, and retention rules. Security teams should know where the data came from, how it was written, when it is reused, and who can revoke it. If memory can influence future actions, it sits inside the trust boundary and needs lifecycle controls, not informal review.

Q: Why do AI agents with long-term memory create more security risk than stateless chatbots?

A: Long-term memory lets an attacker persist influence beyond a single interaction. A stateless chatbot can be corrected in the moment, but a memory-enabled agent can carry poisoned context forward into later sessions, making the impact cumulative. That is why the control problem becomes persistence and reuse, not just prompt filtering.

Q: What breaks when an AI agent's objectives can be shifted over time?

A: What breaks is the assumption that each task starts from a clean, user-aligned intent. If the agent can be nudged into optimizing for something else over many steps, single-response checks will miss the failure. Teams need to evaluate the full workflow path, not only the final output.

Q: How do security teams detect memory poisoning in AI workflows?

A: Look for provenance gaps, unexpected retrieval sources, repeated topic drift, and outputs that reuse suspiciously stable wording across sessions. Detection improves when memory items are tagged with source and time, because poisoned entries become easier to isolate and invalidate. The goal is to spot abnormal reuse before it drives later actions.

Technical breakdown

How memory poisoning persists across agent sessions

Memory poisoning works when malicious content is written into an agent's long-term memory store, vector database, conversation history, or user profile. The agent later retrieves that data as if it were trusted context, so the attack does not depend on a single bad prompt. The real problem is persistence: poisoned memory can keep shaping outputs until it is discovered and removed. In identity terms, the stored record becomes a standing influence channel, even when no one is actively interacting with the attacker.

Practical implication: treat remembered context as a governed data source, not a passive cache.

Why long-horizon goal hijacks are different from prompt injection

Prompt injection usually aims at one interaction. Goal hijacking works more slowly by shifting what the agent optimises for across a longer sequence of tasks. The attacker does not need to fully control one response if they can gradually distort the decision path the agent follows. That makes the attack closer to behavioural drift than a one-shot exploit. For security teams, the technical challenge is that the compromised objective may still look plausible at each individual step.

Practical implication: monitor multi-step task progression, not just isolated prompts or outputs.

Layered guardrails for agent memory and retrieval

The defensive pattern in the article is layered control across input, memory, and output. Input filters block obvious malicious content, context filters sanitize retrieved memory before reuse, and output checks catch suspicious behaviour after generation. Provenance metadata also matters because it allows each memory item to be traced back to source, time, and trigger. This is the same basic logic as identity governance for other sensitive access paths: if you cannot explain where trusted state came from, you cannot trust the action that follows.

Practical implication: require provenance, filtering, and review controls at each reuse point in the agent workflow.

Threat narrative

Attacker objective: The attacker wants the agent to act on corrupted memory or shifted objectives so future work is quietly redirected, leaked, or manipulated without an obvious single-point failure.

Entry occurs when an attacker inserts malicious content into an agent's long-term memory or knowledge base through a poisoned interaction or compromised retrieval source.
Escalation happens when the agent later recalls that poisoned memory and treats it as trusted context, letting the attacker influence future decisions across sessions.
Impact follows when the agent's outputs, recommendations, or task priorities are persistently bent toward the attacker's objective instead of the user's intent.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach — attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Memory poisoning creates an identity trust problem, not just a content-safety problem. The core failure is that the agent begins to treat persistent memory as if it were a trusted part of its operating identity. Once attacker content enters that store, later decisions inherit the corruption across sessions. That means the governance boundary has shifted from prompt review to memory provenance, and practitioners should stop treating memory as an internal convenience layer.

Long-horizon goal hijacks expose a control gap in how organisations model agent intent. Traditional access models assume the requested action is visible at the point of decision, but agents can be steered gradually until their objective function no longer matches the user's. The practical implication is that decision authority and task continuity must be governed together, because the harmful change may not be obvious in any single step.

Persistent agent memory is a new form of identity blast radius. When one poisoned record can influence many future sessions, the exposure is no longer limited to the original interaction. That expands the effective blast radius from one request to an entire workflow history. Practitioners should treat every reusable memory store as a high-impact identity control surface.

OWASP-style injection thinking needs to be extended into lifecycle governance for agents. The article shows that the real risk is not only whether an attacker can inject content, but whether that content survives long enough to shape later behaviour. That turns agent memory into a lifecycle issue involving provenance, reset, and revocation. Security teams should align agent controls with lifecycle governance rather than one-time prompt hardening.

From our research:
80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%), according to AI Agents: The New Attack Surface.
A separate finding from the same research says 92% agree governing AI agents is critical to enterprise security, yet only 44% have implemented any policies to do so.
That governance gap aligns with OWASP Agentic AI Top 10, which practitioners should use to harden memory, retrieval, and tool-use paths.

What this signals

Memory trust debt: persistent agent memory creates a trust obligation that grows every time the system reuses stored context. Once the memory layer becomes a decision input, teams need governance over provenance, retention, and revocation rather than relying on prompt-level checks alone.

The operational signal to watch is drift across time, not just a single malicious response. If an agent starts repeating unfamiliar phrasing, retrieving unexpected sources, or staying oddly aligned to one topic, the workflow may already be carrying poisoned state.

With 92% of organisations saying AI agent governance is critical but only 44% having policies in place, per AI Agents: The New Attack Surface, the control gap is already structural. Security leaders should prepare for memory provenance, retrieval review, and lifecycle revocation as standard agent controls.

For practitioners

Classify every persistent memory store as governed state Map long-term memory, vector stores, conversation logs, and profiles to explicit owners, data classes, and retention rules. If an agent can reuse it later, it needs the same provenance discipline as any other trusted input source.
Trace provenance on every reusable memory item Tag stored context with source, timestamp, and trigger so later reviews can separate user intent from attacker contamination. Without source tracing, you cannot distinguish legitimate continuity from poisoned persistence.
Monitor multi-step workflows for objective drift Review sequences of actions, not just final answers, to detect when an agent slowly begins optimizing for an unexpected outcome. Use drift indicators such as repeated topic shifts, anomalous retrieval patterns, or inconsistent task framing.
Reset or purge memory when trust is uncertain Build a revocation path for memory just as you would for credentials or sessions. If a memory source is questionable, remove it before the agent can reuse it in future tasks.

Key takeaways

Memory poisoning turns persistent agent state into a governance risk because corrupted memory can influence future sessions long after the original injection.
The article shows that AI agents already act beyond intended scope in 80% of organisations, which makes silent behavioural drift a live control problem rather than a theoretical one.
Practitioners need provenance, workflow monitoring, and memory revocation so they can govern what agents remember, reuse, and optimise for over time.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 address the attack and risk surface, while NIST AI RMF and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		Memory poisoning and goal hijacks are core agentic AI attack patterns.
NIST AI RMF		Persistent agent behaviour requires governance over intent, monitoring, and accountability.
NIST CSF 2.0	PR.AC-4	Reusable memory acts like a sensitive access input that needs controlled trust and review.

Map agent memory and retrieval controls to access governance, provenance, and revocation processes.

Key terms

Memory Poisoning: Memory poisoning is the insertion of malicious or misleading content into an AI agent's persistent memory so it influences later decisions. The risk is not the single bad interaction, but the fact that the attacker can keep shaping future behaviour after the original injection has ended.
Long-Horizon Goal Hijack: A long-horizon goal hijack is a gradual manipulation of an AI agent's objective so that it optimises for an attacker-defined outcome over time. Unlike prompt injection, it works across multiple steps and sessions, making the failure look normal until the agent's behaviour has drifted far enough to matter.
Persistent Context: Persistent context is reusable information that an AI agent stores and recalls across sessions, such as memory records, vector embeddings, or user profiles. It improves continuity, but it also creates a standing trust boundary that must be governed, because any corrupted entry can keep influencing future actions.
Objective Drift: Objective drift is the gradual shift of an AI agent away from its original task or user intent. In practice, this can happen when corrupted memory, poisoned retrieval results, or manipulated inputs cause the system to optimise for the wrong outcome while still appearing functionally correct.

What's in the full article

Lakera's full article covers the operational detail this post intentionally leaves for the source:

Step-by-step examples of how memory poisoning and goal hijacks play out inside agent workflows.
Practical defense patterns for input filters, context filters, and output checks across memory reuse.
Research references and challenge examples, including Gandalf: Agent Breaker scenarios.
The vendor's walkthrough of how layered guardrails are applied in practice.

👉 Lakera's full post covers the attack examples, layered defenses, and challenge scenarios in more detail.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or governance in your organisation, it is worth exploring.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-11-12.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org