What do security teams get wrong about AI memory and context?

Why This Matters for Security Teams

Security teams often frame AI memory as a convenience feature, when the real issue is governance. Vendor-held chat history, retrieval caches, and session context are not the same as a controlled knowledge layer with retention, access policy, and auditability. That distinction matters because context can influence decisions, prompt injection can corrupt it, and product migrations can strand it. The result is a false sense of continuity that masks operational fragility.

This is especially risky when memory becomes a shadow system for workflows that should sit under NIST Cybersecurity Framework 2.0 governance, or when teams assume a model can “remember” what the organisation has never formally stored. NHI guidance increasingly treats session state as an input, not a system of record. The DeepSeek breach points the same way: the exposed data showed how quickly AI-adjacent stores can become a security problem. In practice, many security teams discover the weakness only after a model change, a vendor outage, or a leaked context store has already broken a business process.

How It Works in Practice

The practical fix is to separate three layers that teams often blur together: transient model context, governed operational data, and authoritative records. Model memory should be treated as disposable working state. Business-critical facts, decisions, and approvals belong in a managed source of truth with ownership, retention rules, and access controls. That source of truth can then be fetched into the model at runtime, instead of being silently embedded in a vendor’s private memory layer.
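As a minimal sketch of that separation, the Python below rebuilds model context from a governed store on every request. The `Record` schema, the `SYSTEM_OF_RECORD` dictionary, and the `build_context` function are illustrative assumptions, not part of any specific product; a real deployment would back them with a database or records system that enforces retention and access policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """An authoritative fact held in a governed system (hypothetical schema)."""
    key: str
    value: str
    owner: str


# Stand-in for the managed source of truth. The keys and values are made up.
SYSTEM_OF_RECORD = {
    "refund-limit": Record(
        "refund-limit", "Refunds above 500 EUR need manager approval", "finance-ops"
    ),
    "escalation-path": Record(
        "escalation-path", "Escalate P1 incidents to the on-call lead", "secops"
    ),
}


def build_context(keys: list[str]) -> str:
    """Assemble transient model context from governed records at request time."""
    lines = []
    for key in keys:
        record = SYSTEM_OF_RECORD.get(key)
        if record is None:
            # A missing fact is surfaced as an error, not recalled from model memory.
            raise KeyError(f"no governed record for {key!r}")
        lines.append(f"[{record.key} | owner={record.owner}] {record.value}")
    return "\n".join(lines)


# The returned string is disposable working state: rebuild it per request
# and discard it afterwards instead of persisting it in vendor memory.
context = build_context(["refund-limit"])
print(context)
```

Because the context is regenerated from the store each time, switching model or vendor changes only where the string is sent, not where the facts live.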

That design aligns with current guidance from the NIST Cybersecurity Framework 2.0 and with NHI security thinking that treats identity, secrets, and data flow as separate control problems. It also helps teams avoid the common failure mode seen in the DeepSeek breach: assuming the platform will preserve context safely and indefinitely. Instead, teams should:

- store durable knowledge in governed systems, not in chat history;
- tag which memory fields are factual, derived, or user-supplied;
- apply access controls to retrieval, not just to the application front end;
- log when context is injected, changed, or deleted;
- design for exportability so migrations do not destroy continuity.
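The practices in the list above can be sketched together in one small Python class. Everything here is an assumption made for illustration: the `Provenance` labels, the `MemoryField` schema, the role names, and the in-memory `GovernedMemory` store are hypothetical, and a production system would persist both the fields and the audit log in governed storage.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Provenance(str, Enum):
    FACTUAL = "factual"
    DERIVED = "derived"
    USER_SUPPLIED = "user_supplied"


@dataclass(frozen=True)
class MemoryField:
    key: str
    value: str
    provenance: Provenance      # tags the field as factual, derived, or user-supplied
    allowed_roles: frozenset    # roles permitted to retrieve this field


class GovernedMemory:
    def __init__(self):
        self._fields = {}
        self.audit_log = []

    def _log(self, action, key, actor):
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "key": key,
            "actor": actor,
        })

    def put(self, field, actor):
        action = "changed" if field.key in self._fields else "injected"
        self._fields[field.key] = field
        self._log(action, field.key, actor)

    def delete(self, key, actor):
        self._fields.pop(key, None)
        self._log("deleted", key, actor)

    def retrieve(self, key, role, actor):
        # Access control is enforced at retrieval, not only at the front end.
        field = self._fields[key]
        if role not in field.allowed_roles:
            self._log("denied", key, actor)
            raise PermissionError(f"role {role!r} may not read {key!r}")
        self._log("retrieved", key, actor)
        return field

    def export(self):
        # A plain JSON export keeps the data portable across vendors and models.
        rows = [
            {
                "key": f.key,
                "value": f.value,
                "provenance": f.provenance.value,
                "allowed_roles": sorted(f.allowed_roles),
            }
            for f in self._fields.values()
        ]
        return json.dumps(rows, indent=2)
```

The provenance tag lets downstream policy treat a user-supplied claim differently from a fact synced from a system of record, which is one way to limit the reach of prompt injection.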

This matters even more for AI agents, which may chain tools, act autonomously, and reuse prior context in ways the operator did not anticipate. The emerging best practice is to evaluate runtime context against policy before the agent acts, rather than trusting whatever memory it currently holds, an approach consistent with NIST Cybersecurity Framework 2.0 and the operational lessons of the DeepSeek breach. These controls tend to break down when teams rely on a single vendor’s chat memory as the only durable record across multiple tools and model versions.
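One way to picture a pre-action policy check is the sketch below. The rule it encodes (high-impact actions require every context item to come from a trusted, freshly verified source) is an assumed example policy, and `ContextItem`, `Action`, `TRUSTED_SOURCES`, and `evaluate` are hypothetical names rather than any framework's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    source: str     # where the item came from, e.g. "records-db" or "chat-memory"
    verified: bool  # whether it was fetched from a governed store during this run


@dataclass(frozen=True)
class Action:
    name: str
    high_impact: bool


# Hypothetical allow-list of governed sources.
TRUSTED_SOURCES = {"records-db", "ticketing"}


def evaluate(action: Action, context: list[ContextItem]) -> tuple[bool, str]:
    """Return (allowed, reason) for an action given the context it relies on."""
    if not action.high_impact:
        return True, "low-impact action"
    if not context:
        return False, "high-impact action with no supporting context"
    for item in context:
        if item.source not in TRUSTED_SOURCES:
            return False, f"untrusted source: {item.source}"
        if not item.verified:
            return False, f"unverified item from {item.source}"
    return True, "context satisfies policy"


refund = Action("issue_refund", high_impact=True)
print(evaluate(refund, [ContextItem("chat-memory", verified=True)]))
```

The agent runtime would call `evaluate` before each tool invocation and refuse or escalate when the first element is `False`.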

Common Variations and Edge Cases

Tighter memory controls often increase integration effort, requiring organisations to balance user experience against auditability and portability. That tradeoff is real, especially in copilots, customer support bots, and agentic workflows where users expect the system to “just know” what happened before.

One edge case is ephemeral assistance, where storing memory is unnecessary or undesirable. In those environments, the safer pattern is no persistent memory at all, only short-lived retrieval from an approved knowledge base. Another is regulated workflows, where retention may be mandatory but access must still be constrained by role and purpose. Best practice is evolving here, and there is no universal standard for how much AI memory should be retained versus reconstructed.
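For the ephemeral case, a sketch of the "no persistent memory" pattern might look like the following. The `APPROVED_KB` contents and the `ephemeral_context` helper are assumptions for illustration; the point is only that context is fetched from an approved source, used, and then cleared.

```python
from contextlib import contextmanager

# Stand-in for an approved knowledge base. The entry is invented.
APPROVED_KB = {
    "password-reset": "Use the self-service portal; never share passwords in chat.",
}


@contextmanager
def ephemeral_context(topics):
    """Yield short-lived context from the approved KB, then wipe it."""
    context = {t: APPROVED_KB[t] for t in topics if t in APPROVED_KB}
    try:
        yield context
    finally:
        # Nothing survives the session: the working copy is emptied on exit.
        context.clear()


with ephemeral_context(["password-reset", "unlisted-topic"]) as ctx:
    print(sorted(ctx))
```

Topics that are not in the approved knowledge base are simply absent from the context, so the model cannot be handed unvetted material through this path.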

The main mistake is assuming more memory always means better performance. In reality, noisy or stale context can be worse than no memory, especially after organisational changes, incident response actions, or model migrations. Teams that want resilience should treat context as governed input, not as a substitute for records management, and they should review memory design through the lens of NIST Cybersecurity Framework 2.0 as well as the control failures surfaced in the DeepSeek breach.
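A simple freshness filter illustrates why less context can be safer than more. This is a sketch under assumptions: `ContextEntry`, `fresh_entries`, the 30-day window, and the idea of an invalidation cutoff set after a reorganisation or model migration are all hypothetical choices, not a prescribed standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ContextEntry:
    text: str
    verified_at: datetime  # when the entry was last confirmed against a record


def fresh_entries(entries, now, max_age, invalidated_before=None):
    """Keep entries verified within max_age and after any invalidation cutoff."""
    kept = []
    for entry in entries:
        if now - entry.verified_at > max_age:
            continue  # too old to trust
        if invalidated_before is not None and entry.verified_at < invalidated_before:
            continue  # predates an event that invalidated earlier context
        kept.append(entry)
    return kept


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
entries = [
    ContextEntry("Alice owns the firewall change queue", now - timedelta(days=90)),
    ContextEntry("Bob owns the firewall change queue", now - timedelta(days=2)),
]
for entry in fresh_entries(entries, now, max_age=timedelta(days=30)):
    print(entry.text)
```

Here the 90-day-old ownership claim is dropped, so the model sees only the entry that was recently confirmed.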

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST AI RMF set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI-03	Memory stores often hide long-lived secrets and identity artifacts.
NIST CSF 2.0	PR.AA-05	AI memory access must be governed by least privilege and purpose.
NIST AI RMF		AI RMF covers governance, transparency, and lifecycle risk for model context.

Move persistent context out of vendor memory and rotate any secrets that can influence AI workflows.
