What do security teams get wrong about context isolation in GenAI apps?

Teams often treat context isolation as a data-handling issue only, but in GenAI apps it is also an identity boundary. Shared memory, embeddings, and session state can leak data and influence behaviour across users. Organisations should isolate memory per tenant or session and test for cross-session reuse before deployment.

Why This Matters for Security Teams

Context isolation in GenAI apps is often misunderstood as a content-filtering or data-loss problem, but it is really an identity and state boundary problem. When prompts, embeddings, conversation history, and retrieval results are shared across tenants or sessions, one user can indirectly influence another user’s output or expose sensitive material. The NIST AI 600-1 GenAI Profile treats these risks as governance and control issues, not just application tuning.

This matters because GenAI systems often blend retrieval, tool use, and long-lived memory in ways that traditional web app security models do not cover. A safe UI can still sit on top of a shared vector store, reused session object, or mis-scoped cache that leaks state between users. NHIMG’s DeepSeek breach coverage shows how quickly AI data exposure can become a broader trust failure once secrets or histories are embedded into the wrong layer. In practice, many security teams encounter context bleed only after a user reports unexpected output rather than through intentional boundary testing.

How It Works in Practice

Effective context isolation starts by treating every runtime boundary as separate: session state, retrieval scope, memory store, and tool permissions should all be tied to a tenant, user, or task. That means the application should not rely on a single shared conversation buffer or global embedding index unless access controls are enforced at query time. Current guidance suggests that context should be scoped the same way other sensitive resources are scoped: by identity, authorization, and purpose.

In practice, teams reduce leakage by combining short-lived session state, per-tenant vector namespaces, and explicit retrieval filters. If an agent or chat app uses memory, that memory should be indexed with tenant-aware metadata and checked on every read. If tool calls are involved, the model should only receive the minimum context needed for that step. The NIST AI 600-1 GenAI Profile is useful here because it frames the problem as risk management across the full AI lifecycle, not just prompt handling. NHIMG’s research report, The State of Secrets in AppSec, reinforces why this matters: once sensitive material enters shared application paths, remediation is slow and confidence in controls is often overstated.
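
As a rough illustration of tenant-aware memory that is checked on every read, the sketch below uses a plain in-memory list. The names `MemoryRecord` and `ScopedMemory` are hypothetical and not taken from any library; a real deployment would more likely apply the same filter inside a vector store or database query.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class MemoryRecord:
    tenant_id: str
    session_id: str
    text: str


@dataclass
class ScopedMemory:
    """Hypothetical in-memory store standing in for a vector DB or cache."""
    _records: List[MemoryRecord] = field(default_factory=list)

    def write(self, tenant_id: str, session_id: str, text: str) -> None:
        # Every record carries its owner metadata at ingestion time.
        self._records.append(MemoryRecord(tenant_id, session_id, text))

    def read(self, tenant_id: str, session_id: Optional[str] = None) -> List[str]:
        # The ownership filter runs on every read, not only at ingestion.
        return [
            r.text
            for r in self._records
            if r.tenant_id == tenant_id
            and (session_id is None or r.session_id == session_id)
        ]


memory = ScopedMemory()
memory.write("acme", "s1", "acme renewal date is in March")
memory.write("globex", "s2", "globex API key rotation pending")
acme_view = memory.read("acme")
```

Here `acme_view` contains only the record written for the `acme` tenant. The point of the design is that callers cannot read at all without naming a tenant, so a forgotten filter fails closed rather than returning everything.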

  • Scope memory per tenant or session instead of using global conversational state.
  • Apply retrieval filters at query time, not only at ingestion time.
  • Separate embeddings, caches, and logs so they cannot cross user boundaries.
  • Test for cross-session reuse, prompt injection spillover, and stale memory exposure before release.
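
One way to approach the cross-session reuse test in the last item is to plant a unique canary string in one session and check whether it appears in the context assembled for a different tenant's session. The sketch below is a minimal version of that idea; `ChatApp` and `LeakyChatApp` are hypothetical stand-ins for the application under test, and `send` is assumed to return the context that would be passed to the model.

```python
import uuid
from collections import defaultdict


class ChatApp:
    """Hypothetical app that keys history by (tenant, session)."""

    def __init__(self):
        self._history = defaultdict(list)

    def send(self, tenant_id, session_id, message):
        key = (tenant_id, session_id)
        self._history[key].append(message)
        # Return the context that would be handed to the model.
        return list(self._history[key])


class LeakyChatApp:
    """Hypothetical app with a single global conversation buffer."""

    def __init__(self):
        self._buffer = []

    def send(self, tenant_id, session_id, message):
        self._buffer.append(message)
        return list(self._buffer)


def leaks_across_sessions(app) -> bool:
    # Plant a unique marker in one session, then look for it in another.
    canary = f"CANARY-{uuid.uuid4().hex}"
    app.send("tenant-a", "session-1", f"remember this: {canary}")
    context = app.send("tenant-b", "session-2", "what do you remember?")
    return any(canary in item for item in context)


isolated_result = leaks_across_sessions(ChatApp())
leaky_result = leaks_across_sessions(LeakyChatApp())
```

The scoped app should report no leak while the global-buffer app should report one. A production test would run the same check against the real retrieval, cache, and logging paths, since each can leak independently.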

These controls tend to break down in multi-agent systems with shared orchestration layers because one agent can reintroduce another user’s context through tool output or cached intermediate state.

Common Variations and Edge Cases

Tighter isolation often increases latency, storage cost, and operational overhead, so organisations have to balance safety against performance and usability. That tradeoff is especially visible when product teams want “helpful memory” across sessions, while security teams want strict separation. There is no universal standard for this yet, but best practice is evolving toward explicit scoping and user-visible controls for what can persist.

Some deployments need hybrid approaches. For example, a support assistant may retain product knowledge globally while keeping customer-specific context private to each tenant. Similarly, batch summarisation pipelines may process many records at once, but the output store still needs per-record access checks. The DeepSeek breach illustrates how quickly a shared AI surface can expose more than intended when context is not cleanly partitioned. The practical lesson from the NIST AI 600-1 GenAI Profile is to validate isolation continuously rather than assume that architecture alone is enough.
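
The support-assistant case can be sketched as a retrieval filter that admits globally shared documents plus the caller's own tenant documents, and nothing else. The `Doc` type, the `retrieve` function, and the convention that `tenant_id=None` marks shared content are all assumptions made for illustration; substring matching stands in for a real similarity search.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Doc:
    text: str
    tenant_id: Optional[str]  # None marks globally shared product knowledge


def retrieve(docs: List[Doc], tenant_id: str, query: str) -> List[str]:
    # Scope first: shared docs plus this tenant's docs only.
    visible = [d for d in docs if d.tenant_id is None or d.tenant_id == tenant_id]
    # Then match; a real system would rank by embedding similarity here.
    return [d.text for d in visible if query.lower() in d.text.lower()]


corpus = [
    Doc("Reset the router by holding the button for 10 seconds", None),
    Doc("Acme router serial: A-1001", "acme"),
    Doc("Globex router serial: G-2002", "globex"),
]
acme_hits = retrieve(corpus, "acme", "router")
```

In this sketch `acme_hits` includes the shared reset instructions and the Acme serial, but never the Globex record. Applying the scope before matching means a highly relevant document from another tenant cannot outrank its way into the result.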

Teams also get tripped up by “anonymous” use cases, where there is no obvious logged-in user but there is still a session, device, or workflow identity that should define the boundary. In those environments, context isolation must be enforced at the application and storage layers, not left to the model.
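
For anonymous use, one possible pattern is to issue an opaque server-side session identifier and derive the isolation key from it, refusing to store context when no identity of any kind is available. The function names and key format below are hypothetical; only `secrets.token_urlsafe` is a standard-library call.

```python
import secrets
from typing import Dict, List, Optional


def new_anonymous_session() -> str:
    # Opaque, unguessable identifier issued by the server.
    return secrets.token_urlsafe(32)


def boundary_key(user_id: Optional[str] = None,
                 session_id: Optional[str] = None) -> str:
    # Prefer a logged-in identity; fall back to the anonymous session.
    if user_id:
        return f"user:{user_id}"
    if session_id:
        return f"anon:{session_id}"
    # Fail closed: never fall back to a shared, unscoped bucket.
    raise ValueError("no identity available to scope context")


class ContextStore:
    """Hypothetical store keyed strictly by boundary key."""

    def __init__(self) -> None:
        self._data: Dict[str, List[str]] = {}

    def append(self, key: str, message: str) -> None:
        self._data.setdefault(key, []).append(message)

    def get(self, key: str) -> List[str]:
        return list(self._data.get(key, []))


store = ContextStore()
visitor_a = boundary_key(session_id=new_anonymous_session())
visitor_b = boundary_key(session_id=new_anonymous_session())
store.append(visitor_a, "my order number is 555")
```

Two anonymous visitors get distinct keys, so `store.get(visitor_b)` stays empty even though neither visitor has logged in. The enforcement lives in the application and storage code, which matches the point above that isolation should not be delegated to the model.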

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | A2 | Cross-session leakage is a prompt/context isolation failure in agentic apps.
CSA MAESTRO | TA-02 | MAESTRO addresses trust boundaries for agent memory, retrieval, and orchestration.
NIST AI RMF | | AI RMF governs lifecycle risk from shared context, leakage, and unintended influence.

Separate agent context per task and verify no shared memory or tool state crosses boundaries.