Vector Store Poisoning

The tampering of retrieval content or embeddings so a RAG system retrieves attacker-influenced context during normal operation. It turns the retrieval layer into a trust boundary that must be protected like any other sensitive data store.

Expanded Definition

Vector store poisoning is a retrieval-layer attack against retrieval-augmented generation systems, where an adversary alters stored chunks, metadata, or embeddings so the model retrieves attacker-influenced context during normal queries. No single standard governs this yet, and usage in the industry is still evolving, but the security pattern is clear: the vector database becomes a trust boundary, not just an index.

This differs from prompt injection because the malicious content is planted earlier in the pipeline, in the store itself, and can persist across sessions, agents, and users. It also differs from ordinary data quality issues, since the attacker is not merely degrading relevance but steering model behavior, answer framing, or tool use. In a mature control model, the retrieval store should be treated with the same discipline used for secrets and identity data, including provenance, access control, and integrity checks. The NIST Cybersecurity Framework 2.0 is useful here because it frames data integrity, access control, and monitoring as continuous functions rather than one-time setup tasks. The most common mistake is to assume retrieval content is safe because it was ingested from an internal source. That assumption fails whenever write access to the vector store is broader than the trust placed in the downstream agent.
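As an illustration of the integrity and provenance checks described above, the following is a minimal sketch, not a reference implementation. It assumes the ingestion pipeline holds a signing key and attaches an HMAC over each chunk's text and source, and that the retrieval side verifies that tag before any chunk reaches the model. The names `SIGNING_KEY`, `sign_chunk`, `verify_chunk`, and `filter_verified` are hypothetical, and the sketch covers stored text and provenance only, not the embedding vectors themselves.

```python
import hashlib
import hmac
import json

# Hypothetical signing key (assumption): in practice it would come from a
# secrets manager and be held only by the ingestion pipeline and the verifier.
SIGNING_KEY = b"example-ingestion-key"


def _payload(text, source) -> bytes:
    """Serialise the fields covered by the MAC in a stable order."""
    return json.dumps({"text": text, "source": source}, sort_keys=True).encode("utf-8")


def sign_chunk(text: str, source: str, key: bytes = SIGNING_KEY) -> dict:
    """Build a chunk record whose text and provenance are covered by an HMAC."""
    tag = hmac.new(key, _payload(text, source), hashlib.sha256).hexdigest()
    return {"text": text, "source": source, "mac": tag}


def verify_chunk(record: dict, key: bytes = SIGNING_KEY) -> bool:
    """Return True only if the stored text and source still match the MAC."""
    expected = hmac.new(
        key, _payload(record.get("text"), record.get("source")), hashlib.sha256
    ).hexdigest()
    supplied = str(record.get("mac", ""))
    # Constant-time comparison on bytes avoids timing leaks and type errors.
    return hmac.compare_digest(expected.encode("utf-8"), supplied.encode("utf-8"))


def filter_verified(records: list[dict]) -> list[dict]:
    """Drop retrieved chunks that fail verification before they reach the model."""
    return [r for r in records if verify_chunk(r)]
```

Under these assumptions, a chunk edited directly in the store by an identity that does not hold the key fails verification and is dropped at retrieval time. A compromised ingestion pipeline would still produce validly signed chunks, so this check complements source review rather than replacing it.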

Examples and Use Cases

Rigorous defences against vector store poisoning add ingestion checks and operational friction, so organisations must weigh faster content updates against stronger provenance and review. Typical scenarios include:

  • A support knowledge base is indexed for a customer service agent, but a compromised publishing pipeline injects misleading remediation steps that the agent later repeats.
  • An internal engineering assistant retrieves from a shared vector store, and an over-privileged contributor edits chunks so the agent recommends unsafe code paths.
  • A research RAG system ingests external documents, and poisoned embeddings cause the retriever to prefer attacker-authored material that looks semantically similar to trusted sources.
  • An autonomous agent connected to tools uses retrieved policy text to decide whether to approve a workflow, making poisoned context a direct path to unauthorized action.

These scenarios are especially important when the retrieval layer serves multiple applications or identities, because a single contaminated source can affect many downstream decisions. For broader context on identity governance and why retrieval infrastructure should be treated as sensitive, the Ultimate Guide to NHIs shows how non-human identity sprawl expands attack paths across systems. Design teams often pair that governance view with the NIST Cybersecurity Framework 2.0 to anchor ingestion controls, monitoring, and recovery.
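One way to limit how far a contaminated source spreads across consumers is to scope each agent to the sources it is permitted to rely on. The sketch below is illustrative only: the `RetrievedChunk` type, the `ALLOWED_SOURCES` policy, and the source prefixes are assumptions made for this example, not part of any particular vector database API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedChunk:
    text: str
    source: str   # where the chunk was ingested from
    score: float  # similarity score reported by the retriever


# Hypothetical policy (assumption): source prefixes each consuming agent may trust.
ALLOWED_SOURCES = {
    "support-agent": ("kb/", "docs/"),
    "approval-agent": ("policy/",),
}


def select_context(
    agent: str, chunks: list[RetrievedChunk], top_k: int = 3
) -> list[RetrievedChunk]:
    """Keep only chunks from sources the agent may trust, then rank by score."""
    prefixes = ALLOWED_SOURCES.get(agent, ())  # unknown agents get no context
    trusted = [c for c in chunks if c.source.startswith(prefixes)]
    return sorted(trusted, key=lambda c: c.score, reverse=True)[:top_k]
```

With a policy like this, a high-scoring chunk from an unreviewed upload never reaches an agent that approves workflows, even if it outranks the legitimate policy text on similarity. The filter relies on the `source` field being trustworthy, so it is only as strong as the provenance controls behind it.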

Why It Matters in NHI Security

Vector store poisoning matters because retrieval systems are increasingly used by agents and automation that act with execution authority, not just by passive chat interfaces. If poisoned context influences a service account, AI agent, or workflow bot, the impact can spread into permissions, approvals, secrets handling, and downstream business logic. In practice, this is an identity problem as much as a content problem, because whoever can write to the retrieval layer can indirectly shape machine action.

The risk is amplified by weak visibility and excess privilege in non-human estates. NHI Mgmt Group notes that only 5.7% of organisations have full visibility into their service accounts, as discussed in the Ultimate Guide to NHIs. That gap makes it difficult to see which identities can modify retrieval sources and which agents depend on them. Controls should therefore enforce least privilege, separate write and read paths, and monitor integrity continuously, consistent with the NIST Cybersecurity Framework 2.0. Organisations typically discover the issue only after an agent surfaces corrupted advice or executes a bad action, at which point addressing vector store poisoning can no longer be deferred.
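The least-privilege point can be made concrete with a small audit: list every identity that can write to a collection, list every agent that reads from it, and flag pairs where the writer is trusted less than the reader's authority requires. The following is a sketch under stated assumptions. The three-level `TRUST_TIER` scale and the function `find_risky_writers` are hypothetical, and the inputs would in practice come from the access-control configuration of the store and an inventory of agents.

```python
# Hypothetical trust tiers (assumption): higher numbers mean more authority.
TRUST_TIER = {"low": 0, "medium": 1, "high": 2}


def find_risky_writers(
    writers: dict[str, str], readers: dict[str, str]
) -> list[tuple[str, str]]:
    """Return (writer, reader) pairs where the writer's tier is below the reader's.

    writers maps each identity with write access to its trust tier.
    readers maps each agent consuming the collection to the tier of the
    actions it is able to take.
    """
    findings = []
    for writer, writer_tier in sorted(writers.items()):
        for reader, reader_tier in sorted(readers.items()):
            if TRUST_TIER[writer_tier] < TRUST_TIER[reader_tier]:
                findings.append((writer, reader))
    return findings
```

Each finding is a path by which a lower-trust identity can indirectly shape a higher-authority agent's actions. Remediation is either to narrow write access or to move the high-authority agent onto a separately governed collection.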

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 and the OWASP Agentic AI Top 10 address the attack and risk surface, while NIST CSF 2.0 sets the governance and control requirements practitioners need to meet.

| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Non-Human Identity Top 10 | NHI-02 | Covers secret and data handling risks that enable poisoned retrieval stores. |
| OWASP Agentic AI Top 10 | AGENT-04 | Agentic systems can be steered by poisoned retrieved context before tool use. |
| NIST CSF 2.0 | PR.AA-05 (PR.AC-4 in CSF 1.1) | Least-privilege access is needed to protect retrieval data from tampering. |

Restrict write access, verify provenance, and monitor vector store integrity continuously.
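For the monitoring part of that summary, one possible approach is to keep a fingerprint of the store's contents outside the store and compare it with the current state on a schedule. This is a minimal sketch that assumes chunk text can be exported as a mapping from chunk ID to text. The function names `fingerprint` and `detect_drift` are hypothetical.

```python
import hashlib


def fingerprint(chunks: dict[str, str]) -> dict[str, str]:
    """Map each chunk ID to a SHA-256 digest of its text."""
    return {
        cid: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for cid, text in chunks.items()
    }


def detect_drift(
    baseline: dict[str, str], current: dict[str, str]
) -> dict[str, list[str]]:
    """Compare two fingerprints and report added, removed, and modified chunk IDs."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(cid for cid in shared if baseline[cid] != current[cid]),
    }
```

Any reported change that does not correspond to an approved ingestion run is a candidate for investigation. The baseline should be stored where identities with write access to the vector store cannot alter it.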