TL;DR: Building a production-style RAG pipeline with multi-tenant permissions depends on matching retrieval to relationship-based access, not just adding embeddings and vector search, according to Authzed. The identity lesson is that authorization must travel with the data path or the LLM will surface context the user should never see.
NHIMG editorial — based on content published by Authzed: Learn how to build a complete retrieval-augmented generation pipeline with multi-tenant authorization
By the numbers:
- When AWS credentials are exposed publicly, attackers attempt access within an average of 17 minutes and as quickly as 9 minutes in some cases.
Questions worth separating out
Q: How should security teams enforce authorization in RAG pipelines?
A: Security teams should enforce authorization before retrieval results reach the model.
Q: Why do vector databases create governance risk in multi-tenant AI systems?
A: Vector databases create governance risk because semantic similarity is not the same as permission.
Q: What breaks when metadata is stripped from RAG chunks?
A: When metadata is stripped, the system loses the link between a chunk and the tenant or object that owns it.
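To make the metadata point concrete, here is a minimal sketch of a chunker that copies ownership identifiers onto every chunk it emits. The `Chunk` type, the `chunk_document` helper, the field names, and the fixed-size split are all assumptions made for illustration; they are not taken from Authzed's tutorial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    tenant_id: str   # hypothetical field: the organization that owns the source
    object_id: str   # hypothetical field: the object the text came from
    source: str      # hypothetical field: original file or URL


def chunk_document(text, tenant_id, object_id, source, size=40):
    """Split text into fixed-size pieces, copying ownership metadata onto each."""
    return [
        Chunk(text[i:i + size], tenant_id, object_id, source)
        for i in range(0, len(text), size)
    ]


# Every chunk still knows who owns it, so a later permission check has
# something to evaluate against.
chunks = chunk_document(
    "x" * 100, tenant_id="org_a", object_id="farm_1", source="report.pdf"
)
```

If the chunker returned bare strings instead, nothing downstream could tie a retrieved passage back to a tenant, and no permission check could be evaluated against it.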
Practitioner guidance
- Enforce authorization at retrieval time: Apply relationship-based checks before candidate chunks are passed to the LLM, so semantic relevance never overrides tenant boundaries.
- Preserve source metadata through the full pipeline: Carry tenant, object, and source identifiers from ingestion into chunking, vector storage, and query response assembly.
- Treat background jobs as governed identity paths: Review embedding, enrichment, and logging workers as separate execution paths with their own permissions and service identities.
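The first item above can be sketched as a filter that sits between vector search and prompt assembly. This is a simplified, in-memory illustration: the `RELATIONSHIPS` set, `can_view`, `authorized_context`, and the sample chunks are hypothetical stand-ins, and a real deployment would ask a permissions service such as SpiceDB rather than check a local set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    object_id: str   # the object (here, a farm) the chunk belongs to
    score: float     # similarity score returned by vector search


# Hypothetical relationship tuples: (subject, relation, object).
RELATIONSHIPS = {
    ("user:alice", "viewer", "farm:north"),
    ("user:bob", "viewer", "farm:south"),
}


def can_view(user, object_id):
    """Return True only if an explicit viewer relationship exists."""
    return (user, "viewer", object_id) in RELATIONSHIPS


def authorized_context(user, candidates, top_k=2):
    """Drop unauthorized chunks first, then rank what remains by similarity."""
    allowed = [c for c in candidates if can_view(user, c.object_id)]
    allowed.sort(key=lambda c: c.score, reverse=True)
    return allowed[:top_k]


# Illustrative data: the best semantic match belongs to a farm alice cannot view.
candidates = [
    Candidate("South farm yield summary", "farm:south", 0.93),
    Candidate("North farm irrigation schedule", "farm:north", 0.81),
    Candidate("North farm soil report", "farm:north", 0.64),
]
context = authorized_context("user:alice", candidates)
```

The highest-scoring chunk is excluded for `user:alice` because relevance is only considered after the permission check, and a user with no relationships gets an empty context rather than a fallback.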
What's in the full article
Authzed's full tutorial covers the operational detail this post intentionally leaves for the source:
- Step-by-step Motia workflow code for ingestion, embedding, query, and logging steps.
- Complete SpiceDB schema and setup scripts for organizations, farms, and user permissions.
- Full Pinecone configuration details, including index creation and embedding dimensions.
- Workbench testing sequence showing how authorization behaves across permitted and forbidden queries.
👉 Read Authzed's tutorial on building a fine-grained authorized RAG pipeline →
RAG pipelines and fine-grained authorization: what IAM teams need
Explore further
Fine-grained authorization is now a retrieval requirement, not an application feature. RAG changes the security boundary because the model only sees what retrieval hands it. If authorization is evaluated after similarity search, the system has already crossed the boundary it was meant to protect. For practitioners, the control question is no longer whether the app authenticates the user, but whether every chunk is filtered by tenant and relationship before generation begins.
A few things that frame the scale:
- The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
- Only 44% of developers are reported to follow security best practices for secrets management, exposing a significant developer behaviour gap.
A question worth separating out:
Q: How do access controls differ between the API layer and the retrieval layer?
A: API-layer controls decide who may call the service, but retrieval-layer controls decide what data can be surfaced inside a response. In RAG, both matter. A user can be fully authenticated and still be unauthorized to see specific documents, so retrieval-time filtering must be enforced separately from login or request validation.
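The two layers in the answer above can be sketched as two separate checks in one request path. Everything here is hypothetical and in-memory: `VALID_TOKENS`, `DOC_VIEWERS`, `authenticate`, `retrieve`, and `handle_query` are illustrative names, not part of any real framework.

```python
class Unauthenticated(Exception):
    """Raised when the caller cannot be identified at the API layer."""


# Hypothetical stores standing in for an identity provider and a permissions service.
VALID_TOKENS = {"token-alice": "user:alice"}
DOC_VIEWERS = {"doc:1": {"user:alice"}, "doc:2": {"user:bob"}}


def authenticate(token):
    """API layer: decide who may call the service at all."""
    try:
        return VALID_TOKENS[token]
    except KeyError:
        raise Unauthenticated("unknown token")


def retrieve(user, doc_ids):
    """Retrieval layer: decide which matched documents this user may see."""
    return [d for d in doc_ids if user in DOC_VIEWERS.get(d, set())]


def handle_query(token, matched_doc_ids):
    user = authenticate(token)              # passing this check is not enough
    return retrieve(user, matched_doc_ids)  # each document is checked again
```

In this sketch `handle_query("token-alice", ["doc:1", "doc:2"])` authenticates successfully but still returns only `doc:1`, which is the gap between login and retrieval-time filtering that the answer describes.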
👉 Read our full editorial: Fine-grained authorization for RAG pipelines needs identity controls