By NHI Mgmt Group Editorial Team
Published 2025-12-01
Domain: Best Practices
Source: Authzed

TL;DR: Building a production-style RAG pipeline with multi-tenant permissions depends on matching retrieval to relationship-based access, not just adding embeddings and vector search, according to Authzed. The identity lesson is that authorization must travel with the data path or the LLM will surface context the user should never see.


At a glance

What this is: A tutorial on building a retrieval-augmented generation pipeline with multi-tenant authorization, showing how SpiceDB-based permissions keep natural-language queries limited to data a user can access.

Why it matters: IAM, NHI, and AI platform teams need authorization controls that survive semantic search, background processing, and multi-tenant retrieval without leaking data across access boundaries.

By the numbers:

  • When AWS credentials are exposed publicly, attackers attempt access within an average of 17 minutes and as quickly as 9 minutes in some cases.

👉 Read Authzed's tutorial on building a fine-grained authorized RAG pipeline


Context

RAG systems that answer natural-language questions are only as safe as the authorization model behind retrieval. When vector search is combined with multi-tenant data, the core risk is not model accuracy alone. It is whether the retrieval layer can reliably respect who is allowed to see which records before the LLM ever assembles a response.

For IAM teams, this is an authorization problem that spans application identity, workload identity, and data access governance. A system that chunks content, stores embeddings, and logs responses still needs a control plane that enforces relationships at query time, or semantic search becomes a path for overexposure rather than discovery.


Key questions

Q: How should security teams enforce authorization in RAG pipelines?

A: Security teams should enforce authorization before retrieval results reach the model. That means checking tenant or relationship permissions on each candidate chunk, not only on the initial API request. If the LLM receives unauthorized context, the exposure has already happened. The retrieval layer must be treated as part of the access-control boundary.

Q: Why do vector databases create governance risk in multi-tenant AI systems?

A: Vector databases create governance risk because semantic similarity is not the same as permission. A chunk can be highly relevant and still be off limits. If access control is not tied to the chunk’s tenant, source, or relationship metadata, the search layer can return content across boundaries and the model will happily generate from it.

Q: What breaks when metadata is stripped from RAG chunks?

A: When metadata is stripped, the system loses the link between a chunk and the tenant or object that owns it. That makes it impossible to apply fine-grained authorization consistently during retrieval and response generation. The result is either overblocking, which hurts usability, or overexposure, which creates a data leak.

Q: How do access controls differ between the API layer and the retrieval layer?

A: API-layer controls decide who may call the service, but retrieval-layer controls decide what data can be surfaced inside a response. In RAG, both matter. A user can be fully authenticated and still be unauthorized to see specific documents, so retrieval-time filtering must be enforced separately from login or request validation.


Technical breakdown

How relationship-based access control shapes retrieval

Relationship-based access control, or ReBAC, evaluates access through object relationships rather than just static roles. In this pattern, a user’s permissions depend on how they relate to an organization, farm, or dataset, and those permissions can inherit through object graph links. That matters for RAG because retrieval must filter candidate chunks before generation begins. If the vector database returns semantically relevant passages without checking those relationships, the model can assemble an answer from data the requester is not entitled to see.

Practical implication: enforce access checks before retrieval results are passed into the generation step.
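As a rough illustration of the pattern, the sketch below filters candidate chunks through a relationship check before any context is assembled. It is a minimal in-memory stand-in, not SpiceDB's API: the relationship tuples, the `can_view` rule (direct owner, editor, or viewer, or membership of the farm's organization), and the `Chunk` type are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical relationship tuples: (resource, relation, subject).
# A real deployment would query an authorization service instead.
RELATIONSHIPS = {
    ("organization:acme", "member", "user:alice"),
    ("farm:north", "organization", "organization:acme"),
    ("farm:south", "viewer", "user:bob"),
}


def can_view(user: str, farm: str) -> bool:
    """Return True if the user may view the farm.

    Assumed rule: direct owner/editor/viewer, or member of the
    organization the farm belongs to (inherited permission).
    """
    for relation in ("owner", "editor", "viewer"):
        if (farm, relation, user) in RELATIONSHIPS:
            return True
    # Walk one hop up the object graph: farm -> organization -> member.
    for resource, relation, subject in RELATIONSHIPS:
        if resource == farm and relation == "organization":
            if (subject, "member", user) in RELATIONSHIPS:
                return True
    return False


@dataclass(frozen=True)
class Chunk:
    text: str
    farm: str      # provenance: the farm this chunk was derived from
    score: float   # similarity score returned by the vector search


def authorized_context(user: str, candidates: list[Chunk]) -> list[Chunk]:
    """Drop chunks the user cannot view, then rank what remains by score."""
    allowed = [c for c in candidates if can_view(user, c.farm)]
    return sorted(allowed, key=lambda c: c.score, reverse=True)


candidates = [
    Chunk("North farm notes", "farm:north", 0.71),
    Chunk("South farm notes", "farm:south", 0.93),
]
context_for_alice = authorized_context("user:alice", candidates)
```

In this sketch the higher-scoring South farm chunk never reaches the prompt for `user:alice`, because relevance is only considered after the permission check has passed.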

Why metadata-driven chunking matters in multi-tenant search

Chunking improves retrieval precision by breaking larger documents into smaller semantic units, but it also creates a governance problem: every chunk inherits sensitivity from the source it came from. Metadata is what lets the pipeline keep provenance intact, so the system can tie an embedding back to a farm, tenant, or permission domain. Without that linkage, semantic similarity alone becomes dangerous because the search layer cannot distinguish between a relevant chunk and an authorized one.

Practical implication: preserve tenant and source metadata from ingestion through query execution.
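One way to picture metadata-preserving chunking is sketched below. The word-count splitter, the field names (`tenant_id`, `source_id`, `position`), and the record layout are illustrative assumptions, not the tutorial's actual ingestion code or any particular vector database's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRecord:
    text: str
    tenant_id: str   # which tenant owns the source document
    source_id: str   # which document (for example, a farm record) it came from
    position: int    # order of the chunk within the source


def chunk_document(text: str, tenant_id: str, source_id: str,
                   max_words: int = 50) -> list[ChunkRecord]:
    """Split text into fixed-size word windows, copying provenance onto each."""
    words = text.split()
    records = []
    for position, start in enumerate(range(0, len(words), max_words)):
        piece = " ".join(words[start:start + max_words])
        records.append(ChunkRecord(piece, tenant_id, source_id, position))
    return records


def to_index_record(chunk: ChunkRecord, embedding: list[float]) -> dict:
    """Build the payload stored next to the vector so provenance survives indexing."""
    return {
        "vector": embedding,
        "text": chunk.text,
        "metadata": {
            "tenant_id": chunk.tenant_id,
            "source_id": chunk.source_id,
            "position": chunk.position,
        },
    }


document = " ".join(f"word{i}" for i in range(120))
chunks = chunk_document(document, tenant_id="tenant-a", source_id="farm:north")
record = to_index_record(chunks[0], embedding=[0.0, 1.0])
```

The point of the design is that the splitter never emits a chunk without its tenant and source identifiers, so later stages have something to authorize against.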

Why event-driven orchestration changes the security model

An event-driven workflow decouples ingestion, embedding generation, retrieval, and logging. That improves latency and resilience, but it also means authorization cannot be treated as a one-time request gate. Background steps still process sensitive data, and each step becomes part of the trust boundary. In a RAG pipeline, the query may complete quickly while downstream tasks continue indexing, logging, or enriching records. If those steps are not individually constrained, a well-designed front door can still hide weak internal controls.

Practical implication: treat each asynchronous workflow step as a control point that needs explicit authorization.
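A minimal sketch of per-step scoping follows, assuming each background worker runs under its own service identity with a declared set of scopes. The identity names, scope strings, and helper functions are hypothetical and only meant to show the shape of the check.

```python
# Hypothetical service identities and the scopes each step is allowed to use.
STEP_SCOPES = {
    "embedding-worker": {"chunks:read", "vectors:write"},
    "query-agent": {"vectors:read", "permissions:check"},
    "log-writer": {"logs:write"},
}


class ScopeError(PermissionError):
    """Raised when a workflow step acts outside its declared purpose."""


def require_scope(identity: str, scope: str) -> None:
    """Fail closed: unknown identities and missing scopes are both rejected."""
    if scope not in STEP_SCOPES.get(identity, set()):
        raise ScopeError(f"{identity} lacks scope {scope}")


def write_vector(identity: str, store: dict, chunk_id: str,
                 vector: list[float]) -> None:
    """Store an embedding, but only for identities allowed to write vectors."""
    require_scope(identity, "vectors:write")
    store[chunk_id] = vector


vector_store: dict = {}
write_vector("embedding-worker", vector_store, "chunk-1", [0.1, 0.2])
```

Calling `write_vector` as `log-writer` raises `ScopeError` in this sketch, which is the behavior the paragraph above argues for: each asynchronous step is checked on its own rather than trusted because the original request was authorized.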


Threat narrative

Attacker objective: The objective is to extract data from a multi-tenant RAG system by bypassing retrieval-time authorization and using the model to surface content outside the requester’s permissions.

  1. Entry occurs when a user submits a natural-language query into the RAG workflow and the system begins retrieving semantically related content from indexed farm records.
  2. Escalation occurs if the retrieval layer returns chunks based on similarity alone rather than relationship-based authorization, allowing the model to see data outside the user’s tenant or farm boundary.
  3. Impact is unauthorized disclosure through generated answers, where the LLM can expose harvest data, metadata, or cross-tenant context that should have remained inaccessible.
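The escalation step can be reproduced with a toy index. The sketch below uses made-up two-dimensional vectors, tenant names, and harvest figures purely for illustration; it shows a similarity-only search returning another tenant's record as the top hit, and the same search behaving correctly once candidates are restricted to the requester's tenants.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy index: the vectors and figures are invented for this example.
INDEX = [
    {"text": "North farm harvest: 40 tons", "tenant": "tenant-a", "vector": [0.9, 0.1]},
    {"text": "South farm harvest: 55 tons", "tenant": "tenant-b", "vector": [1.0, 0.0]},
    {"text": "North farm irrigation schedule", "tenant": "tenant-a", "vector": [0.1, 0.9]},
]


def top_k(query_vector: list[float], k: int, allowed_tenants=None) -> list[dict]:
    """Rank records by similarity; restrict to allowed_tenants when given."""
    if allowed_tenants is None:
        candidates = INDEX  # similarity only: no authorization applied
    else:
        candidates = [r for r in INDEX if r["tenant"] in allowed_tenants]
    ranked = sorted(candidates,
                    key=lambda r: cosine(query_vector, r["vector"]),
                    reverse=True)
    return ranked[:k]


harvest_query = [1.0, 0.0]  # stands in for an embedded "what was the harvest?" question
leaky = top_k(harvest_query, k=1)
scoped = top_k(harvest_query, k=1, allowed_tenants={"tenant-a"})
```

For a tenant-a user, `leaky` surfaces tenant-b's harvest record because it is the closest match, while `scoped` returns the North farm record. Nothing about the ranking itself signals that the first result was out of bounds.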



NHI Mgmt Group analysis

Fine-grained authorization is now a retrieval requirement, not an application feature. RAG changes the security boundary because the model only sees what retrieval hands it. If authorization is evaluated after similarity search, the system has already crossed the boundary it was meant to protect. For practitioners, the control question is no longer whether the app authenticates the user, but whether every chunk is filtered by tenant and relationship before generation begins.

Metadata is the governance layer that makes semantic search safe. Vector embeddings are content-indexing primitives, not access decisions. The article’s farm and organization model shows the right pattern: preserve provenance so each record remains tied to the object graph that defines who may view it. That is the difference between searchable data and governable data. Practitioners should treat source metadata as part of the authorization model, not as optional enrichment.

RAG pipelines expose a control gap between fast answers and slow approvals. Event-driven orchestration improves user experience, but it also creates asynchronous processing paths where policy can be forgotten. If embedding jobs, query agents, and logging steps are not individually scoped, data can move through the pipeline faster than governance can inspect it. The implication is that security architecture must extend beyond the request handler and into every background step.

ReBAC is the right fit when data access follows organizational relationships. The farm, owner, editor, viewer, and organization model is a textbook case for relationship-based authorization because access is derived from how identities relate to resources. That makes the governance problem more precise than coarse role assignment. Practitioners should map their own RAG workloads to object relationships before deciding whether RBAC alone is enough.

Identity blast radius is the real metric for production RAG design. Once embeddings, queries, and logs are spread across services, a single over-permissioned path can expose more data than the application layer intends. This is why IAM, NHI governance, and data access controls have to be designed together. The practitioner conclusion is simple: if a retrieval path cannot prove its blast radius, it is not ready for production.

From our research:

  • The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
  • Only 44% of developers are reported to follow security best practices for secrets management, exposing a significant developer behaviour gap.
  • That gap becomes more dangerous in RAG and AI workflows, so review the Guide to the Secret Sprawl Challenge for the operational patterns behind exposure and remediation.

What this signals

Fine-grained retrieval controls are becoming a baseline requirement for AI data products. As natural-language interfaces move closer to production workflows, teams should assume that semantic search will cross more sensitive boundaries unless retrieval is explicitly governed. The practical signal for identity teams is that authorization must be designed for data discovery, not just for application login.

Secret and access metadata should be treated as part of the data plane. When pipelines split, embed, cache, and regenerate content, the governance model has to travel with the record. That is where the control design meets identity reality, and it is why the NIST Cybersecurity Framework 2.0 and the OWASP Non-Human Identity Top 10 both matter for AI-enabled retrieval systems.

Identity blast radius becomes the metric to watch as RAG scales. With 6 distinct secrets manager instances reported in our research on secrets management fragmentation, the same fragmentation pressure can appear in retrieval pipelines when permissions, metadata, and service identities are managed separately. Teams should prepare for a control model that unifies access, provenance, and background execution before the next AI use case reaches production.


For practitioners

  • Enforce authorization at retrieval time: Apply relationship-based checks before candidate chunks are passed to the LLM, so semantic relevance never overrides tenant boundaries. Retrieval should only return data already validated against the requester’s organization, farm, or equivalent access graph.
  • Preserve source metadata through the full pipeline: Carry tenant, object, and source identifiers from ingestion into chunking, vector storage, and query response assembly. Without persistent metadata, the system cannot prove which records belong to which access domain.
  • Treat background jobs as governed identity paths: Review embedding, enrichment, and logging workers as separate execution paths with their own permissions and service identities. Asynchronous processing must be scoped so one step cannot read or write beyond its declared purpose.
  • Test cross-tenant leakage with negative queries: Run queries from users who should not see specific farms, records, or documents, then verify the retrieval layer returns nothing rather than adjacent context. Use these tests to confirm that authorization fails closed under semantic search.
  • Log access decisions separately from content logs: Record which identity, relationship, and resource combination granted each query, then keep those records distinct from the user-facing audit trail. That lets teams investigate retrieval decisions without exposing sensitive content in the log itself.

Key takeaways

  • RAG systems fail when semantic retrieval is separated from authorization, because relevance alone cannot determine who is allowed to see a chunk of data.
  • The article shows that metadata, tenant relationships, and background workflow permissions are all part of the access model, not just the application layer.
  • Teams building production AI search should validate retrieval-time filtering, provenance retention, and negative-access tests before scaling to multi-tenant use.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Non-Human Identity Top 10 | NHI-03 | Covers secret and access control gaps that can expose RAG data paths.
NIST CSF 2.0 | PR.AC-4 | Access permissions must be enforced consistently across multi-tenant retrieval and processing.
NIST Zero Trust (SP 800-207) | AC-3 | Zero Trust access decisions fit the need to verify every retrieval request independently.

Bind retrieval-time permissions to chunk provenance and enforce least privilege across AI workflows.


Key terms

  • Retrieval-Augmented Generation: A pattern that combines search with model generation so answers can be grounded in external data. The retrieval step supplies context to the model, which improves relevance but also creates a new security boundary that must be governed like any other data access path.
  • Relationship-Based Access Control: An authorization model that grants access based on how identities relate to resources, not only on roles. It is especially useful in multi-tenant systems where ownership, membership, and inheritance determine what data a user may see or modify.
  • Chunk Metadata: The identifying information attached to a piece of indexed content, such as tenant, source, owner, or object type. In RAG systems, metadata preserves provenance so retrieval can enforce access rules instead of relying on semantic similarity alone.
  • Identity Blast Radius: The amount of data and functionality that becomes exposed when one identity, permission path, or service account is over-privileged. In AI pipelines, it describes how far a single control failure can spread across retrieval, generation, logging, and downstream processing.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity lifecycle are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or NHI governance in your organisation, it is worth exploring.

This post draws on content published by Authzed: Learn how to build a complete retrieval-augmented generation pipeline with multi-tenant authorization. Read the original.
