TL;DR: Fine-grained authorization for RAG breaks down when every document, chunk, and embedding is synced into the auth layer, because high-cardinality resources create drift, latency, and bottlenecks in production, according to WorkOS. The scalable pattern is hierarchical authorization with local vector filtering, which keeps policy stable while the retrieval layer handles scale.
NHIMG editorial — based on content published by WorkOS: Authorization for RAG at Scale: Why You Shouldn't Sync Every Document
Questions worth separating out
Q: How should security teams implement authorization for RAG applications at scale?
A: Use hierarchical authorization for stable parent resources such as workspaces or collections, then apply local filtering in the vector store with parent metadata.
Q: What breaks when every document is synced to an external authorization system?
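A: The auth layer becomes a bottleneck. Every document, chunk, and embedding turns into a record that has to be kept in sync, so the policy store drifts from the source data and each retrieval pays extra latency.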
Q: When should organisations use document-level permissions in RAG?
A: Only when a document truly needs explicit control, such as shared files, special grants, or compliance holds.
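One way to picture the last answer is a small overlay of explicit document rules on top of parent-level access. The sketch below is illustrative only: the table names, IDs, and the rule that a compliance hold blocks everyone are assumptions made for this example, not details from the WorkOS article.

```python
from __future__ import annotations

# Hypothetical in-memory stand-ins for an authorization store.
PARENT_GRANTS: dict[str, set[str]] = {
    "alice": {"col-eng"},
    "bob": {"col-eng"},
}
# Explicit per-document grants (e.g. a shared file). Kept deliberately small.
DOC_GRANTS: dict[str, set[str]] = {"bob": {"doc-42"}}
# Documents under a compliance hold. Assumption: a hold blocks all readers.
DOC_HOLDS: set[str] = {"doc-7"}


def can_read(user_id: str, doc_id: str, parent_id: str) -> bool:
    """Check explicit document rules first, then fall back to the parent."""
    if doc_id in DOC_HOLDS:
        return False  # a hold overrides every grant
    if doc_id in DOC_GRANTS.get(user_id, set()):
        return True  # special grant, independent of the parent
    return parent_id in PARENT_GRANTS.get(user_id, set())
```

The design point is that only the exceptions carry document-level records; everything else is decided by the parent resource.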
Practitioner guidance
- Map entitlements to stable parent resources: Assign access at the collection, workspace, project, or knowledge-base layer before exposing search results.
- Tag vector records with parent resource metadata: Store a parent ID alongside each vector and use that field for local filtering in the retrieval layer.
- Separate policy writes from retrieval latency: Design the auth call so it returns the accessible parent set once, then let the vector store handle high-volume filtering locally.
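The three steps above can be sketched end to end with a toy in-memory index. This is a minimal illustration, not WorkOS's FGA query or any vector database's API: `accessible_parents`, `search`, `retrieve`, the `parent_id` field name, and all IDs are hypothetical names chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VectorRecord:
    chunk_id: str
    parent_id: str  # collection the chunk belongs to (the filter field)
    embedding: tuple[float, ...]
    text: str


# Hypothetical stand-in for the auth layer: user -> accessible collections.
PARENT_GRANTS: dict[str, set[str]] = {
    "alice": {"col-eng", "col-hr"},
    "bob": {"col-eng"},
}

# Toy vector store; a real one would apply the same filter server-side.
INDEX = [
    VectorRecord("c1", "col-eng", (1.0, 0.0), "deploy runbook"),
    VectorRecord("c2", "col-hr", (0.9, 0.1), "salary bands"),
    VectorRecord("c3", "col-fin", (0.8, 0.2), "board forecast"),
]


def accessible_parents(user_id: str) -> frozenset[str]:
    """One auth lookup per request, returning the allowed parent set."""
    return frozenset(PARENT_GRANTS.get(user_id, ()))


def dot(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    return sum(x * y for x, y in zip(a, b))


def search(index, query, allowed_parents, k=3):
    """Filter locally on parent_id, then rank by dot-product similarity."""
    candidates = [r for r in index if r.parent_id in allowed_parents]
    candidates.sort(key=lambda r: dot(r.embedding, query), reverse=True)
    return candidates[:k]


def retrieve(user_id: str, query: tuple[float, ...], k: int = 3):
    allowed = accessible_parents(user_id)  # policy decision, made once
    return search(INDEX, query, allowed, k)  # enforcement, done locally
```

In this sketch, `retrieve("bob", (1.0, 0.0))` only ever sees chunks from `col-eng`, and a user with no grants gets an empty result, because the filter runs before ranking.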
What's in the full article
WorkOS's full article covers the operational detail this post intentionally leaves for the source:
- The exact FGA query pattern used to fetch accessible collections before retrieval
- The vector database metadata filtering structure that keeps enforcement local
- The implementation trade-offs for cases that truly need document-level permissions
- The transition path from simple RBAC to hierarchical access as the application matures
👉 Read WorkOS's analysis of RAG authorization at scale and hierarchical filtering →
RAG authorization at scale: are your controls keeping up?
Explore further
High-cardinality authorization is the real RAG governance anti-pattern. The failure is not that teams lack an authorization product, but that they apply the wrong control grain to derived data. A document becomes dozens of chunks and thousands of vectors, so the policy layer inherits a cardinality problem it was never built to absorb. The practical conclusion is that access governance must sit above the retrieval substrate, not mirror it record for record.
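A rough back-of-the-envelope comparison makes the cardinality gap concrete. The function below is a sketch, and every number passed to it is an invented illustration rather than a figure from WorkOS; it also assumes one synced record per document plus one per chunk in the per-record model.

```python
def auth_records(documents: int, chunks_per_doc: int, collections: int,
                 per_record_sync: bool) -> int:
    """Count resources the auth layer must track under each model."""
    if per_record_sync:
        # Every document and every derived chunk is mirrored into the auth layer.
        return documents + documents * chunks_per_doc
    # Hierarchical model: only the stable parent resources are tracked.
    return collections


# Illustrative inputs only.
synced = auth_records(100_000, 30, 50, per_record_sync=True)
hierarchical = auth_records(100_000, 30, 50, per_record_sync=False)
print(synced, hierarchical)
```

Under these assumed inputs the per-record model tracks 3,100,000 resources against 50 for the hierarchical one, and only the first number grows as documents are added or re-chunked.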
A few things that frame the scale:
- The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
- Only 44% of developers are reported to follow security best practices for secrets management, exposing a significant developer behaviour gap.
A question worth separating out:
Q: What is the difference between policy evaluation and vector filtering in RAG?
A: Policy evaluation decides which parent resources a user may access, while vector filtering applies that decision inside the retrieval system at query time. The first is the governance control, the second is the performance mechanism. Keeping them separate is what makes RAG authorization scale.
👉 Read our full editorial: RAG authorization at scale needs hierarchy, not per-document sync