TL;DR: Fine-grained authorization for RAG breaks down when every document, chunk, and embedding is synced into the auth layer, because high-cardinality resources create drift, latency, and bottlenecks in production, according to WorkOS. The scalable pattern is hierarchical authorization with local vector filtering, which keeps policy stable while the retrieval layer handles scale.
At a glance
What this is: A practical analysis of why per-document authorization does not scale for RAG. The key finding is that hierarchical authorization plus local filtering is the workable pattern.
Why it matters: IAM, NHI, and autonomous governance teams all face the same issue when high-cardinality resources are tied to policy evaluation: the control plane has to stay smaller than the data plane.
Context
RAG authorization fails when teams treat every chunk, vector, or document as a first-class policy object. That model works in demos, but in production it creates a large, fragile authorization surface where each ingestion, deletion, and query adds policy overhead that the retrieval path cannot absorb.
The underlying identity problem is not search accuracy but access scope. Users usually inherit rights from stable containers such as workspaces, projects, or collections, so the governance question becomes how to keep authorization stable while the retrieval layer handles millions of high-cardinality records.
Key questions
Q: How should security teams implement authorization for RAG applications at scale?
A: Use hierarchical authorization for stable parent resources such as workspaces or collections, then apply local filtering in the vector store with parent metadata. That keeps policy decisions small and auditable while avoiding the operational cost of syncing every derived chunk or embedding into the auth layer.
Q: What breaks when every document is synced to an external authorization system?
A: The auth layer becomes a bottleneck. Every ingestion, deletion, and query creates policy writes, cleanup work, and latency, which leads to drift between the vector store and the authorization source of truth. At scale, the system slows down and becomes harder to keep consistent.
Q: When should organisations use document-level permissions in RAG?
A: Only when a document truly needs explicit control, such as shared files, special grants, or compliance holds. If document-level access becomes the default, the model becomes difficult to operate and audit, and it no longer reflects the normal inheritance pattern users actually follow.
Q: What is the difference between policy evaluation and vector filtering in RAG?
A: Policy evaluation decides which parent resources a user may access, while vector filtering applies that decision inside the retrieval system at query time. The first is the governance control; the second is the performance mechanism. Keeping them separate is what makes RAG authorization scale.
Technical breakdown
Why high-cardinality resources break fine-grained authorization
Fine-grained authorization systems work best when the number of protected objects stays relatively stable. RAG reverses that assumption because a single source document can turn into many chunks and embeddings, each of which can multiply policy writes, cleanup tasks, and lookup latency. Once the auth system is asked to track every derived object, it stops being a policy layer and starts acting like a processing bottleneck. The failure mode is not just scale, but drift between the source of truth and the vector store as ingestion and deletion rates increase.
Practical implication: keep the authorization graph smaller than the retrieval corpus, or policy evaluation will become the slowest and least reliable part of the pipeline.
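The cardinality gap can be made concrete with a little arithmetic. The sketch below compares how many protected objects the auth layer would hold under per-chunk syncing versus hierarchical authorization. The corpus sizes are illustrative assumptions, not figures from the WorkOS analysis, and one embedding per chunk is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Corpus:
    # All three numbers are hypothetical, chosen only to show the ratio.
    collections: int          # stable parent resources
    docs_per_collection: int
    chunks_per_doc: int       # assumes one embedding per chunk


def auth_records_per_chunk_sync(c: Corpus) -> int:
    """Every derived chunk is registered as its own protected object."""
    return c.collections * c.docs_per_collection * c.chunks_per_doc


def auth_records_hierarchical(c: Corpus) -> int:
    """Only parent containers are registered; chunks inherit access."""
    return c.collections


corpus = Corpus(collections=200, docs_per_collection=5_000, chunks_per_doc=40)
per_chunk = auth_records_per_chunk_sync(corpus)
hierarchical = auth_records_hierarchical(corpus)

print(f"per-chunk sync: {per_chunk:,} protected objects")   # 40,000,000
print(f"hierarchical:   {hierarchical:,} protected objects")  # 200
print(f"ratio:          {per_chunk // hierarchical:,}x")      # 200,000x
```

Under these assumptions, ingesting one more document adds 40 auth records in the first model and none in the second, which is where the drift and cleanup burden comes from.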
How hierarchical authorization preserves RAG performance
Hierarchical authorization shifts the unit of control from individual documents to stable parent resources such as collections, workspaces, or knowledge bases. Access is checked once against the parent, and the vector store filters locally using metadata that points back to that parent ID. This keeps policy evaluation deterministic while letting the retrieval layer do the high-volume work it was built for. The architectural benefit is that policy changes happen at a human-comprehensible level, while the search system remains free to index and serve large corpora efficiently.
Practical implication: assign permissions at the collection or workspace layer, then use metadata filters in the vector database to enforce inherited access.
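A minimal sketch of that query path follows, assuming an in-memory index and a stubbed policy lookup. The names `GRANTS`, `accessible_parents`, `VectorRecord`, and `search` are hypothetical and do not correspond to any particular auth product or vector database. A production vector store would apply the same parent-ID filter through its own metadata filter syntax.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorRecord:
    chunk_id: str
    parent_id: str                  # container the chunk inherits access from
    embedding: tuple[float, ...]


# Hypothetical policy data: user -> parent resources they may read.
GRANTS = {
    "alice": {"ws-eng", "ws-hr"},
    "bob": {"ws-eng"},
}


def accessible_parents(user: str) -> frozenset[str]:
    """Stand-in for a single call to the authorization service."""
    return frozenset(GRANTS.get(user, set()))


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, query, allowed, k=3):
    """Filter by parent ID locally, then rank the survivors by similarity."""
    candidates = [r for r in index if r.parent_id in allowed]
    ranked = sorted(candidates, key=lambda r: cosine(r.embedding, query), reverse=True)
    return ranked[:k]


INDEX = [
    VectorRecord("eng-1", "ws-eng", (1.0, 0.0)),
    VectorRecord("eng-2", "ws-eng", (0.8, 0.6)),
    VectorRecord("hr-1", "ws-hr", (0.0, 1.0)),
    VectorRecord("fin-1", "ws-fin", (0.6, 0.8)),   # best match, but nobody below may see it
]

query = (0.6, 0.8)
bob_hits = [r.chunk_id for r in search(INDEX, query, accessible_parents("bob"))]
alice_hits = [r.chunk_id for r in search(INDEX, query, accessible_parents("alice"))]

print(bob_hits)    # ['eng-2', 'eng-1']
print(alice_hits)  # ['eng-2', 'hr-1', 'eng-1']
```

The auth call happens once per query and returns a small set of parent IDs. The index never asks the policy layer about individual chunks, so `fin-1` is excluded even though it is the closest match.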
When document-level access should remain the exception
Some RAG workloads do require document-level control, especially when shared files, explicit grants, or compliance holds create exceptions to the normal hierarchy. The key design choice is to treat those cases as a small override path rather than the default architecture. If every document gets its own policy record, the authorization model becomes difficult to operate, harder to audit, and much more likely to diverge from the search index. Good RAG governance keeps exceptions narrow and intentional.
Practical implication: reserve direct document registration for true exceptions, and keep the bulk of the corpus governed through inheritance.
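One way to picture the override path is a visibility check that consults inheritance first and a short exception list second. The sketch below is illustrative only: `AccessScope`, `extra_docs`, and `held_docs` are hypothetical names, and it assumes that a compliance hold hides a document from normal retrieval, which may not match how a given organisation treats holds.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    doc_id: str
    parent_id: str


@dataclass
class AccessScope:
    parents: set[str]                                    # inherited scope (the default path)
    extra_docs: set[str] = field(default_factory=set)    # explicit grants, e.g. shared files
    held_docs: set[str] = field(default_factory=set)     # assumed: holds deny retrieval


def is_visible(chunk: Chunk, scope: AccessScope) -> bool:
    """Holds win over everything; otherwise inheritance or an explicit grant."""
    if chunk.doc_id in scope.held_docs:
        return False
    return chunk.parent_id in scope.parents or chunk.doc_id in scope.extra_docs


CHUNKS = [
    Chunk("c1", "doc-a", "ws-eng"),
    Chunk("c2", "doc-b", "ws-eng"),   # inherited, but under a hold
    Chunk("c3", "doc-c", "ws-fin"),   # outside the hierarchy, shared explicitly
    Chunk("c4", "doc-d", "ws-fin"),
]

bob = AccessScope(parents={"ws-eng"}, extra_docs={"doc-c"}, held_docs={"doc-b"})
visible = [c.chunk_id for c in CHUNKS if is_visible(c, bob)]
exception_count = len(bob.extra_docs) + len(bob.held_docs)

print(visible)          # ['c1', 'c3']
print(exception_count)  # 2
```

Keeping the exception sets separate from the inherited scope makes their size easy to monitor. If they start to approach the size of the corpus, the override path has become the default architecture.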
Breaches seen in the wild
- DeepSeek breach: exposed more than 1M log lines and sensitive secret keys.
- Schneider Electric credentials breach: exposed credentials gave attackers access to Schneider Electric's Jira, from which they exfiltrated 40GB of data.
Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.
NHI Mgmt Group analysis
High-cardinality authorization is the real RAG governance anti-pattern. The failure is not that teams lack an authorization product, but that they apply the wrong control grain to derived data. A single document fans out into many chunks and embeddings, so the policy layer inherits a cardinality problem it was never built to absorb. The practical conclusion is that access governance must sit above the retrieval substrate, not mirror it record for record.
Identity scope should be anchored to stable containers, not ephemeral search artifacts. Collections, workspaces, and projects are the durable units of entitlement, while chunks and embeddings are transient implementation detail. That distinction matters because access review, revocation, and audit work only when the protected object has a stable lifecycle. The field should stop treating vector-level policy as the default and reserve it for genuine exceptions.
Local filtering is the correct split between policy and performance. Authorization decides what a user may reach, while the vector store decides how to filter at speed once that scope is known. This separation aligns with NIST Cybersecurity Framework 2.0 thinking because the control function stays governed while the data function stays efficient. Practitioners should design for a small trusted policy surface and a large, optimized retrieval surface.
Document-level authorization should be treated as a governance exception, not an architectural baseline. When everything is special, nothing is governable. The scalable model is hierarchical entitlement with narrowly scoped overrides for shared documents or compliance holds. That gives teams a policy model they can audit and a retrieval model they can actually operate at enterprise scale.
From our research:
- The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
- Only 44% of developers are reported to follow security best practices for secrets management, exposing a significant developer behaviour gap.
- The broader lesson on identity-bound controls and scale is also explored in Ultimate Guide to NHIs, Why NHI Security Matters Now.
What this signals
High-cardinality policy surfaces should be treated as a design smell. When the number of protected objects grows faster than the governance model, teams should move entitlement upward to a stable container and keep enforcement local. That pattern is becoming the default for RAG, but it also mirrors how mature NHI programmes separate durable identity from transient runtime objects.
A useful way to think about this is the identity blast radius: the larger the number of derived resources tied to a single policy decision, the more fragile revocation and audit become. In enterprise programmes, that same problem shows up when machine identities are over-bound to ephemeral assets, which is why access scope and resource hierarchy need to be designed together.
The operational signal to watch is whether your retrieval layer can keep serving while the policy layer stays small and stable. If every ingestion event forces an auth write, the architecture is already telling you the control plane is too close to the data plane for reliable governance.
For practitioners
- Map entitlements to stable parent resources: Assign access at the collection, workspace, project, or knowledge-base layer before exposing search results. Keep individual document registration for special cases only, so authorization stays aligned with the actual governance boundary and does not grow with every chunk or embedding.
- Tag vector records with parent resource metadata: Store a parent ID alongside each vector and use that field for local filtering in the retrieval layer. This keeps policy checks small and lets the database enforce inherited access without pushing every query through the auth service.
- Separate policy writes from retrieval latency: Design the auth call so it returns the accessible parent set once, then let the vector store handle high-volume filtering locally. That split prevents network delay and auth-layer load from sitting directly on the user search path.
- Reserve per-document auth for exception workflows: Use direct document entitlements only for explicit access grants, shared files, or compliance holds that cannot be modeled through inheritance. Track those exceptions separately so they remain auditable and do not become the default pattern for the corpus.
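The ingestion side of these recommendations can be sketched as follows. This is a toy model under stated assumptions: `PolicyStore` and `VectorStore` are hypothetical in-memory stand-ins, embeddings are omitted, and the fixed-size chunker is a placeholder. It shows that tagging chunks with a parent ID lets ingestion proceed without any policy write, and that a single revocation at the parent removes access to every inherited chunk.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PolicyStore:
    """Hypothetical stand-in for the external authorization service."""
    grants: dict[str, set[str]] = field(default_factory=dict)
    writes: int = 0   # counts policy mutations only

    def grant(self, user: str, parent_id: str) -> None:
        self.grants.setdefault(user, set()).add(parent_id)
        self.writes += 1

    def revoke(self, user: str, parent_id: str) -> None:
        self.grants.get(user, set()).discard(parent_id)
        self.writes += 1

    def accessible_parents(self, user: str) -> set[str]:
        return set(self.grants.get(user, set()))


@dataclass
class VectorStore:
    """Toy index; embeddings are left out to keep the focus on metadata."""
    records: list[dict] = field(default_factory=list)

    def ingest(self, doc_id: str, parent_id: str, text: str, size: int = 20) -> int:
        chunks = [text[i:i + size] for i in range(0, len(text), size)]
        for n, chunk in enumerate(chunks):
            self.records.append(
                {"chunk_id": f"{doc_id}#{n}", "doc_id": doc_id,
                 "parent_id": parent_id, "text": chunk}
            )
        return len(chunks)

    def visible(self, allowed: set[str]) -> list[dict]:
        return [r for r in self.records if r["parent_id"] in allowed]


policy, store = PolicyStore(), VectorStore()
policy.grant("alice", "kb-hr")                         # one policy write
writes_before = policy.writes

added = store.ingest("handbook", "kb-hr", "x" * 95)    # 5 chunks, tagged with the parent ID
writes_after_ingest = policy.writes                    # unchanged by ingestion

seen_before = len(store.visible(policy.accessible_parents("alice")))
policy.revoke("alice", "kb-hr")                        # one write covers every chunk
seen_after = len(store.visible(policy.accessible_parents("alice")))

print(added, writes_before, writes_after_ingest)  # 5 1 1
print(seen_before, seen_after, policy.writes)     # 5 0 2
```

If ingestion in a real pipeline changes the policy write counter, that is the operational signal that the control plane is tracking derived objects rather than stable parents.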
Key takeaways
- RAG authorization fails when teams try to govern every derived chunk and vector as if it were a primary asset.
- The practical scaling pattern is hierarchical access at the parent resource level, with local metadata filtering doing the heavy lifting.
- Document-level permissions should stay exceptional, or the authorization system will become the bottleneck the RAG layer was meant to avoid.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| NIST CSF 2.0 | PR.AA-05 | Access permissions must stay aligned to the real governance boundary. |
| NIST Zero Trust (SP 800-207) | Section 2.1 (Tenets of Zero Trust) | Zero trust requires decisions at the right boundary, not every derived object. |
| OWASP Non-Human Identity Top 10 | NHI-03 | Derived resource sprawl mirrors the same control drift seen in overgrown machine identity estates. |
Keep policy checks centralized and use local filtering only after authorization is established.
Key terms
- High-Cardinality Authorization: An authorization model becomes high-cardinality when the number of protected objects grows so fast that policy evaluation, cleanup, and audit start to dominate the system. In RAG, derived chunks and embeddings often create this condition, making the control plane harder to operate than the retrieval layer.
- Hierarchical Authorization: Hierarchical authorization assigns access to a stable parent resource such as a workspace, collection, or project, and lets child objects inherit that decision. It reduces policy volume, keeps revocation manageable, and matches how users usually understand access boundaries in enterprise systems.
- Local Metadata Filtering: Local metadata filtering is the practice of enforcing previously approved access scope inside the data layer using fields such as parent IDs. It lets search or vector systems filter at speed without turning every query into a round trip to the authorization service.
- Document-Level Exception: A document-level exception is a direct entitlement granted to a specific item when inheritance is not enough, such as for shared files or compliance holds. It should remain a narrow override path, because making exceptions the default turns governance into a maintenance problem.
Deepen your knowledge
RAG authorization at scale and hierarchical access design are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building governance for high-cardinality retrieval systems, it is worth exploring.
This post draws on content published by WorkOS: Authorization for RAG at Scale: Why You Shouldn't Sync Every Document. Read the original.
Published by the NHIMG editorial team on 2026-01-08.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org