TL;DR: Josys says rising latency, scaling limits, and fragmented analytical workflows across MySQL, MongoDB, streaming, and search-driven data pipelines prompted its move from single-node aggregation services to a Spark-based IDAC layer. The core lesson is that identity and data governance both fail when trust, consistency, and scale are treated as afterthoughts.
NHIMG editorial — based on content published by Josys: Data Engineering at Josys
Questions worth separating out
Q: How should teams design analytics pipelines that can grow without creating bottlenecks?
A: Use distributed compute, clear processing layers, and standard data contracts so workload growth does not concentrate on one service.
Q: Why do layered data architectures improve governance as well as performance?
A: Layered architectures improve governance because they make data transformation visible and reviewable at each stage.
Q: What breaks when organisations rely on a single analytics service for every workload?
A: A single analytics service eventually becomes a bottleneck for compute, writes, and downstream reporting.
Practitioner guidance
- Map pipeline control points to governance zones: Separate raw ingestion, cleansing, and reporting stages so each layer has a defined owner, validation rule, and audit trail.
- Replace single-node analytics dependencies with distributed compute: Move heavy aggregation workloads off a single service when latency rises under growth.
- Standardise shared data contracts for reporting consumers: Define common schemas and transformation logic for dashboards, reports, and operational analytics.
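To make the first and third points concrete, here is a minimal sketch in plain Python of a three-stage pipeline in which every layer carries an owner, a validation rule, and an audit entry. It is an illustration under stated assumptions, not Josys' implementation: the `Stage` and `Pipeline` classes, the owner names, the sample records, and what each layer does are all hypothetical, and a production version would run on a distributed engine rather than in-process lists.

```python
from dataclasses import dataclass, field
from typing import Callable

Record = dict


@dataclass
class Stage:
    """One governance zone: a named layer with an owner and a validation rule."""
    name: str
    owner: str
    transform: Callable[[list], list]
    validate: Callable[[Record], bool]


@dataclass
class Pipeline:
    stages: list
    audit_trail: list = field(default_factory=list)

    def run(self, records: list) -> list:
        for stage in self.stages:
            produced = stage.transform(records)
            accepted = [r for r in produced if stage.validate(r)]
            # Record who owns the step and how many rows it rejected.
            self.audit_trail.append({
                "stage": stage.name,
                "owner": stage.owner,
                "rows_in": len(records),
                "rows_out": len(accepted),
                "rejected": len(produced) - len(accepted),
            })
            records = accepted
        return records


def cleanse(rows: list) -> list:
    # Hypothetical shared contract: lower-cased app name, integer seat count.
    return [
        {"user_id": r["user_id"], "app": r["app"].strip().lower(), "seats": int(r["seats"])}
        for r in rows
    ]


def aggregate(rows: list) -> list:
    totals: dict = {}
    for r in rows:
        totals[r["app"]] = totals.get(r["app"], 0) + r["seats"]
    return [{"app": app, "total_seats": n} for app, n in sorted(totals.items())]


def build_pipeline() -> Pipeline:
    # Layer names follow the Bronze/Silver/Gold convention; owners are made up.
    return Pipeline(stages=[
        Stage("bronze", "ingestion-team", lambda rows: [dict(r) for r in rows],
              lambda r: "user_id" in r),
        Stage("silver", "data-platform", cleanse, lambda r: r["seats"] >= 0),
        Stage("gold", "reporting", aggregate, lambda r: r["total_seats"] >= 0),
    ])


RAW = [
    {"user_id": "u1", "app": " Slack ", "seats": "3"},
    {"user_id": "u2", "app": "slack", "seats": "2"},
    {"user_id": "u3", "app": "Jira", "seats": "-1"},   # fails the silver rule
    {"app": "Jira", "seats": "4"},                      # fails the bronze rule
]

if __name__ == "__main__":
    pipeline = build_pipeline()
    print(pipeline.run(RAW))
    for entry in pipeline.audit_trail:
        print(entry)
```

Because each stage appends its own audit entry, a reviewer can see which layer rejected a record and who owns that rule without reading the transformation code.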
What's in the full article
Josys' full article covers the implementation detail this post intentionally leaves for the source:
- The Node.js and MongoDB aggregation pattern that preceded the Spark-based architecture.
- The layered IDAC structure and how Bronze, Silver, and Gold roles are divided in practice.
- The ingestion options Josys uses, including CDC, streaming, and custom functions.
- The platform outcomes Josys says it achieved for customer dashboards and reporting features.
👉 Read Josys' article on building its distributed IDAC data engineering framework →
Distributed analytics foundations: what Josys' IDAC means for teams
Explore further
Distributed analytics is now a governance problem, not just an engineering one. Josys says it moved from single-node aggregation to a layered, Spark-backed framework because the earlier model could not keep up with growing load. Identity teams face the same pattern when access, reporting, and assurance logic are spread across disconnected systems. The lesson is that scale failures show up first as latency, then as inconsistency, and finally as trust erosion. Practitioners should treat analytics architecture as part of governance architecture.
A few things that frame the scale:
- 70% of organisations grant AI systems more access than they would give a human employee performing the exact same job, according to The 2026 Infrastructure Identity Survey.
- Only 13% of organisations feel extremely prepared for the reality of agentic AI despite the majority racing toward autonomous adoption.
A question worth separating out:
Q: How do identity and security teams apply the same lessons to governance data?
A: They should use the same design discipline for access and assurance data that data engineers use for analytics. That means normalised inputs, clear ownership, traceable transformations, and reliable reporting layers. If identity evidence is fragmented, reviews and decisions will be inconsistent no matter how strong the policy language looks on paper.
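As a rough illustration of "normalised inputs" and "traceable transformations" applied to identity evidence, the sketch below maps two differently shaped sources onto one record type and tags each record with its origin and the rule that produced it. Everything here is assumed for the example: the `AccessEvidence` type, the field names of the two sources, and the rule labels are hypothetical and do not come from the Josys article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvidence:
    user: str       # normalised to a lower-cased email
    resource: str
    source: str     # system the record came from
    rule: str       # label of the normalisation rule that was applied


def from_idp(row: dict) -> AccessEvidence:
    # Hypothetical identity-provider export with "email" and "app" fields.
    return AccessEvidence(row["email"].strip().lower(), row["app"], "idp", "idp_v1")


def from_hr(row: dict) -> AccessEvidence:
    # Hypothetical HR entitlement export with "WorkEmail" and "System" fields.
    return AccessEvidence(row["WorkEmail"].strip().lower(), row["System"], "hr", "hr_v1")


def unreviewed_grants(idp_rows: list, hr_rows: list) -> list:
    """Return grants seen in the IdP that have no matching HR entitlement."""
    approved = {(e.user, e.resource) for e in map(from_hr, hr_rows)}
    return [e for e in map(from_idp, idp_rows) if (e.user, e.resource) not in approved]
```

Once both sources share a schema, the comparison is a set lookup, and every flagged record still says where it came from and how it was normalised.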
👉 Read our full editorial: Josys data engineering shifts to a distributed analytics foundation