Signals DAG architecture shows why ML pipelines need explicit dependencies

By NHI Mgmt Group Editorial Team | Published 2025-07-09 | Domain: Governance & Risk | Source: Abnormal AI

TL;DR: Abnormal AI keeps signal extraction consistent and reduces drift across scoring and aggregation by running a Signals DAG across 3 production systems: 2 online services at up to 35k QPS and 1 batch job processing 3TB daily, according to Abnormal AI. The governance lesson is that explicit data dependencies matter more than isolated model tuning when detection pipelines scale.

At a glance

What this is: A production architecture analysis of how Abnormal AI operationalises a Signals DAG to keep signal extraction, batch aggregation, and realtime scoring aligned at scale.

Why it matters: Identity, access, and security teams increasingly depend on large-scale signal pipelines, and drift, abstraction leakage, and inconsistent data flows can undermine both machine and human governance decisions.

By the numbers:

Abnormal AI says its Signals DAG runs across 3 production systems, including 2 online services at up to 35k QPS and 1 batch job processing 3TB daily.


Context

A Signals DAG is a structured way to define how features are derived from inputs so that every dependency is explicit rather than implied. In practice, that matters when detection or risk-scoring systems must stay consistent as data volumes, model complexity, and execution modes increase.

For IAM and security programmes, the relevance is governance as much as engineering. When batch and realtime paths produce different outputs, analysts inherit drift, exceptions, and inconsistent policy signals that are hard to audit or explain.

Key questions

Q: How should security teams prevent drift in large-scale detection pipelines?

A: Security teams should define each derived signal once, then reuse that definition across scoring, aggregation, and reporting paths. The main control is consistency of lineage, not just model accuracy. If two systems compute the same feature differently, decisions become path-dependent and harder to audit, which weakens both detection quality and governance.

Q: Why do separate batch and realtime systems create governance risk?

A: Separate batch and realtime systems create governance risk because the same behaviour can be interpreted through different logic at different times. That creates drift, manual reconciliation, and inconsistent outcomes for analysts and downstream controls. If the pipeline cannot preserve one shared meaning across execution modes, it is difficult to trust the result.

Q: How do you know if a feature pipeline is becoming too complex to trust?

A: A feature pipeline is becoming too complex to trust when engineers must remember hidden dependencies, duplicate logic, or system-specific exceptions to explain results. Those are signs that the architecture is no longer transparent. A good test is whether an outsider can trace a signal from input to output without reading service-specific implementation details.

Q: What is the difference between a shared signal definition and duplicated implementation?

A: A shared signal definition creates one authoritative description of how a feature is derived, while duplicated implementation creates multiple versions that can diverge over time. The first supports consistency and auditability. The second increases maintenance cost and makes scoring outcomes depend on which system processed the data.
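
The traceability test described in the answers above can be made concrete. The following is a minimal sketch, assuming dependencies are recorded as a simple mapping from each signal to the inputs it consumes. The signal names and the `trace_lineage` helper are hypothetical and are not taken from Abnormal AI's implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical dependency declarations: signal -> inputs it consumes.
DEPENDS_ON: Dict[str, Tuple[str, ...]] = {
    "sender_domain": ("sender",),
    "domain_age_days": ("sender_domain",),
    "link_count": ("links",),
    "risk_score": ("domain_age_days", "link_count"),
}


def trace_lineage(signal: str, depends_on: Dict[str, Tuple[str, ...]]) -> List[str]:
    """Return every upstream signal and raw input that feeds `signal`, sorted."""
    seen: Set[str] = set()
    stack = [signal]
    while stack:
        current = stack.pop()
        # Names without an entry are treated as raw inputs with no dependencies.
        for dep in depends_on.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return sorted(seen)


print(trace_lineage("risk_score", DEPENDS_ON))
# ['domain_age_days', 'link_count', 'links', 'sender', 'sender_domain']
```

If a reviewer can answer "what feeds this signal?" from a declaration like this, without opening service code, the pipeline passes the outsider test.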

Technical breakdown

How a Signals DAG prevents pipeline entanglement

A Signals DAG is a directed acyclic graph in which each signal extraction function declares its inputs and outputs. That declaration turns hidden dependencies into an explicit graph, which makes recomputation, reuse, and change impact easier to reason about. The key operational benefit is that feature creation stops behaving like an implicit chain of side effects. Instead, engineers can trace why a signal exists, what feeds it, and what downstream consumers depend on it. In machine security and identity-adjacent analytics, that matters because brittle pipelines can silently change detection behaviour.

Practical implication: map every derived signal to declared inputs and outputs before scaling a detection pipeline.
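
To illustrate what declaring inputs and outputs can look like, here is a minimal sketch that uses Python's standard-library `graphlib` to evaluate signals in dependency order. The `Signal` class, the `run_dag` function, and the example signals are assumptions made for illustration; the source does not describe Abnormal AI's actual interfaces.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Signal:
    """One node in the graph: a named output plus the inputs it declares."""
    name: str
    inputs: Tuple[str, ...]
    fn: Callable[..., Any]


def run_dag(signals: Dict[str, Signal], raw: Dict[str, Any]) -> Dict[str, Any]:
    """Evaluate every signal in dependency order, starting from raw inputs."""
    # Map each signal to the signals it depends on; raw inputs are not nodes.
    graph = {
        s.name: {dep for dep in s.inputs if dep in signals}
        for s in signals.values()
    }
    values = dict(raw)
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(graph).static_order():
        sig = signals[name]
        missing = [dep for dep in sig.inputs if dep not in values]
        if missing:
            raise KeyError(f"{name} declares unknown inputs: {missing}")
        values[name] = sig.fn(*(values[dep] for dep in sig.inputs))
    return values


# Hypothetical signals for illustration only.
SIGNALS = {
    s.name: s
    for s in [
        Signal("sender_domain", ("sender",), lambda sender: sender.split("@")[-1].lower()),
        Signal("link_count", ("links",), len),
        Signal(
            "external_with_links",
            ("sender_domain", "link_count", "org_domain"),
            lambda domain, n, org: domain != org and n > 0,
        ),
    ]
}

result = run_dag(
    SIGNALS,
    {"sender": "alice@Example.com", "links": ["http://a", "http://b"], "org_domain": "corp.test"},
)
print(result["external_with_links"])  # True
```

Because each function receives only the inputs it declares, a hidden dependency shows up as an error at evaluation time rather than as a silent change in behaviour.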

Realtime and batch aggregates serve different control windows

Abnormal AI separates realtime signal aggregation from batch signal aggregation because the two modes solve different freshness and cardinality problems. Realtime systems support time-sensitive, low-cardinality features, while batch systems handle broader historical patterns that are too expensive to compute continuously. The design trade-off is speed versus completeness, but the architectural risk is that two implementations can drift apart and produce different views of the same behaviour. Running identical DAG instances across both paths is a way to keep feature logic aligned even when the storage and compute layers differ.

Practical implication: use one shared signal definition layer across realtime and batch paths to reduce drift.
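
One way to picture a shared definition layer is a single signal function that both paths call, plus a parity check on a common fixture. This is a sketch under stated assumptions: the event shape, `is_external`, `batch_counts`, and `RealtimeCounter` are hypothetical names, and real batch and realtime layers would sit on different storage and compute.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str]  # (sender, recipient); a hypothetical event shape


def is_external(event: Event, org_domain: str = "corp.test") -> bool:
    """The single shared signal definition used by both paths."""
    sender, _ = event
    return sender.rsplit("@", 1)[-1].lower() != org_domain


def batch_counts(events: Iterable[Event]) -> Dict[str, int]:
    """Batch path: count external messages per recipient over a full history."""
    return dict(Counter(ev[1] for ev in events if is_external(ev)))


class RealtimeCounter:
    """Realtime path: update the same aggregate one event at a time."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def update(self, event: Event) -> None:
        if is_external(event):
            rcpt = event[1]
            self.counts[rcpt] = self.counts.get(rcpt, 0) + 1


FIXTURE: List[Event] = [
    ("eve@evil.test", "bob@corp.test"),
    ("amy@corp.test", "bob@corp.test"),
    ("eve@EVIL.test", "bob@corp.test"),
    ("mal@other.test", "cat@corp.test"),
]

realtime = RealtimeCounter()
for ev in FIXTURE:
    realtime.update(ev)

# Parity check: both paths must agree on the shared fixture.
assert realtime.counts == batch_counts(FIXTURE)
print(realtime.counts)  # {'bob@corp.test': 2, 'cat@corp.test': 1}
```

The parity assertion is the part worth keeping in a test suite: it fails as soon as one path starts interpreting the signal differently from the other.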

Lambda architecture can reduce leaky abstractions

The post points to a future Lambda architecture because separate batch and realtime systems improved delivery speed but created a leaky abstraction for ML engineers. A leaky abstraction appears when the implementation details of one path become visible to the users of another path, forcing teams to manage differences manually. In this case, researchers and engineers had to think about two execution modes every time they added a feature. Consolidating around a unified pattern can simplify reasoning, improve consistency, and make the system easier to operate at scale.

Practical implication: evaluate whether separate compute paths are creating manual reconciliation work that should be eliminated.
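
The source does not describe how a future design would be built, so the following is only a generic sketch of the Lambda pattern: a batch view covers events up to a cutoff, a realtime view covers events after it, and a serving step merges the two. All names here (`count_by_key`, `serve`, `batch_cutoff`) are hypothetical.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # (timestamp, key); a hypothetical event shape


def count_by_key(events: List[Event]) -> Dict[str, int]:
    """Shared aggregation logic used by both layers."""
    counts: Dict[str, int] = {}
    for _, key in events:
        counts[key] = counts.get(key, 0) + 1
    return counts


def serve(key: str, events: List[Event], batch_cutoff: int) -> int:
    """Serving step: batch view up to the cutoff plus the realtime delta after it."""
    batch_view = count_by_key([e for e in events if e[0] <= batch_cutoff])
    speed_view = count_by_key([e for e in events if e[0] > batch_cutoff])
    return batch_view.get(key, 0) + speed_view.get(key, 0)


EVENTS = [(1, "bob"), (2, "bob"), (3, "cat"), (4, "bob")]
print(serve("bob", EVENTS, batch_cutoff=2))  # 3
```

The design point is that the caller asks one question and gets one answer. Where the cutoff falls is an internal detail, so feature authors no longer need to reason about two execution modes.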

NHI Mgmt Group analysis

Explicit dependency modelling is now a governance control, not just an engineering preference. The Signals DAG approach shows that large detection systems break down when feature lineage is implicit. Once outputs depend on hidden execution order, teams lose the ability to explain changes, compare runs, or trust that scoring and aggregation saw the same inputs. For practitioners, the lesson is that declarative dependency mapping is a prerequisite for auditability in any high-scale security analytics pipeline.

Drift between realtime and batch logic creates a policy problem as much as a data problem. When the same behavioural signal is implemented twice, one path inevitably becomes the reference and the other becomes the exception. That is a governance failure because decisions start depending on which system touched the data, not on the underlying behaviour itself. Practitioners should treat duplicate signal logic as a control-risk indicator, not merely a software-maintenance issue.

Leaky abstractions are how scale turns into operational ambiguity. The article makes clear that speed gains from splitting batch and realtime systems came with hidden coordination cost. As feature volume grows, those costs show up as inconsistent definitions, duplicated logic, and manual reconciliation between teams. The practitioner conclusion is straightforward: if a security model cannot preserve consistency across execution modes, it is not ready for broad operational reliance.

Signals DAG: explicit lineage for derived behavioural features. This concept matters because the business value is not only in better prediction, but in making every derived signal explainable and repeatable. In identity and security programmes, that same discipline helps teams distinguish between model improvement and untracked logic changes. The implication is that governance should extend to feature composition itself, not only to model outputs.

From our research:
Only 44% of organisations are currently using a dedicated secrets management system, according to The 2024 State of Secrets Management Survey.
54% of organisations are dissatisfied with their current secrets management solution because not all secrets are secured, and 43% cite lack of central management.
For a broader governance baseline, read NHI Lifecycle Management Guide for how provisioning, rotation, and offboarding discipline reduce hidden operational drift.

What this signals

Signals DAG governance is a useful model for identity programmes that now depend on derived risk signals. When access decisions depend on multiple upstream inputs, the programme needs explicit lineage or it will inherit drift from every source system. The same logic applies whether the subject is a service account, a workload, or a human entitlement process.

According to The 2024 State of Secrets Management Survey, 54% of organisations are dissatisfied with their current secrets management solution because not all secrets are secured. The operational lesson is that fragmented control surfaces create blind spots faster than teams can reconcile them.

Feature composition becomes a governance boundary at scale: once one path serves realtime decisions and another serves batch analysis, the programme must prove they mean the same thing. For teams aligning to NIST Cybersecurity Framework 2.0, that means treating consistency of derived signals as part of the Protect and Detect functions, not as a downstream engineering detail.

For practitioners

Inventory every derived signal and its dependencies: Document where each aggregate feature comes from, which inputs it consumes, and which scoring or aggregation systems reuse it. If the same signal logic exists in more than one service, treat that as a drift risk and assign an owner for reconciliation.
Standardise signal definitions across execution modes: Use a single composition layer so batch and realtime paths do not each evolve their own interpretation of the same behavioural feature. If you keep separate compute layers, make the transformation rules identical and test them against the same fixtures.
Check for leaky abstractions before scaling detection pipelines: Look for places where engineers must remember different rules for different systems just to preserve one analytical outcome. That manual coordination burden usually indicates the architecture is hiding complexity instead of removing it.
Align data-store choice to feature shape and throughput: Match the storage layer to the access pattern, such as high write throughput for realtime aggregates or fault-tolerant processing for batch features. Avoid forcing one datastore to satisfy every signal type if it introduces performance or correctness trade-offs.
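
The inventory step above can start as a small script. This is a minimal sketch, assuming each entry records the signal name, the service that computes it, and an identifier for the logic in use. The `Implementation` record, the service names, and the definition identifiers are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class Implementation(NamedTuple):
    signal: str      # name of the derived signal
    service: str     # system that computes it
    definition: str  # identifier of the logic in use, e.g. a module path or version


def find_drift_risks(inventory: List[Implementation]) -> Dict[str, List[str]]:
    """Return signals computed from more than one distinct definition."""
    definitions: Dict[str, Set[str]] = defaultdict(set)
    services: Dict[str, Set[str]] = defaultdict(set)
    for impl in inventory:
        definitions[impl.signal].add(impl.definition)
        services[impl.signal].add(impl.service)
    return {
        signal: sorted(services[signal])
        for signal, defs in definitions.items()
        if len(defs) > 1
    }


# Hypothetical inventory entries for illustration.
INVENTORY = [
    Implementation("sender_domain", "realtime-scoring", "signals.sender_domain@v3"),
    Implementation("sender_domain", "batch-aggregation", "signals.sender_domain@v3"),
    Implementation("login_velocity", "realtime-scoring", "rt.login_velocity@v1"),
    Implementation("login_velocity", "batch-aggregation", "etl.login_velocity@v2"),
]

print(find_drift_risks(INVENTORY))
# {'login_velocity': ['batch-aggregation', 'realtime-scoring']}
```

A signal reused across services from one definition is not flagged. Only signals with divergent definitions appear, which is the list that needs an owner.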

Key takeaways

The article’s central lesson is that detection quality depends on explicit signal lineage, not just stronger models.
Running separate batch and realtime systems can improve delivery speed but also introduces drift, reconciliation work, and inconsistent outcomes.
Practitioners should standardise derived signal definitions across pipelines so scoring, aggregation, and audit views remain aligned.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

NIST CSF 2.0 sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
NIST CSF 2.0	DE.CM-01	Repeated signal drift affects ongoing monitoring quality in security pipelines.
NIST CSF 2.0	PR.DS-01	Signal integrity depends on consistent handling of source data and feature lineage.
NIST CSF 2.0	GV.RM-01	Pipeline drift is a governance risk that affects trust in security decisions.

Treat derived-signal consistency as part of monitoring quality and validate it during change control.
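
One possible way to validate signal consistency during change control is to fingerprint each signal's declarative spec and compare it with the approved version. This is a sketch under assumptions: it presumes signals are described by JSON-serialisable specs, and the field names (`inputs`, `window_minutes`) and helper names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List


def fingerprint(spec: Dict[str, Any]) -> str:
    """Hash a declarative signal spec so any change to it is detectable."""
    # sort_keys gives a canonical form, so key order does not affect the hash.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_signals(approved: Dict[str, str], current: Dict[str, Dict[str, Any]]) -> List[str]:
    """List signals whose current spec no longer matches the approved fingerprint."""
    return sorted(
        name for name, spec in current.items()
        if approved.get(name) != fingerprint(spec)
    )


# Hypothetical specs; field names are illustrative.
v1 = {"login_velocity": {"inputs": ["login_events"], "window_minutes": 60}}
approved = {name: fingerprint(spec) for name, spec in v1.items()}

v2 = {"login_velocity": {"inputs": ["login_events"], "window_minutes": 15}}
print(changed_signals(approved, v1))  # []
print(changed_signals(approved, v2))  # ['login_velocity']
```

A check like this helps separate a deliberate model improvement from an untracked change to feature logic, because the second kind shows up as a fingerprint mismatch with no matching approval.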

Key terms

Signals DAG: A Signals DAG is a dependency graph that makes feature extraction steps explicit by declaring what each signal consumes and produces. It helps teams reason about data lineage, recomputation, and change impact when multiple systems depend on the same derived behaviour.
Aggregate Feature: An Aggregate Feature is a derived signal built from patterns across multiple events rather than from a single observation. In security and identity analytics, these features often reveal behaviour that is invisible at message or request level, but they also require careful precomputation and governance.
Leaky Abstraction: A leaky abstraction occurs when implementation details from one layer of a system become visible to users of another layer, forcing them to manage hidden complexity manually. In detection pipelines, that usually means separate systems behave differently enough that engineers must keep reconciling their outputs.

This post draws on content published by Abnormal AI: Signals DAG architecture and production scaling in the detection engine. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-07-09.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org