The same authentication event can be routine for one identity and suspicious for another. Baselines let teams compare access patterns against the identity’s own history, so anomalies are judged in context instead of by generic thresholds. That reduces alert noise and improves investigation quality.
Why This Matters for Security Teams
Behavioural baselines matter because non-human identities do not all behave the same way, even when they use the same protocol or touch the same application. A service account that runs hourly backups, a CI/CD token that only deploys on release branches, and an API key used by a partner integration each have different normal patterns. Without identity-specific baselines, defenders end up comparing unlike events and either miss real anomalies or drown in false positives.
This is especially important because NHIs now outnumber human identities by 25x to 50x in modern enterprises, which makes manual judgment impossible at scale. NHI governance guidance from NHI Mgmt Group shows why visibility and context are foundational, not optional. Behavioural baselines provide that context by tying activity to the identity’s own history, not a generic rule set. That aligns with the intent of NIST Cybersecurity Framework 2.0, which emphasises risk-informed monitoring and response rather than one-size-fits-all thresholds.
In practice, many security teams only discover the need for baselines after an identity has already been abused to make normal-looking requests at unusual times or from unusual systems.
How It Works in Practice
Effective baselines start by defining what “normal” means for each NHI, not for the environment as a whole. For a workload identity, that may include source host, API endpoints, request rate, time windows, token lifetime, and the set of downstream services it can reach. For an AI agent, the baseline must also reflect goal-driven behaviour, tool chaining, and which actions are expected during a task. Static RBAC alone cannot capture this, which is why current guidance suggests combining behavioural baselines with runtime authorisation and short-lived credentials.
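To make the idea of a per-identity baseline concrete, here is a minimal sketch in Python. It assumes a hypothetical event shape (`source_host`, `endpoint`, `hour`, `requests_per_minute`) and covers only a subset of the attributes listed above. Token lifetime and downstream services could be tracked the same way. The names `Baseline`, `build_baseline`, and `deviations` are illustrative and do not come from any specific product.

```python
from dataclasses import dataclass, field


@dataclass
class Baseline:
    """What has been observed as normal for ONE non-human identity."""
    source_hosts: set = field(default_factory=set)
    endpoints: set = field(default_factory=set)
    active_hours: set = field(default_factory=set)  # hours of day, 0-23
    max_requests_per_minute: int = 0


def build_baseline(history):
    """Fold an identity's own past events into a baseline."""
    baseline = Baseline()
    for event in history:
        baseline.source_hosts.add(event["source_host"])
        baseline.endpoints.add(event["endpoint"])
        baseline.active_hours.add(event["hour"])
        baseline.max_requests_per_minute = max(
            baseline.max_requests_per_minute, event["requests_per_minute"]
        )
    return baseline


def deviations(baseline, event):
    """Return the ways an event departs from the identity's own history."""
    found = []
    if event["source_host"] not in baseline.source_hosts:
        found.append("new_source_host")
    if event["endpoint"] not in baseline.endpoints:
        found.append("new_endpoint")
    if event["hour"] not in baseline.active_hours:
        found.append("unusual_hour")
    if event["requests_per_minute"] > baseline.max_requests_per_minute:
        found.append("rate_above_history")
    return found


# Hypothetical history for an hourly backup service account (assumed data).
backup_history = [
    {"source_host": "backup-01", "endpoint": "/snapshots",
     "hour": h, "requests_per_minute": 12}
    for h in range(24)
]
backup_baseline = build_baseline(backup_history)
```

Because the baseline is built only from the identity's own history, the same request from an unfamiliar host would be flagged for this backup account even if it were routine for a different identity.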
Teams usually pair baselining with NIST Cybersecurity Framework 2.0 monitoring objectives and implementation patterns such as just-in-time access, ephemeral secrets, and workload identity. In mature environments, the baseline informs policy decisions at runtime: if a token suddenly calls a new payment API, reaches an unfamiliar region, or attempts privilege escalation, the decision engine can step up verification, reduce scope, or revoke the session. This is where behavioural data becomes operational security rather than passive analytics. It also supports investigations because analysts can ask whether the event was merely rare or genuinely inconsistent with the identity’s prior behaviour.
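The runtime decision described above can be sketched as a small scoring function. This is an illustration under stated assumptions, not a reference design: the signal names, the weights in `SIGNAL_WEIGHTS`, and the score thresholds are all hypothetical and would need tuning per environment.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    STEP_UP = "step_up_verification"
    REDUCE_SCOPE = "reduce_scope"
    REVOKE = "revoke_session"


# Hypothetical weights: how strongly each deviation from the baseline
# counts against the session. Unknown signals default to a weight of 1.
SIGNAL_WEIGHTS = {
    "new_api": 2,
    "unfamiliar_region": 3,
    "privilege_escalation": 5,
}


def decide(signals):
    """Map baseline deviations for one request to a runtime action."""
    # A set avoids counting the same signal twice.
    score = sum(SIGNAL_WEIGHTS.get(s, 1) for s in set(signals))
    if score == 0:
        return Action.ALLOW
    if score <= 2:
        return Action.STEP_UP
    if score <= 4:
        return Action.REDUCE_SCOPE
    return Action.REVOKE
```

With these assumed weights, a single call to a new API triggers step-up verification, while a new API from an unfamiliar region, or any privilege escalation attempt, revokes the session.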
NHIMG research on the JetBrains GitHub plugin token exposure illustrates the practical value of context. A leaked secret is dangerous on its own, but anomalous use after exposure is what often confirms abuse and drives containment. That same logic applies across service accounts, CI/CD tokens, and agent credentials. These controls tend to break down when organisations share one identity across multiple tools, because the resulting activity becomes too noisy to baseline reliably.
Common Variations and Edge Cases
Tighter baselining often increases operational overhead, requiring organisations to balance detection quality against maintenance cost. That tradeoff is real: the more dynamic the workload, the more frequently the baseline must be updated. There is no universal standard for this yet, especially for autonomous agents and multi-agent workflows, where behaviour may legitimately vary based on goals, prompts, and tool availability.
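One way to reduce manual upkeep is to let the baseline update itself as new observations arrive. The sketch below uses an exponentially weighted moving average and variance over a request rate. This is a technique chosen here for illustration and is not prescribed by the source. The class name `RateBaseline`, the smoothing factor `alpha`, and the defaults `k` and `min_std` are assumptions. A larger `alpha` suits more dynamic workloads because the baseline forgets old behaviour faster.

```python
class RateBaseline:
    """Self-updating baseline for one numeric signal (e.g. requests/min)."""

    def __init__(self, alpha):
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # higher alpha = faster adaptation
        self.mean = None
        self.var = 0.0

    def update(self, value):
        """Fold one new observation into the running mean and variance."""
        if self.mean is None:
            self.mean = float(value)
            return
        delta = value - self.mean
        self.mean += self.alpha * delta
        self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)

    def is_anomalous(self, value, k=3.0, min_std=1.0):
        """Flag values more than k standard deviations from the mean."""
        if self.mean is None:
            return False  # no history yet, nothing to compare against
        std = max(self.var ** 0.5, min_std)  # floor avoids a zero-width band
        return abs(value - self.mean) > k * std
```

The tradeoff named above shows up directly in `alpha`: a fast-adapting baseline needs less maintenance but can also absorb an attacker's behaviour sooner, so anomaly checks should run before `update` is called on a new observation.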
One common variation is to baseline at the workload layer instead of the credential layer. This works well when the same secret is reused by multiple services, but it is weaker than identity-specific baselining because it can hide lateral movement. Another variation is to baseline “intent” rather than exact commands for AI agents, which is an emerging practice rather than settled consensus. The practical goal is to decide whether the agent is doing something consistent with its mission, not merely whether the request matches yesterday’s sequence.
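The difference between the two layers can be shown with a small sketch. It assumes hypothetical events tagged with both a `credential` and a `workload` field, and the identity names and endpoints are invented for illustration. The same history is baselined twice, once per grouping key.

```python
from collections import defaultdict


def build(history, key):
    """Baseline the set of endpoints seen per value of `key`."""
    seen = defaultdict(set)
    for event in history:
        seen[event[key]].add(event["endpoint"])
    return seen


def is_known(baseline, event, key):
    """True if this endpoint was seen before for the event's group."""
    return event["endpoint"] in baseline.get(event[key], set())


# Hypothetical history: two credentials that belong to the same workload.
history = [
    {"credential": "ci-token", "workload": "deploy", "endpoint": "/artifacts"},
    {"credential": "db-migrator", "workload": "deploy", "endpoint": "/db/admin"},
]
by_workload = build(history, "workload")
by_credential = build(history, "credential")

# The CI token suddenly calls an endpoint only the migrator normally uses.
suspect = {"credential": "ci-token", "workload": "deploy", "endpoint": "/db/admin"}
workload_view = is_known(by_workload, suspect, "workload")
credential_view = is_known(by_credential, suspect, "credential")
```

In this sketch the workload-level baseline treats the request as known, because some identity in the workload has used that endpoint before, while the credential-level baseline does not. That gap is one way a coarser baseline can hide lateral movement.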
For high-risk environments, teams should combine baselines with NIST Cybersecurity Framework 2.0 response playbooks and use identity telemetry from incidents such as the JetBrains GitHub plugin token exposure to tune thresholds realistically. Baselines also age quickly in CI/CD, ephemeral compute, and agentic systems that spin up, complete work, and disappear within minutes. In those environments, manual tuning rarely keeps pace with the identity lifecycle.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Non-Human Identity Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Non-Human Identity Top 10 | NHI-05 | Behavioural baselines support anomaly detection for non-human identities. |
| CSA MAESTRO | GOV-2 | Agent governance depends on monitoring goal-driven behaviour at runtime. |
| NIST AI RMF | Govern, Map | AI RMF addresses monitoring and risk treatment for autonomous systems. |
Use AI RMF governance and mapping functions to document expected behaviour and response thresholds.