What do identity teams get wrong about AI-based risk scoring?

Why This Matters for Security Teams

AI-based risk scoring is often positioned as a way to spot unusual access, but identity teams frequently overtrust the score and underdesign the context behind it. A score built only from behaviour can miss lifecycle events, job changes, service-account transitions, or delegated workflows, which means it may flag ordinary activity as suspicious while missing actual privilege expansion. NIST’s Cybersecurity Framework 2.0 emphasises governance and continuous risk management, not isolated telemetry.

That matters because identity systems are now expected to evaluate risk across humans, non-human identities (NHIs), and agentic workflows, where access changes faster than static rules can be refreshed. NHIMG’s Ultimate Guide to NHIs and Top 10 NHI Issues show that identity failures usually begin when context is missing, not when detection is absent. In practice, many security teams discover that their confidence in the score was misplaced only after access drift has already been normalised by the business.

How It Works in Practice

Effective AI-based risk scoring is not a single model output. It is a decision process that combines behavioural signals with lifecycle state, policy context, entitlement history, and recent change events. For example, a login from a new location may be low risk if it follows an approved role transfer, but high risk if it appears alongside unusual privilege requests or token creation. This is why scoring should be tied to identity governance, not treated as a standalone detector.
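As a minimal sketch of the login example above, the following Python function rates a new-location login using context as well as behaviour. The class, field names, risk labels, and the seven-day window are illustrative assumptions, not values taken from any product or standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical window in which a role transfer is taken to explain a new location.
TRANSFER_GRACE = timedelta(days=7)


@dataclass
class LoginEvent:
    timestamp: datetime
    new_location: bool
    approved_role_transfer_at: Optional[datetime] = None  # None if no transfer on record
    unusual_privilege_request: bool = False
    token_created: bool = False


def classify_login(event: LoginEvent) -> str:
    """Return 'low', 'medium' or 'high'. Only the new-location signal is considered."""
    if not event.new_location:
        return "low"
    # Privilege requests or token creation alongside a new location raise the risk.
    if event.unusual_privilege_request or event.token_created:
        return "high"
    # An approved role transfer shortly before the login explains the new location.
    transfer = event.approved_role_transfer_at
    if transfer is not None and timedelta(0) <= event.timestamp - transfer <= TRANSFER_GRACE:
        return "low"
    # A new location with no explaining context is left for review.
    return "medium"
```

The same behavioural signal lands in three different bands depending on governance data, which is the reason to feed the scorer from the identity governance system instead of running it as a separate detector.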

Practitioners usually improve results by layering four inputs:

Identity lifecycle state: onboarding, active, dormant, terminated, or delegated.

Workflow context: ticket approval, change request, break-glass event, or automation job.

Entitlement drift: recent role changes, privilege escalation, and inherited access.

Entity type: employee, contractor, service account, workload identity, or agent.
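The four inputs above can be sketched as a small data model layered over a behavioural anomaly score. Everything here is hypothetical: the enum members mirror the list, while the weights, the set of workflows treated as approval, and the function name are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Lifecycle(Enum):
    ONBOARDING = "onboarding"
    ACTIVE = "active"
    DORMANT = "dormant"
    TERMINATED = "terminated"
    DELEGATED = "delegated"


class Workflow(Enum):
    NONE = "none"
    TICKET_APPROVAL = "ticket_approval"
    CHANGE_REQUEST = "change_request"
    BREAK_GLASS = "break_glass"
    AUTOMATION_JOB = "automation_job"


class EntityType(Enum):
    EMPLOYEE = "employee"
    CONTRACTOR = "contractor"
    SERVICE_ACCOUNT = "service_account"
    WORKLOAD_IDENTITY = "workload_identity"
    AGENT = "agent"


@dataclass
class IdentityContext:
    lifecycle: Lifecycle
    workflow: Workflow
    recent_privilege_escalation: bool  # entitlement drift signal
    entity_type: EntityType


# Workflows treated as routine prior approval (an assumption of this sketch).
APPROVED = {Workflow.TICKET_APPROVAL, Workflow.CHANGE_REQUEST, Workflow.AUTOMATION_JOB}


def contextual_score(anomaly: float, ctx: IdentityContext) -> tuple[float, list[str]]:
    """Adjust a behavioural anomaly score in [0, 1] and record the reasons."""
    score = anomaly
    reasons = [f"entity type: {ctx.entity_type.value}"]
    if ctx.lifecycle in (Lifecycle.DORMANT, Lifecycle.TERMINATED):
        score += 0.4  # hypothetical weight
        reasons.append(f"activity from a {ctx.lifecycle.value} identity")
    if ctx.workflow in APPROVED:
        score -= 0.3  # hypothetical weight
        reasons.append(f"covered by {ctx.workflow.value}")
    elif ctx.recent_privilege_escalation:
        score += 0.3  # hypothetical weight
        reasons.append("privilege escalation outside a routine approved workflow")
    # Clamp the adjusted score back into [0, 1].
    return min(1.0, max(0.0, score)), reasons
```

Returning the reasons alongside the number keeps the output reviewable: an analyst can see which context input moved the score, and a break-glass event is deliberately not discounted so that it still surfaces for review.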

The operational goal is to make the model interpret intent, not just flag anomalies. That is consistent with broader guidance in the NIST Cybersecurity Framework 2.0, where decisions should reflect governance and risk treatment. For NHI-heavy environments, NHIMG’s 52 NHI Breaches Analysis reinforces the lesson: access misuse is easiest to miss when telemetry is evaluated without ownership, purpose, and expiry. A score becomes useful when it explains why the event matters and what control should act on it next. These controls tend to break down when the environment mixes human identities, service principals, and autonomous agents in the same policy plane, because their normal behaviour patterns are not comparable.

Common Variations and Edge Cases

Tighter scoring often increases operational overhead, requiring organisations to balance detection quality against review fatigue and integration complexity. That tradeoff is especially visible when teams try to apply one model across very different identity classes. Current guidance suggests separating scoring logic by identity type, but there is no universal standard for this yet.
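One way to separate scoring logic by identity type is to measure each event against a baseline built only from its own class. The sketch below uses a plain z-score for this; the metric (requests per hour), the sample values, and the function name are assumptions for illustration, and a z-score stands in for whatever model a real deployment would use.

```python
from statistics import mean, pstdev


def zscore_by_type(history: dict[str, list[float]], entity_type: str, observed: float) -> float:
    """Score an observation against the baseline of its own identity class."""
    samples = history[entity_type]
    mu = mean(samples)
    sigma = pstdev(samples)
    if sigma == 0:
        # A flat baseline: any deviation at all is maximally unusual.
        return 0.0 if observed == mu else float("inf")
    return (observed - mu) / sigma


# Hypothetical requests-per-hour baselines for two identity classes.
history = {
    "employee": [18.0, 20.0, 22.0, 20.0],
    "service_account": [900.0, 1000.0, 1100.0, 1000.0],
}

service_z = zscore_by_type(history, "service_account", 950.0)
employee_z = zscore_by_type(history, "employee", 950.0)
```

With these sample baselines, 950 requests in an hour sits within one standard deviation for a service account but hundreds of standard deviations out for an employee. A single pooled baseline would blur that difference, at the cost of maintaining one baseline per class.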

One common edge case is a legitimate burst of access after role change, incident response, or automation deployment. Another is a service account that looks quiet for weeks and then executes a high-volume task in seconds. Both can appear anomalous if lifecycle and workflow state are absent. For that reason, AI scores should be treated as decision support, not proof of malicious intent.

NHIMG’s Ultimate Guide to NHIs — Key Challenges and Risks and LLMjacking: How Attackers Hijack AI Using Compromised NHIs show why this gets harder as automated systems gain more tool access and token mobility. The practical boundary is simple: when the platform cannot distinguish an approved transition from an unexpected privilege expansion, the score stops being actionable and starts generating noise.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Agentic AI Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and the NIST AI RMF set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
NIST CSF 2.0	GV.OV-03	Risk scoring needs governance and oversight, not isolated anomaly output.
NIST AI RMF		AI RMF addresses context-aware risk management for model-driven decisions.
OWASP Agentic AI Top 10	A2	Autonomous identities need runtime context, not static scoring alone.

Evaluate agent behaviour with runtime policy and lifecycle context before granting or revoking access.

