Why OWASP’s AIVSS changes risk scoring for agentic AI

By NHI Mgmt Group Editorial Team
Published 2025-09-03
Domain: Agentic AI & NHIs
Source: Lakera

TL;DR: CVSS alone does not capture how agentic systems amplify risk once autonomy, tool use, and multi-agent coordination are involved, according to Lakera’s analysis of OWASP’s AIVSS. The practical shift is to score vulnerability severity in the context of agent behaviour, not just code weakness, because access, memory, and runtime action can turn moderate issues into material exposure.

At a glance

What this is: Lakera argues that OWASP’s AIVSS extends CVSS so agentic AI risk can be scored with autonomy, memory, and multi-agent behaviour in view.

Why it matters: IAM, PAM, and NHI teams need a scoring model that reflects how autonomous systems change blast radius, not just how vulnerable the underlying software looks.

👉 Read Lakera's opinion on why OWASP needs AIVSS for agentic AI

Context

CVSS was designed to describe software vulnerability severity, but agentic AI changes the unit of analysis. Once a system can choose tools, combine actions, and operate with runtime autonomy, the old assumption that a score on paper maps cleanly to real-world risk starts to fail. The question becomes how to measure amplification, not only defect severity.

For identity and access teams, this is not an abstract scoring debate. Agentic systems can widen access scope, create cascading failure paths, and make traditional review and remediation cadences less informative. That is why the article’s core argument belongs in NHI governance, autonomous identity controls, and the broader IAM conversation, including the boundaries between policy, privilege, and execution.

If you are mapping this topic to frameworks, the most relevant reference point is the OWASP work on agentic applications, alongside NIST’s AI risk guidance for governance and accountability. The post is about how security teams should interpret risk when the actor is no longer just software, but a runtime decision-maker with tools attached.

Key questions

Q: How should security teams score vulnerabilities in agentic AI systems?

A: Security teams should start with CVSS, then adjust for agent behaviour that can amplify harm at runtime. The key question is not only how severe the flaw is, but how much worse it becomes when the system can choose tools, coordinate actions, or trigger downstream workflows without human approval.

Q: Why do agentic systems complicate traditional vulnerability prioritisation?

A: They complicate prioritisation because the same defect can produce very different outcomes depending on what the agent can access and how independently it can act. A moderate issue may be low concern in a static app but high concern when an agent can chain tools, persist state, or affect multiple systems.

Q: What do security teams get wrong about AI agent risk scoring?

A: Teams often score the software flaw and stop there. That misses the runtime question of whether the agent can amplify the defect through autonomy, memory, or delegation. The mistake is treating AI risk as a code issue only, when it is also an access and execution problem.

Q: Who should own governance for agentic AI vulnerability scoring?

A: Ownership should sit jointly with security, IAM, and the teams operating the agentic workflow. If one group owns the score but another owns the privileges, the programme will miss the actual failure path. Accountability has to follow the access and the execution model, not the org chart alone.

Technical breakdown

Why CVSS understates agentic AI risk

CVSS was built to score vulnerabilities in software, not to account for systems that can choose actions after exploitation. In agentic environments, the same flaw can be amplified by memory, tool access, and coordination across multiple agents, which changes both severity and blast radius. A moderate issue can become operationally serious if an agent can chain actions without human review. That is the gap AIVSS tries to close by adding agent-specific amplification rather than replacing the baseline severity model.

Practical implication: Practitioners should treat CVSS as a starting point, not the final prioritisation signal, for any system that can act at runtime.

How AARS and ThM change prioritisation

AIVSS layers an Agentic AI Risk Score on top of CVSS and then adjusts for live threat conditions through a Threat Multiplier. The point is not to invent a separate universe of scoring, but to reflect two realities at once: whether the system architecture can magnify harm, and whether an exploit is active in the wild. That combination is closer to how real defenders think about urgency. It makes the score sensitive to runtime behaviour, not just static vulnerability description.

Practical implication: Use the agentic amplification factors and current exploit context together when deciding what gets immediate attention.
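
The layering described above can be sketched in a few lines. This is a minimal illustration, not the AIVSS arithmetic itself: the factor names, the 0-to-1 factor scale, the averaging step, and the 0-to-1 threat multiplier range are all assumptions made for the example, and the OWASP AIVSS specification should be consulted for the actual factors and formula.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Finding:
    name: str
    cvss_base: float                 # 0.0-10.0, from the ordinary CVSS assessment
    amplification: dict[str, float]  # hypothetical factors, each 0.0 (absent) to 1.0 (full)
    threat_multiplier: float         # assumed 0.0-1.0; 1.0 means an exploit is active in the wild


def agentic_risk_score(factors: dict[str, float]) -> float:
    """Scale the mean factor value to 0-10 (an illustrative assumption)."""
    if not factors:
        return 0.0
    return 10.0 * sum(factors.values()) / len(factors)


def combined_score(finding: Finding) -> float:
    """Average base severity with the agentic score, then weight by threat context."""
    aars = agentic_risk_score(finding.amplification)
    return (finding.cvss_base + aars) / 2 * finding.threat_multiplier


# Same CVSS base score, different agent capabilities (all values are made up).
constrained = Finding(
    "flaw in a narrowly scoped agent", 5.0,
    {"autonomy": 0.5, "memory": 0.0, "tool_use": 0.5, "multi_agent": 0.0}, 1.0,
)
autonomous = Finding(
    "same flaw in a fully autonomous agent", 5.0,
    {"autonomy": 1.0, "memory": 1.0, "tool_use": 1.0, "multi_agent": 1.0}, 1.0,
)

# Highest combined score first: this is the remediation order.
for f in sorted([constrained, autonomous], key=combined_score, reverse=True):
    print(f"{combined_score(f):.2f}  {f.name}")
```

The point of the sketch is the ordering: two findings with an identical CVSS base score end up with different priorities once agent capability and exploit context are taken into account.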

Agentic AI tool misuse and cascading failures

The article highlights two failure modes that matter in production. Tool misuse occurs when an agent uses an external system in a harmful or unintended way. Cascading failure occurs when one bad action propagates across connected systems, especially in multi-agent setups. These are identity problems as much as application problems, because the harm often depends on what the agent can reach, what it can invoke, and how far its permissions travel across a chain of delegation.

Practical implication: Review tool grants, delegation paths, and cross-system access together instead of treating each agent in isolation.
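
One way to review tool grants and delegation paths together is to treat them as a graph and ask what a single agent can ultimately reach. The sketch below assumes a hypothetical inventory (the agent names, tool names, and both dictionaries are invented for illustration) and uses a plain breadth-first search over delegation edges.

```python
from collections import deque

# Hypothetical inventory: which agents each agent can trigger.
DELEGATES_TO = {
    "triage-agent": ["ticket-agent", "deploy-agent"],
    "ticket-agent": [],
    "deploy-agent": ["rollback-agent"],
    "rollback-agent": [],
}

# Hypothetical inventory: which tools each agent is granted directly.
TOOL_GRANTS = {
    "triage-agent": {"read:logs"},
    "ticket-agent": {"write:tickets"},
    "deploy-agent": {"write:ci", "write:prod"},
    "rollback-agent": {"write:prod"},
}


def blast_radius(start):
    """Return (agents, tools) reachable from `start` by following delegation edges."""
    seen = {start}
    queue = deque([start])
    tools = set()
    while queue:
        agent = queue.popleft()
        tools |= TOOL_GRANTS.get(agent, set())
        for nxt in DELEGATES_TO.get(agent, []):
            if nxt not in seen:  # guard against delegation cycles
                seen.add(nxt)
                queue.append(nxt)
    return seen, tools


agents, tools = blast_radius("triage-agent")
print(sorted(agents))
print(sorted(tools))
```

In this made-up inventory the triage agent holds only a read grant of its own, yet through delegation it can reach production write access. Reviewing each agent in isolation would miss that path.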

Threat narrative

Attacker objective: The attacker seeks to turn a normal software weakness into amplified runtime harm by using agent behaviour to extend reach and impact.

Entry begins when an attacker identifies a vulnerable agentic application or weakly governed AI workflow with tool access and runtime privileges.
Escalation occurs when the agent's autonomy, memory, or multi-agent interactions amplify the original flaw into broader misuse, unexpected actions, or chained execution.
Impact follows when the agent reaches external tools or connected systems, causing cascading failures, untraceable activity, or harmful actions at production scale.

Moltbook AI agent keys breach: exposed 1.5M AI agent keys.
AI LLM hijack breach: attackers used stolen AWS access keys to hijack Anthropic models on Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

AIVSS is a recognition that vulnerability severity is no longer enough once software can act like an identity. CVSS assumes a defect sits inside a bounded system. Agentic AI breaks that assumption because the runtime actor can select tools, combine steps, and amplify harm beyond the original flaw. The implication is that risk scoring must account for behaviour, not only weakness.

Agentic AI creates a new kind of identity blast radius. The issue is not just whether an agent is compromised, but how far its permissions, memory, and delegated actions can extend before anyone notices. That is a governance problem for NHI and IAM teams because access scope, not code alone, determines how much damage the agent can do. Practitioners should treat runtime reach as a first-class control boundary.

Agentic AI tool misuse is the named failure mode this framework helps surface. A flaw becomes materially worse when an agent can invoke tools in unintended ways, especially across multiple systems. This is where policy-based controls and review cycles lose precision if they only measure static entitlements. The implication is that security teams need to re-evaluate how they model agent authority before they can score agent risk credibly.

Multi-agent coordination turns single-point failures into chain reactions. Once one agent can trigger another, a modest issue can propagate through a workflow, creating cascading failures that traditional application scoring does not express well. That matters because accountability becomes distributed across components that were never designed to be assessed as a single identity system. Practitioners should expect scoring to follow delegation, not just software boundaries.

OWASP’s agentic work is pushing the market toward runtime governance, not just static assessment. That shift aligns with how identity teams already think about workload identity, privilege scope, and lifecycle control in NHI programmes. The practical conclusion is that AI risk scoring has to sit beside access governance, not outside it.

From our research:
92% agree governing AI agents is critical to enterprise security, yet only 44% have implemented any policies to do so, according to AI Agents: The New Attack Surface.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation, according to AI Agents: The New Attack Surface.
For broader context on runtime agent risk, see OWASP Agentic AI Top 10 for the control areas most likely to fail in production.

What this signals

Identity teams should expect agentic risk scoring to move closer to access governance. When a system can act, the score is no longer just about code weakness. It becomes a question of delegated reach, tool authority, and whether the organisation can explain what an agent was allowed to do at runtime.

Identity blast radius: this is the practical unit that AIVSS pushes teams toward. If an agent can invoke multiple tools, the real risk is not the flaw alone but the span of systems it can touch before containment happens. That makes lifecycle ownership, access review, and exception handling part of the scoring conversation, not separate activities.

With only 52% of companies able to track and audit the data their AI agents access, per AI Agents: The New Attack Surface, the governance gap is already operational. Teams should prepare for agent risk to show up in audit, legal, and incident response workflows, not just in model security reviews.

For practitioners

Map agentic workflows to real privilege boundaries: List every tool, data source, and downstream system an agent can reach, then document where runtime choices can expand beyond the original task scope. Focus on the access a tool call creates, not only the prompt or application layer.
Score runtime amplification alongside base vulnerability severity: Use CVSS as the starting point, then add agent-specific factors such as autonomy, memory, multi-agent interaction, and live exploit context when prioritising remediation. This helps separate theoretical bugs from flaws that can become operationally severe.
Constrain tool use and delegation paths: Require explicit approval boundaries for high-impact tools, and review whether one agent can trigger another without a control gate. Where possible, reduce cross-system reach so a single failure cannot cascade through connected services.
Fold agent risk into identity governance reviews: Include autonomous workflows in access recertification, exception handling, and ownership assignments so the programme can answer who is accountable when an agent acts outside its intended scope.
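
The approval boundary in the third item can be sketched as a thin gate in front of every tool call. The class, the tool names, and the set of high-impact tools below are hypothetical and exist only to show the shape of the control: a call is denied if the tool was never granted, held if it is high-impact and unapproved, and logged either way so ownership reviews have something to audit.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical classification of tools that need a human approval step.
HIGH_IMPACT_TOOLS = {"write:prod", "delete:data"}


class ApprovalRequired(Exception):
    """Raised when a high-impact tool call has no recorded approval."""


@dataclass
class ToolGate:
    grants: dict[str, set[str]]  # agent -> tools it may call at all
    approvals: set[tuple[str, str]] = field(default_factory=set)
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def approve(self, agent: str, tool: str) -> None:
        """Record an explicit approval for one agent/tool pair."""
        self.approvals.add((agent, tool))

    def invoke(self, agent: str, tool: str, action):
        """Run `action` only if the grant exists and any required approval is recorded."""
        if tool not in self.grants.get(agent, set()):
            self.audit_log.append((agent, tool, "denied"))
            raise PermissionError(f"{agent} has no grant for {tool}")
        if tool in HIGH_IMPACT_TOOLS and (agent, tool) not in self.approvals:
            self.audit_log.append((agent, tool, "held"))
            raise ApprovalRequired(f"{agent} needs approval for {tool}")
        self.audit_log.append((agent, tool, "allowed"))
        return action()


gate = ToolGate(grants={"deploy-agent": {"write:ci", "write:prod"}})
gate.invoke("deploy-agent", "write:ci", lambda: "pipeline updated")
try:
    gate.invoke("deploy-agent", "write:prod", lambda: "deployed")
except ApprovalRequired as exc:
    print(exc)
```

Keeping the audit log inside the gate is a deliberate choice in this sketch: the same record that blocks a call also answers the accountability question raised in the last item.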

Key takeaways

Agentic AI breaks the assumptions behind CVSS because runtime behaviour can magnify a flaw far beyond its static severity.
OWASP’s AIVSS matters because it brings autonomy, tool misuse, and live exploit context into the prioritisation model.
Security teams should score agentic risk through access scope, delegation paths, and blast radius, not software weakness alone.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 addresses the attack and risk surface, while NIST AI RMF and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A1	AIVSS addresses agentic tool misuse and runtime amplification risk.
NIST AI RMF		The article centres on governance for AI risk, accountability, and impact assessment.
NIST Zero Trust (SP 800-207)	PR.AC-4 (NIST CSF 1.1 subcategory)	Agent tool access is an access-control problem, not only an app-security problem.

Treat agent tool permissions as segmented access boundaries with explicit verification and least privilege.

Key terms

Agentic AI Risk Score: A score that adjusts ordinary vulnerability severity for how an AI agent can magnify harm at runtime. It accounts for autonomy, memory, delegation, and multi-agent behaviour so practitioners can judge the likely operational impact, not just the defect on paper.
Threat Multiplier: A scoring modifier that raises urgency when an exploit is active, widely circulating, or likely to be used in the wild. In agentic settings, it helps separate theoretical weaknesses from issues that are already being exploited against systems with runtime authority.
Identity Blast Radius: The total reach an identity can exert across tools, data, and connected systems before containment occurs. For autonomous or agentic actors, the blast radius is shaped less by the bug itself than by how far delegated access and tool authority extend.
Tool Misuse: A failure mode where a system uses an external tool in a harmful, unintended, or policy-breaking way. In agentic environments, the risk comes from runtime action, not just from the existence of the tool, which makes permission scope and approval gates central controls.

What's in the full article

Lakera's full opinion piece covers the technical rationale this post intentionally leaves at a higher level:

AIVSS scoring mechanics, including how CVSS and the agentic amplification factors are combined
Lakera's explanation of autonomy, memory, and multi-agent interaction as risk multipliers
Examples of agentic AI tool misuse and cascading failure patterns from red team practice
The article's framing of why CVSS-style scoring remains familiar while becoming incomplete for agents

👉 Lakera's full article explains the AIVSS scoring model and its agentic failure categories in more detail


NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-09-03.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org