Security teams should start with CVSS, then adjust for agent behaviour that can amplify harm at runtime. The key question is not only how severe the flaw is, but how much worse it becomes when the system can choose tools, coordinate actions, or trigger downstream workflows without human approval.
Why This Matters for Security Teams
Agentic AI changes vulnerability scoring because the same weakness can produce very different outcomes once an autonomous system can select tools, chain actions, or trigger downstream workflows without human approval. Traditional severity scoring still matters, but it is only a starting point. Security teams need to understand the agent's execution authority, its data reach, and whether it can turn a low-level flaw into unauthorised access or business process abuse.
This is why practitioner guidance increasingly points to agent-specific risk models such as the OWASP Agentic AI Top 10 and the NIST AI Risk Management Framework, rather than relying on CVSS alone. NHIMG research also shows why this matters operationally: in the AI Agents: The New Attack Surface report, 80% of organisations said their AI agents had already acted beyond intended scope. In practice, many security teams encounter the true impact only after an agent has already chained the flaw into a real workflow, rather than during the original vulnerability review.
How It Works in Practice
A practical approach is to score agentic AI vulnerabilities in layers. Start with a baseline severity rating for the technical weakness, then add an agent-impact adjustment that reflects what the system can actually do at runtime. A prompt injection, tool misuse condition, exposed token, or unsafe function call becomes more serious when the agent has broad tool access, can persist state, or can reach sensitive systems.
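The layering idea can be sketched in a few lines. This is a minimal illustration, assuming a CVSS-style 0.0 to 10.0 base score and an additive agent-impact adjustment; the `AgentContext` fields, the `ADJUSTMENT_WEIGHTS` values, and the function names are hypothetical and do not come from CVSS or any framework.

```python
from dataclasses import dataclass


@dataclass
class AgentContext:
    """Runtime capabilities of the agent hosting the flaw (hypothetical fields)."""
    broad_tool_access: bool
    persists_state: bool
    reaches_sensitive_systems: bool


# Hypothetical uplift for each amplifying capability; tune per organisation.
ADJUSTMENT_WEIGHTS = {
    "broad_tool_access": 1.5,
    "persists_state": 0.5,
    "reaches_sensitive_systems": 2.0,
}


def agent_adjustment(ctx: AgentContext) -> float:
    """Sum the weights of every amplifying capability the agent has."""
    return sum(
        weight for name, weight in ADJUSTMENT_WEIGHTS.items() if getattr(ctx, name)
    )


def layered_score(base_score: float, ctx: AgentContext) -> float:
    """Layer the agent-impact adjustment on the baseline score, capped at 10.0."""
    if not 0.0 <= base_score <= 10.0:
        raise ValueError("base_score must be on the 0.0-10.0 CVSS scale")
    return min(10.0, round(base_score + agent_adjustment(ctx), 1))


# A 5.4 prompt injection in an agent with broad tools and sensitive reach.
print(layered_score(5.4, AgentContext(True, False, True)))  # 8.9
```

Keeping the baseline and the adjustment as separate inputs means the original CVSS value is never overwritten, so the two layers can be reviewed independently.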
Security teams usually evaluate four questions:
- Can the flaw alter the agent’s decisions, tool selection, or execution path?
- What privileged actions can the agent take without human review?
- Can the agent exfiltrate secrets, data, or credentials into other systems?
- Can the agent automate lateral movement or trigger irreversible business actions?
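One way to record the four questions consistently is as a structured yes/no assessment. The sketch below assumes a simple count-based mapping to an amplification tier; the field names, the tier labels, and the thresholds are hypothetical choices for illustration only.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AgentImpactAnswers:
    """Yes/no answers to the four review questions (hypothetical field names)."""
    alters_decisions_or_tools: bool      # Can the flaw alter decisions, tool choice, or execution path?
    privileged_actions_unreviewed: bool  # Can the agent take privileged actions without human review?
    can_exfiltrate: bool                 # Can secrets, data, or credentials leave via the agent?
    lateral_or_irreversible: bool        # Can it automate lateral movement or irreversible actions?


def amplification_tier(answers: AgentImpactAnswers) -> str:
    """Map the number of 'yes' answers to an illustrative amplification tier."""
    yes_count = sum(getattr(answers, f.name) for f in fields(answers))
    if yes_count == 0:
        return "none"
    if yes_count <= 2:
        return "moderate"
    return "high"


print(amplification_tier(AgentImpactAnswers(True, True, False, False)))  # moderate
```

A tier keeps triage simple, while the stored answers still show which question drove the result.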
That is why threat modelling for agents should combine CVSS with context from frameworks such as CSA MAESTRO, an agentic AI threat modeling framework, and MITRE ATLAS, a matrix of adversarial tactics and techniques against AI systems. For example, an issue that only leaks a prompt fragment may be low severity in a static chatbot but higher severity in an agent that can call APIs, read email, and update tickets. NHIMG’s OWASP NHI Top 10 research is useful here because it connects identity, secrets, and agent behaviour into one risk surface.
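The chatbot-versus-agent example can be made concrete by scoring the same finding against two deployment contexts. This sketch assumes a four-step severity ladder and raises the base severity one step per side-effecting capability, capped at two steps; the capability names and the cap are hypothetical.

```python
SEVERITY_LADDER = ["low", "medium", "high", "critical"]

# Hypothetical capabilities that turn a leaked prompt fragment into real actions.
SIDE_EFFECT_CAPABILITIES = {"call_api", "read_email", "update_tickets"}


def contextual_severity(base: str, capabilities: set[str], max_steps: int = 2) -> str:
    """Raise a base severity one step per side-effecting capability, up to max_steps."""
    steps = min(len(capabilities & SIDE_EFFECT_CAPABILITIES), max_steps)
    index = min(SEVERITY_LADDER.index(base) + steps, len(SEVERITY_LADDER) - 1)
    return SEVERITY_LADDER[index]


static_chatbot: set[str] = set()
email_agent = {"call_api", "read_email", "update_tickets"}

print(contextual_severity("low", static_chatbot))  # low
print(contextual_severity("low", email_agent))     # high
```

The finding itself is identical in both calls; only the deployment context changes the result.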
Operationally, many teams create a two-part score: technical severity plus agent amplification. That score should be reviewed alongside execution logs, approved tools, credential lifetime, and whether the agent can act independently. This model tends to break down when the environment allows uncontrolled tool chaining across SaaS apps, because the agent’s runtime context changes faster than the original vulnerability ticket is updated.
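To catch the drift problem, a ticket can store the runtime context it was scored against and flag itself for rescoring when that context changes. The sketch below is an assumption-laden illustration: `TwoPartScore`, `TicketSnapshot`, `needs_rescore`, and the three drift checks are hypothetical, chosen to mirror the review inputs listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoPartScore:
    """Technical severity and agent amplification kept as separate fields."""
    technical_severity: float   # e.g. the CVSS base score
    agent_amplification: float  # organisation-defined uplift (hypothetical scale)


@dataclass(frozen=True)
class TicketSnapshot:
    """Runtime context captured when the vulnerability ticket was scored."""
    score: TwoPartScore
    approved_tools: frozenset[str]
    credential_ttl_hours: int
    acts_independently: bool


def needs_rescore(ticket: TicketSnapshot, current_tools: set[str],
                  current_ttl_hours: int, acts_independently: bool) -> list[str]:
    """List the reasons the ticket's amplification score may now be stale."""
    reasons = []
    new_tools = current_tools - ticket.approved_tools
    if new_tools:
        reasons.append(f"new tools reachable: {sorted(new_tools)}")
    if current_ttl_hours > ticket.credential_ttl_hours:
        reasons.append("credential lifetime increased")
    if acts_independently and not ticket.acts_independently:
        reasons.append("agent now acts without human approval")
    return reasons


ticket = TicketSnapshot(TwoPartScore(6.5, 1.0), frozenset({"search", "ticketing"}), 1, False)
print(needs_rescore(ticket, {"search", "ticketing", "crm_write"}, 24, True))
```

An empty list means the ticket's context still matches runtime; any reason returned is a prompt to revisit the amplification half of the score, not the technical half.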
Common Variations and Edge Cases
Tighter agent risk scoring often increases review overhead, requiring organisations to balance precision against delivery speed. That tradeoff is real, especially when multiple agents share tools or when product teams want a simple severity number for triage.
There is no universal standard for this yet, so current guidance suggests treating agent score inflation as a contextual overlay rather than a replacement for CVSS. A vulnerability should score higher when it affects an agent with broader permissions, long-lived secrets, or access to workflows that can move money, modify records, or expose sensitive data. It may score lower when the same issue is trapped inside a heavily constrained sandbox with short-lived credentials and no external side effects.
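The overlay idea can be sketched so that the CVSS base is preserved and the contextual score sits beside it. This is a minimal illustration under stated assumptions: the `DeploymentContext` fields follow the factors named above, but the +1.0 per amplifying factor, the -1.5 containment credit, and the rule that containment only counts when all three constraints hold are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentContext:
    """Hypothetical context factors that raise or lower the overlay."""
    broad_permissions: bool
    long_lived_secrets: bool
    sensitive_workflows: bool      # can move money, modify records, or expose sensitive data
    sandboxed: bool
    short_lived_credentials: bool
    external_side_effects: bool


def contextual_overlay(cvss_base: float, ctx: DeploymentContext) -> dict[str, float]:
    """Return the untouched CVSS base alongside a context-adjusted overlay score."""
    # Factors that raise the score (illustrative +1.0 each).
    delta = 1.0 * sum([ctx.broad_permissions, ctx.long_lived_secrets, ctx.sensitive_workflows])
    # Containment lowers it only when all three constraints hold together.
    if ctx.sandboxed and ctx.short_lived_credentials and not ctx.external_side_effects:
        delta -= 1.5
    overlay = max(0.0, min(10.0, cvss_base + delta))
    return {"cvss_base": cvss_base, "agent_overlay": round(overlay, 1)}


exposed = DeploymentContext(True, True, True, False, False, True)
contained = DeploymentContext(False, False, False, True, True, False)
print(contextual_overlay(6.0, exposed))    # {'cvss_base': 6.0, 'agent_overlay': 9.0}
print(contextual_overlay(6.0, contained))  # {'cvss_base': 6.0, 'agent_overlay': 4.5}
```

Returning both values keeps the overlay a contextual layer rather than a replacement for CVSS.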
Edge cases often appear in multi-agent systems. One agent may look harmless in isolation, but the risk rises if it can pass instructions or data to another agent with stronger permissions. The same is true for systems that use retrieval, plugins, or delegated action queues. In those environments, best practice is evolving toward runtime policy evaluation rather than static severity alone. That aligns with the NIST AI Risk Management Framework and NHIMG’s broader agent-risk guidance, but teams should treat the method as an emerging practice rather than settled consensus.
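The multi-agent point can be checked mechanically by treating "can pass instructions or data to" as a directed graph and collecting every permission reachable from a given agent. This is a sketch assuming a simple breadth-first traversal; the agent names, permission strings, and topology are hypothetical.

```python
from collections import deque


def effective_permissions(start: str, delegates_to: dict[str, set[str]],
                          permissions: dict[str, set[str]]) -> set[str]:
    """Union the permissions of every agent reachable from `start` via delegation edges."""
    seen = {start}
    queue = deque([start])
    reachable_perms: set[str] = set()
    while queue:
        agent = queue.popleft()
        reachable_perms |= permissions.get(agent, set())
        for downstream in delegates_to.get(agent, set()):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return reachable_perms


# Hypothetical topology: a read-only summariser can pass instructions to a payments agent.
delegates_to = {"summariser": {"planner"}, "planner": {"payments"}}
permissions = {
    "summariser": {"read:docs"},
    "planner": {"read:calendar"},
    "payments": {"write:payments"},
}
print(sorted(effective_permissions("summariser", delegates_to, permissions)))
# ['read:calendar', 'read:docs', 'write:payments']
```

Scoring a flaw in the summariser against this reachable set, rather than its own `read:docs` grant, reflects the risk of chained delegation.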
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while the NIST AI RMF sets the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | A1 | Agentic flaws escalate when tool use and autonomy expand impact. |
| CSA MAESTRO | — | MAESTRO frames how agent workflows and trust boundaries change risk. |
| NIST AI RMF | GOVERN | AI RMF supports governance for contextual risk decisions on agents. |
Use the AI RMF GOVERN and MAP functions to document when, and why, agent context inflates severity.
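Documenting each inflation decision as a structured record makes it auditable. The sketch below assumes a JSON audit log; the `SeverityDecision` fields, the `FIND-001` identifier, and the validation rules are hypothetical, while the four function names are the NIST AI RMF core functions.

```python
import json
from dataclasses import asdict, dataclass

# The four NIST AI RMF core functions.
AI_RMF_FUNCTIONS = {"GOVERN", "MAP", "MEASURE", "MANAGE"}


@dataclass(frozen=True)
class SeverityDecision:
    """Audit record for one contextual severity decision (hypothetical fields)."""
    finding_id: str
    cvss_base: float
    adjusted_score: float
    rationale: str
    rmf_function: str

    def __post_init__(self) -> None:
        # Reject unknown function names and undocumented score changes.
        if self.rmf_function not in AI_RMF_FUNCTIONS:
            raise ValueError(f"unknown AI RMF function: {self.rmf_function}")
        if self.adjusted_score != self.cvss_base and not self.rationale.strip():
            raise ValueError("a changed score needs a documented rationale")


decision = SeverityDecision(
    finding_id="FIND-001",
    cvss_base=5.3,
    adjusted_score=7.8,
    rationale="Agent holds a long-lived CRM token and can update records unreviewed.",
    rmf_function="MAP",
)
print(json.dumps(asdict(decision)))
```

Refusing a changed score without a rationale is a design choice that forces the context behind each inflation to be written down at decision time.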
Related resources from NHI Mgmt Group
- How should security teams govern machine identity credentials in agentic AI environments?
- How should security teams limit the risk from AI agents that have access to production systems?
- How should security teams govern AI agents that can access enterprise systems?
- How should security teams red team non-deterministic AI systems?
Reviewed and updated by the NHIMG editorial team on July 5, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org