AppSec programmes often measure the number of vulnerabilities found instead of the speed and consistency of remediation. High finding counts can reflect better detection, not better security. The more useful signals are time to remediate, closure rates, and whether teams are fixing issues in the workflow that produced them.
Why This Matters for Security Teams
AppSec metrics shape funding, prioritisation, and executive confidence, so the wrong metric can reward the wrong behaviour. Counting vulnerabilities is easy, but it confuses detection volume with actual risk reduction. Security teams need to know whether issues are being fixed quickly, whether fixes stick, and whether the same defect patterns keep returning. NIST’s Cybersecurity Framework 2.0 pushes measurement toward outcomes, not activity, which is the right instinct for application security as well.
The problem is that a high finding count often reflects better coverage, more aggressive scanning, or broader code review, not weaker security. That is why raw counts can look “worse” even as the programme improves. NHIMG’s Ultimate Guide to NHIs makes the same broader point for identity programmes: volume alone is a poor proxy for control maturity. The same logic applies in AppSec, where teams need to track closure rates, remediation age, and recurrence patterns rather than celebrate or panic over totals.
In practice, many security teams discover their metric model is broken only after the backlog has grown and engineering has stopped trusting the dashboard.
How It Works in Practice
Useful AppSec measurement starts with separating detection from remediation. SAST, DAST, dependency scanners, and code review may all increase findings at the same time, but security posture only improves when those findings are turned into fixed code, configuration changes, or compensating controls. A workable metric set answers four operational questions: how fast issues are fixed, how many are fixed within service-level targets, whether critical findings are recurring, and whether remediation happens in the workflow that created the issue.
That usually means tracking:
- Time to remediate by severity and repository
- Closure rate for findings created in the last sprint or release window
- Reopen rate or recurrence rate for the same control failure
- Age of the oldest unresolved critical issues
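As a rough illustration, the four metrics in the list above could be computed from a flat export of findings. This is a minimal sketch, not a reference implementation: the `Finding` record, its field names, and the sample data are all hypothetical, and a real tracker export will look different.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional


@dataclass
class Finding:
    # Hypothetical record shape; adapt to whatever your tracker exports.
    severity: str
    repo: str
    opened: date
    closed: Optional[date] = None
    reopened: bool = False  # True if the finding came back after a closure


def median_days_to_remediate(findings):
    """Median open-to-close days, grouped by (severity, repo)."""
    groups = {}
    for f in findings:
        if f.closed is not None:
            key = (f.severity, f.repo)
            groups.setdefault(key, []).append((f.closed - f.opened).days)
    return {key: median(days) for key, days in groups.items()}


def closure_rate(findings, window_start, window_end):
    """Share of findings opened inside the window that are now closed."""
    created = [f for f in findings if window_start <= f.opened <= window_end]
    if not created:
        return None
    return sum(f.closed is not None for f in created) / len(created)


def reopen_rate(findings):
    """Share of findings closed at least once that were later reopened."""
    closed_once = [f for f in findings if f.closed is not None or f.reopened]
    if not closed_once:
        return None
    return sum(f.reopened for f in closed_once) / len(closed_once)


def oldest_open_critical_age(findings, today):
    """Age in days of the oldest unresolved critical finding, or None."""
    ages = [(today - f.opened).days for f in findings
            if f.severity == "critical" and f.closed is None]
    return max(ages, default=None)


# Invented sample data, for illustration only.
findings = [
    Finding("critical", "payments", date(2026, 1, 5), date(2026, 1, 9)),
    Finding("critical", "payments", date(2026, 1, 6), date(2026, 1, 16), reopened=True),
    Finding("critical", "web", date(2025, 12, 1)),
    Finding("high", "web", date(2026, 1, 10), date(2026, 1, 24)),
    Finding("high", "web", date(2026, 1, 12)),
]

print(median_days_to_remediate(findings))
print(closure_rate(findings, date(2026, 1, 1), date(2026, 1, 31)))
print(reopen_rate(findings))
print(oldest_open_critical_age(findings, date(2026, 1, 31)))
```

The median is used here rather than the mean so that one very old finding does not dominate the figure; that is a design choice, not a requirement.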
Better teams also segment by issue class. A leaked secret, an insecure deserialisation issue, and a dependency vulnerability do not move through the same workflow, so one blended average hides the real bottleneck. NHIMG’s State of Secrets in AppSec shows why this matters: the average estimated time to remediate a leaked secret is 27 days, a reminder that detection without fast closure is a weak control. To align this with the NIST Cybersecurity Framework 2.0, measure outcomes that reflect reduced exposure, not just activity.
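The effect of blending can be shown with a few lines of arithmetic. The remediation times below are invented for illustration (they are not taken from any report), and the issue-class names are placeholders.

```python
from statistics import mean

# Hypothetical days-to-remediate per closed finding, grouped by issue class.
remediation_days = {
    "leaked_secret": [21, 27, 33],
    "insecure_deserialisation": [5, 7],
    "dependency_vulnerability": [2, 3, 4, 3],
}

# One blended average across every finding, regardless of class.
blended = mean(d for days in remediation_days.values() for d in days)

# The same data, segmented by class.
per_class = {cls: mean(days) for cls, days in remediation_days.items()}
slowest = max(per_class, key=per_class.get)

print(f"blended average: {blended:.1f} days")
for cls, avg in per_class.items():
    print(f"{cls}: {avg:.1f} days")
print(f"bottleneck: {slowest}")
```

With these made-up numbers the blended figure sits under 12 days, while the slowest class takes more than twice as long, which is the bottleneck a single average would hide.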
Metrics should also be difficult to game. If teams are rewarded only for closure volume, they may close low-risk items first and defer systemic fixes. If they are rewarded only for low counts, they may reduce scanning coverage. Metrics also tend to break down when multiple product teams use different severity models, because the numbers stop being comparable and leaders start optimising the dashboard instead of the risk.
Common Variations and Edge Cases
Tighter AppSec measurement often increases reporting overhead, requiring organisations to balance visibility against developer friction. That tradeoff is real, especially when release cycles are short or engineering teams already face alert fatigue. Best practice is evolving, but there is no universal standard for one perfect AppSec scorecard yet.
Some teams use weighted severity scores, while others prefer operational SLAs by class of issue. The former is easier to summarise for executives, but it can mask whether critical items are actually being fixed. The latter is more actionable, but it requires consistent taxonomy and strong triage discipline. Both approaches work better when paired with workflow metrics, such as whether issues are fixed in pull requests, within the CI pipeline, or after release through incident response.
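The trade-off between the two approaches can be sketched with a small example. The weights, SLA targets, and backlog snapshots here are assumptions chosen for illustration, not recommended values.

```python
# Hypothetical severity weights and per-severity SLA targets (in days).
WEIGHTS = {"critical": 10, "high": 5, "medium": 2, "low": 1}
SLA_DAYS = {"critical": 7, "high": 30, "medium": 90, "low": 180}


def weighted_score(open_findings):
    """Single executive-style number: sum of severity weights of open findings."""
    return sum(WEIGHTS[severity] for severity, _age in open_findings)


def sla_breaches(open_findings):
    """Count of open findings older than their SLA target, per severity."""
    breaches = {}
    for severity, age_days in open_findings:
        if age_days > SLA_DAYS[severity]:
            breaches[severity] = breaches.get(severity, 0) + 1
    return breaches


# Each open finding is a (severity, age_in_days) pair; data is invented.
last_month = [("critical", 3)] + [("low", 20)] * 10
this_month = [("critical", 33)] + [("low", 20)] * 4

print(weighted_score(last_month), sla_breaches(last_month))
print(weighted_score(this_month), sla_breaches(this_month))
```

In this invented case the weighted score improves from one month to the next because low-severity items were closed, while the one critical finding has aged past its target. The SLA view surfaces that; the score alone does not.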
One common edge case is “good” backlog growth. When coverage expands, the number of findings may rise before remediation catches up. That is not failure if closure speed is improving and the oldest issues are disappearing. Another edge case is vendor-generated noise from dependency scanning, where counts can fluctuate based on advisory churn rather than code risk. In those environments, the better question is not “how many findings?” but “how quickly does the programme convert new findings into durable fixes?”
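One way to make the "good" backlog growth test explicit is to compare two reporting periods on the signals named above. The `Snapshot` fields and the example figures are hypothetical, and the strict three-way check is only one possible rule.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    # Hypothetical per-period summary; field names are illustrative.
    open_findings: int
    median_days_to_close: float
    oldest_open_age_days: int


def backlog_growth_is_healthy(prev: Snapshot, curr: Snapshot) -> bool:
    """True when the backlog grew but closure got faster and the oldest issue got younger."""
    grew = curr.open_findings > prev.open_findings
    closing_faster = curr.median_days_to_close < prev.median_days_to_close
    oldest_cleared = curr.oldest_open_age_days < prev.oldest_open_age_days
    return grew and closing_faster and oldest_cleared


# Invented figures: coverage expanded, so more findings are open,
# yet fixes land faster and the longest-standing issue has been closed.
prev = Snapshot(open_findings=120, median_days_to_close=21.0, oldest_open_age_days=200)
curr = Snapshot(open_findings=150, median_days_to_close=12.0, oldest_open_age_days=90)

print(backlog_growth_is_healthy(prev, curr))
```

A team might reasonably loosen this, for example by allowing a small tolerance on each comparison, depending on how noisy its data is.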
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and the NIST AI RMF set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| NIST CSF 2.0 | GV.OV | Measurement should focus on outcomes, not raw activity counts. |
| OWASP Non-Human Identity Top 10 | NHI2:2025 Secret Leakage | Secrets and identity issues need closure metrics, not discovery counts. |
| NIST AI RMF | MEASURE | AI RMF measurement emphasises observed performance and risk reduction. |
Track remediation speed and closure quality as governance metrics, not just vulnerability volume.
Reviewed and updated by the NHIMG editorial team on June 10, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org