By NHI Mgmt Group Editorial Team
Published 2026-04-13
Domain: Best Practices
Source: Orca Security

TL;DR: According to Anthropic, Project Glasswing, built around Claude Mythos Preview, found thousands of previously unknown zero-day vulnerabilities across major operating systems and browsers, including an OpenBSD bug that reportedly went unnoticed for 27 years. Continuous AI-assisted testing is becoming a practical way to shift security earlier in the development lifecycle, but judgment and context still decide what gets fixed first.


At a glance

What this is: An analysis of AI-driven security testing in the development lifecycle. Its key finding is that earlier automated vulnerability discovery can reduce noise and help teams ship safer software faster.

Why it matters: IAM, NHI, and platform security teams increasingly need to govern development-time identities, tool access, and remediation workflows as part of the same control plane.

👉 Read Orca Security's analysis of AI-driven security testing in the development lifecycle


Context

AI-driven security testing is the use of automated or semi-automated analysis to find weaknesses earlier in the software development lifecycle. The governance gap is that many organisations still depend on late-stage testing, when fixes are slower and operational disruption is higher, even though engineering teams are shipping under constant pressure.

The article’s core claim is that continuous investigation can make secure delivery more practical, not just more aspirational. For IAM and security teams, that raises a broader identity question: which human, workload, and tool identities are authorised to test, trigger, and act on findings inside the development lifecycle?


Key questions

Q: How should security teams use AI-driven testing in the development lifecycle?

A: Security teams should place AI-driven testing inside normal development workflows so findings arrive before production, not after release. The useful pattern is continuous validation during design, build, and pre-release review, paired with human judgment for prioritisation. That reduces late-cycle noise and makes remediation cheaper because the code context is still fresh.

Q: Why does earlier vulnerability discovery matter for release risk?

A: Earlier discovery matters because the cost and disruption of fixing a weakness rise as software moves closer to production. Finding issues upstream reduces urgent escalations, cuts the time spent chasing false positives, and gives engineering teams a cleaner signal. It also shortens the distance between detection and remediation, which improves delivery confidence.

Q: What do organisations get wrong about AI-assisted security testing?

A: A common mistake is treating better detection as a substitute for governance. AI may find more issues, but teams still need clear rules for identity access, remediation authority, and prioritisation. Without that structure, organisations can generate more findings without improving real risk decisions.

Q: How do organisations know if continuous security testing is actually working?

A: It is working when more issues are found before release, fewer emergency fixes reach production, and engineering spends less time on avoidable investigations. The best signal is not the number of findings alone, but whether the programme is reducing downstream noise and shortening remediation paths.
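These signals can be tracked with very little machinery. The sketch below is a minimal illustration, assuming each finding is recorded with the lifecycle stage where it was first detected and a flag for whether it needed an emergency production fix. The `Finding` record, the stage labels, and `shift_left_metrics` are hypothetical names, not part of any particular tool.

```python
from dataclasses import dataclass

# Hypothetical stage labels; anything not in this set counts as "after release".
PRE_RELEASE_STAGES = {"design", "build", "pre-release"}

@dataclass(frozen=True)
class Finding:
    finding_id: str
    detected_stage: str           # lifecycle stage where the issue was first found
    emergency_fix: bool = False   # True if it needed an out-of-band production fix

@dataclass(frozen=True)
class ProgrammeMetrics:
    pre_release_share: float      # fraction of findings detected before release
    emergency_fixes: int          # count of findings that forced an emergency fix

def shift_left_metrics(findings: list[Finding]) -> ProgrammeMetrics:
    """Summarise whether discovery happens before release, not just how much is found."""
    if not findings:
        return ProgrammeMetrics(pre_release_share=0.0, emergency_fixes=0)
    early = sum(1 for f in findings if f.detected_stage in PRE_RELEASE_STAGES)
    emergencies = sum(1 for f in findings if f.emergency_fix)
    return ProgrammeMetrics(early / len(findings), emergencies)

# Usage: one reporting period's findings.
quarter = [
    Finding("F-1", "build"),
    Finding("F-2", "pre-release"),
    Finding("F-3", "production", emergency_fix=True),
    Finding("F-4", "design"),
]
m = shift_left_metrics(quarter)
print(f"{m.pre_release_share:.0%} caught before release, {m.emergency_fixes} emergency fix(es)")
# prints "75% caught before release, 1 emergency fix(es)"
```

Comparing these two numbers period over period, rather than the raw finding count, reflects the point that volume alone is not the signal.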


Technical breakdown

How AI-assisted code testing changes vulnerability discovery

AI-assisted testing shifts vulnerability discovery from point-in-time review to repeated analysis during design, implementation, and release preparation. Instead of waiting for a scheduled penetration test, the system can inspect code paths, identify exploitable patterns, and surface weaknesses while change context is still fresh. That does not remove human review. It changes the economics of finding issues earlier, when remediation is usually simpler and less disruptive. The practical effect is not just better detection. It is a tighter feedback loop between engineering activity and security validation.

Practical implication: move security testing into the normal development workflow rather than reserving it for late-stage assurance.

Why continuous testing reduces production noise

Production noise grows when weaknesses escape the SDLC and appear as urgent incidents, repetitive false positives, or emergency remediation tasks. Continuous testing changes the signal by catching more issues before deployment, which means fewer downstream investigations and less interruption to engineering and security teams. This matters because the hard part of security is often not finding one bug, but prioritising the right bug at the right time. Earlier discovery lowers blast radius and makes triage more actionable.

Practical implication: treat earlier discovery as a control for reducing alert fatigue and remediation backlog, not just as a code-quality improvement.
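One concrete way to cut repetitive noise is to stop re-raising findings that have already been triaged. The sketch below assumes a scanner that reports a rule ID, file path, and code snippet per finding; the `RawFinding` shape and the function names are hypothetical. It fingerprints each finding while ignoring line numbers and whitespace, so a known issue that merely moved between commits is not reported as new.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class RawFinding:
    rule_id: str    # hypothetical scanner rule identifier
    file_path: str
    snippet: str    # code excerpt that triggered the rule
    line: int       # excluded from the fingerprint because line numbers drift between commits

def fingerprint(f: RawFinding) -> str:
    """Stable identity for a finding: rule + file + whitespace-normalised snippet."""
    normalised = " ".join(f.snippet.split())
    key = f"{f.rule_id}|{f.file_path}|{normalised}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def new_findings(current: list[RawFinding], already_triaged: set[str]) -> list[RawFinding]:
    """Return findings not triaged before, also collapsing duplicates within this run."""
    seen = set(already_triaged)
    fresh = []
    for f in current:
        fp = fingerprint(f)
        if fp not in seen:
            seen.add(fp)
            fresh.append(f)
    return fresh

# Usage: the SQL injection finding moved lines and spacing, but it is the same issue.
run_1 = [RawFinding("sql-injection", "api/users.py", "cursor.execute(q % uid)", 42)]
triaged = {fingerprint(f) for f in run_1}
run_2 = [
    RawFinding("sql-injection", "api/users.py", "cursor.execute(q  %  uid)", 47),
    RawFinding("hardcoded-secret", "ci/deploy.py", "TOKEN = 'abc'", 3),
]
print([f.rule_id for f in new_findings(run_2, triaged)])  # ['hardcoded-secret']
```

The trade-off is deliberate: a fingerprint that is too loose can hide a genuinely new instance, so real deployments would tune what goes into the key.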

Where development-time identity governance still matters

AI-powered testing only works safely when the surrounding identities are controlled. Development pipelines rely on service accounts, API keys, tokens, and tool integrations that can themselves become the path to misuse if over-privileged or loosely governed. The more an organisation embeds analysis into daily workflows, the more it needs clear limits on which identities can invoke which tools, what they can modify, and how their actions are audited. In practice, the security value of continuous testing depends on the trust model around the pipeline itself.

Practical implication: govern the identities behind testing, scanning, and remediation as tightly as the applications being tested.
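The limits described in this section (which identities can invoke which tools, where, and with what effect) can be expressed as an explicit, deny-by-default grant table with every decision logged. This is a minimal in-memory sketch under stated assumptions: the identity, tool, environment, and action names are hypothetical, and a real deployment would enforce this in the CI platform or a policy engine rather than in application code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class Grant:
    tool: str         # hypothetical tool name, e.g. "sast-scanner"
    environment: str  # e.g. "ci", "staging"
    action: str       # e.g. "read-code", "open-pr"

@dataclass
class PipelinePolicy:
    grants: dict[str, set[Grant]]                      # identity -> explicit grants
    audit_log: list[dict] = field(default_factory=list)

    def authorise(self, identity: str, tool: str, environment: str, action: str) -> bool:
        """Deny by default: allow only an exact grant, and record every decision."""
        allowed = Grant(tool, environment, action) in self.grants.get(identity, set())
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "identity": identity,
            "tool": tool,
            "environment": environment,
            "action": action,
            "decision": "allow" if allowed else "deny",
        })
        return allowed

# Usage: each identity gets only the tool, environment, and action it needs.
policy = PipelinePolicy(grants={
    "svc-sast-runner": {Grant("sast-scanner", "ci", "read-code")},
    "svc-remediation-bot": {Grant("remediation-bot", "ci", "open-pr")},
})
print(policy.authorise("svc-sast-runner", "sast-scanner", "ci", "read-code"))          # True
print(policy.authorise("svc-sast-runner", "sast-scanner", "production", "read-code"))  # False
print(policy.authorise("svc-remediation-bot", "remediation-bot", "ci", "merge"))       # False
print(len(policy.audit_log))                                                           # 3
```

Logging denials as well as approvals matters here: repeated denials for one identity are often the first sign that it is being misused or that its scope was never documented.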


Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities, including AI agents.


NHI Mgmt Group analysis

AI-driven security testing is becoming a control-plane problem, not just an AppSec problem. The article frames the upside as better code review and earlier vulnerability discovery, but the deeper implication is that security testing is now part of the operational identity layer of software delivery. Once tools can inspect, triage, and trigger actions continuously, teams must decide which identities are allowed to initiate that work and under what authority. Practitioners should treat the testing stack as governed infrastructure, not a collection of helper scripts.

Continuous validation reduces late-cycle noise because it shifts decision-making upstream. Point-in-time red teaming and pen testing still have value, but they often arrive after architecture choices are fixed and remediation cost has risen. Earlier detection changes the economics of security by reducing the number of findings that reach production and by giving engineering teams a cleaner signal. The practical conclusion is that security validation should be measured by how much it prevents from ever entering release candidate state.

Development lifecycle security now depends on the same trust assumptions that govern non-human identities. Test runners, scanners, remediation bots, and CI integrations all depend on secrets, tokens, and delegated access. If those identities are not bounded, continuously reviewed, and scoped to purpose, the benefit of earlier vulnerability discovery can be offset by new exposure paths. Practitioners should align SDLC automation with NHI governance rather than treating pipeline access as a separate discipline.

Tooling that finds more issues will not replace judgment, and that is the real governance boundary. The article correctly points out that finding a weakness is not the same as knowing whether it matters, what depends on it, or what should be fixed first. That means security programmes need decision criteria, not just better detection. The implication for practitioners is that AI-assisted testing should feed prioritisation models, not bypass them.
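To make "feed prioritisation models, not bypass them" concrete, the sketch below scales a finding's severity by context such as reachability, exposure, and credential impact. The weights and field names are hypothetical, chosen only to show how a critical but unreachable finding can rank below a reachable, exposed one. They are not a published scoring scheme.

```python
from dataclasses import dataclass

SEVERITY_WEIGHT = {"low": 1, "medium": 3, "high": 6, "critical": 10}  # hypothetical scale

@dataclass(frozen=True)
class TriagedFinding:
    finding_id: str
    severity: str
    reachable: bool            # does the vulnerable code run on a production path?
    internet_exposed: bool     # is the affected component reachable from outside?
    touches_credentials: bool  # could it expose secrets, tokens, or NHI credentials?

def priority(f: TriagedFinding) -> float:
    """Scale raw severity by context; unreachable code is down-weighted, not dropped."""
    score = float(SEVERITY_WEIGHT[f.severity])
    score *= 1.0 if f.reachable else 0.2
    if f.internet_exposed:
        score *= 1.5
    if f.touches_credentials:
        score *= 1.5
    return score

def fix_order(findings: list[TriagedFinding]) -> list[str]:
    """Highest priority first; ties broken by finding ID for a stable order."""
    return [f.finding_id for f in sorted(findings, key=lambda f: (-priority(f), f.finding_id))]

# Usage: the "critical" finding sits in dead code, so it drops to the bottom.
queue = [
    TriagedFinding("F-1", "critical", reachable=False, internet_exposed=False, touches_credentials=False),
    TriagedFinding("F-2", "high", reachable=True, internet_exposed=True, touches_credentials=False),
    TriagedFinding("F-3", "medium", reachable=True, internet_exposed=False, touches_credentials=True),
]
print(fix_order(queue))  # ['F-2', 'F-3', 'F-1']
```

The AI-assisted tool supplies the findings; the context fields and weights are where human judgment and organisational policy enter.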

Secure development is moving toward continuous assurance, but only if organisations absorb the operating model change. The market signal is not that one more scanner is enough. It is that engineering organisations will increasingly need an always-on security validation layer embedded into release pipelines, review gates, and remediation workflows. Practitioners should prepare for security controls that are more integrated, more frequent, and more identity-dependent than traditional review cycles.


What this signals

Identity sprawl inside development tooling is now part of secure delivery risk. As AI-driven testing becomes more common, the identities that power scanning, analysis, and remediation will matter as much as the code they inspect. Teams should be watching for over-privileged service accounts and excessive token reuse across build and testing systems, because those paths can undermine the security gains of earlier discovery. The right benchmark is whether test-time access is scoped and auditable enough to survive internal review, not just whether the tool returns findings.
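Both warning signs named above, over-privileged service accounts and token reuse across systems, can be checked against an identity inventory. The sketch assumes an inventory that records, per build or test system, which identity uses which token (by a non-secret identifier) and with which scopes, plus a documented list of the scopes each identity actually needs. All names and scope strings are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class CredentialUse:
    system: str             # hypothetical inventory label, e.g. "build", "test", "scan"
    identity: str           # service account or bot name
    token_id: str           # non-secret identifier or hash of the token, never the token itself
    scopes: frozenset[str]

def reused_tokens(uses: list[CredentialUse]) -> dict[str, set[str]]:
    """Map each token_id seen in more than one system to the systems using it."""
    systems_by_token: dict[str, set[str]] = defaultdict(set)
    for u in uses:
        systems_by_token[u.token_id].add(u.system)
    return {t: s for t, s in systems_by_token.items() if len(s) > 1}

def excess_scopes(uses: list[CredentialUse], needed: dict[str, set[str]]) -> dict[str, set[str]]:
    """Scopes each identity holds beyond what its documented purpose needs."""
    extra: dict[str, set[str]] = defaultdict(set)
    for u in uses:
        extra[u.identity] |= set(u.scopes) - needed.get(u.identity, set())
    return {i: s for i, s in extra.items() if s}

# Usage: one token shared by build and test, and a scanner holding admin rights.
inventory = [
    CredentialUse("build", "svc-build", "tok-a1", frozenset({"repo:read", "artifact:write"})),
    CredentialUse("test", "svc-test", "tok-a1", frozenset({"repo:read"})),
    CredentialUse("scan", "svc-scanner", "tok-c3", frozenset({"repo:read", "repo:admin"})),
]
needed = {
    "svc-build": {"repo:read", "artifact:write"},
    "svc-test": {"repo:read"},
    "svc-scanner": {"repo:read"},
}
print(reused_tokens(inventory))
print(excess_scopes(inventory, needed))  # {'svc-scanner': {'repo:admin'}}
```

An identity missing from `needed` is treated as needing nothing, so every scope it holds is reported; that default pushes teams to document purpose before granting access.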

Continuous assurance will force security teams to connect AppSec and NHI governance. When scanning, triage, and remediation become embedded in day-to-day delivery, the surrounding identities stop being background plumbing and become a control surface. That means access review, secret handling, and offboarding discipline now influence whether the security programme can scale safely. The broader signal is that software delivery teams will need governance patterns that treat pipeline identities as first-class assets, not incidental implementation detail.


For practitioners

  • Embed security testing into delivery workflows: Run vulnerability analysis during design, implementation, and release preparation instead of waiting for the final checkpoint. Make the workflow repeatable so engineers can use it without special escalation.
  • Govern the identities behind testing tools: Inventory the service accounts, tokens, and API keys used by scanners, analysis agents, and remediation integrations. Scope each identity to the narrowest tool and environment set it needs.
  • Separate detection from decision authority: Allow automated discovery to surface findings quickly, but require explicit policy for what can be auto-remediated, what needs review, and what must stop the pipeline.
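Separating detection from decision authority amounts to writing the rules down where they can be reviewed. The following sketch encodes one hypothetical policy: critical findings stop the pipeline, anything touching identity or permission configuration goes to a human, and only low-risk mechanical fixes are auto-remediated. The thresholds, field names, and `decide` function are illustrative assumptions, not a recommended standard.

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    AUTO_REMEDIATE = "auto-remediate"
    HUMAN_REVIEW = "human-review"
    BLOCK_PIPELINE = "block-pipeline"

@dataclass(frozen=True)
class PolicyFinding:
    severity: str                  # "low" | "medium" | "high" | "critical"
    fix_is_mechanical: bool        # e.g. a dependency bump to a known patched release
    touches_identity_config: bool  # changes to tokens, roles, or pipeline permissions

def decide(f: PolicyFinding) -> Decision:
    """Explicit rules: detection is automated, but authority to act is never implicit."""
    if f.severity == "critical":
        return Decision.BLOCK_PIPELINE
    if f.touches_identity_config:
        # Changes to NHI permissions always get a human reviewer, regardless of severity.
        return Decision.HUMAN_REVIEW
    if f.fix_is_mechanical and f.severity in {"low", "medium"}:
        return Decision.AUTO_REMEDIATE
    return Decision.HUMAN_REVIEW

# Usage
print(decide(PolicyFinding("critical", fix_is_mechanical=True, touches_identity_config=False)).value)  # block-pipeline
print(decide(PolicyFinding("low", fix_is_mechanical=True, touches_identity_config=False)).value)       # auto-remediate
print(decide(PolicyFinding("low", fix_is_mechanical=True, touches_identity_config=True)).value)        # human-review
```

Defaulting to human review when no rule matches keeps new or unexpected cases out of the auto-remediation path.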

Key takeaways

  • AI-driven security testing is useful because it moves vulnerability discovery earlier in the lifecycle, where fixes are cheaper and operational fallout is lower.
  • The evidence point is not just more findings, but fewer issues escaping into production, which reduces noise for both engineering and security teams.
  • The practical boundary is identity governance: the tools that test and remediate code must be bounded, audited, and scoped as tightly as the systems they protect.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Non-Human Identity Top 10 | NHI2:2025 Secret Leakage | Development tooling depends on secrets and tokens that must be governed as NHIs.
NIST CSF 2.0 | PR.AA-05 (access permissions, least privilege) | Continuous testing needs access control and auditability across build-time identities.
NIST Zero Trust (SP 800-207) | Tenets of zero trust (Section 2.1) | Zero trust principles apply to tool identities that act inside the SDLC.

Map testing, scanning, and remediation identities to least-privilege access and review them regularly.
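Regular review of these identities can be partly automated. The sketch below flags pipeline identities that have no owner, an overdue access review, or an overdue secret rotation. The 90-day and 30-day intervals and the `PipelineIdentity` fields are hypothetical placeholders for whatever your own policy specifies.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical policy values; set these from your own standard.
REVIEW_INTERVAL = timedelta(days=90)
ROTATION_INTERVAL = timedelta(days=30)

@dataclass(frozen=True)
class PipelineIdentity:
    name: str
    owner: Optional[str]   # accountable team; None means the identity is orphaned
    last_reviewed: date
    secret_rotated: date

def review_exceptions(identities: list[PipelineIdentity], today: date) -> dict[str, list[str]]:
    """List why each identity fails the review policy; compliant identities are omitted."""
    report: dict[str, list[str]] = {}
    for ident in identities:
        issues = []
        if ident.owner is None:
            issues.append("no owner")
        if today - ident.last_reviewed > REVIEW_INTERVAL:
            issues.append("access review overdue")
        if today - ident.secret_rotated > ROTATION_INTERVAL:
            issues.append("secret rotation overdue")
        if issues:
            report[ident.name] = issues
    return report

# Usage
today = date(2026, 4, 13)
fleet = [
    PipelineIdentity("svc-sast-runner", "appsec", date(2026, 3, 1), date(2026, 4, 1)),
    PipelineIdentity("svc-legacy-deploy", None, date(2025, 10, 1), date(2025, 12, 1)),
]
print(review_exceptions(fleet, today))
# {'svc-legacy-deploy': ['no owner', 'access review overdue', 'secret rotation overdue']}
```

Returning reasons rather than a pass/fail flag gives reviewers something actionable, and an orphaned identity is reported even if its dates are current.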


Key terms

  • AI-driven security testing: AI-driven security testing uses automated analysis to discover vulnerabilities earlier in the software lifecycle. The practical value is not just speed, but earlier feedback while code, ownership, and context are still visible. It becomes useful when teams can repeat it continuously without creating extra operational friction.
  • Development lifecycle security: Development lifecycle security is the discipline of applying security controls from design through release, rather than waiting for production. It spans review, testing, identity governance, and remediation workflows. The key measure is whether security feedback arrives early enough to change engineering decisions before risk hardens.
  • Pipeline identity: Pipeline identity is the non-human identity used by build, test, scan, and deployment systems to access tools and resources. These identities often carry more reach than teams realise, because they can touch code, secrets, and infrastructure. Governance requires scoping, rotation, and audit just like any other NHI.
  • Continuous assurance: Continuous assurance is the practice of validating security conditions repeatedly instead of relying on a single review point. In software delivery, it means ongoing testing, monitoring, and decision support. Its value depends on whether the surrounding identities, permissions, and workflows are governed tightly enough to trust the output.

Deepen your knowledge

AI-driven security testing in the development lifecycle is covered in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building governance around CI/CD identities, secrets, and remediation workflows, it is worth exploring.

This post draws on content published by Orca Security: Why AI-driven security testing in the development lifecycle could help teams reduce noise, deploy faster, and build safer software. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-04-13.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org