AI code security now hinges on reachability, not severity

By NHI Mgmt Group Editorial Team | Published 2026-06-24 | Domain: Best Practices | Source: Orca Security

TL;DR: AI-generated code introduces insecure-by-default patterns, hallucinated dependencies, and faster secret exposure. According to Orca Security, one USENIX study found hallucinated packages in at least 5.2% of commercial-model outputs and 21.7% of open-source-model outputs. Reachability, code-to-cloud context, and remediation quality now matter more than raw alert volume because the real question is whether a finding can actually be exploited.

At a glance

What this is: This analysis explains why AI code security now has to cover both insecure AI-generated code and AI tools that secure code, with reachability emerging as the deciding filter.

Why it matters: IAM, NHI, and AppSec teams now need controls that govern generated code, dependency trust, and exposed secrets across the full software path to production.

By the numbers:

A USENIX Security 2025 study of 576,000 AI-generated code samples found that hallucinated packages appeared in at least 5.2% of commercial-model outputs and 21.7% of open-source-model outputs.
17 minutes and as quickly as 9 minutes


Context

AI code security is the discipline of securing source code, dependencies, and secrets across the development lifecycle. In practice, the problem has expanded because AI assistants now generate large volumes of code that looks valid but may embed unsafe logic, unverified packages, or exposed credentials.

The governance gap is not just code quality. It is identity and supply-chain trust: generated code can introduce new secrets, new dependencies, and new exposure paths faster than review processes can validate them. For IAM and NHI teams, that means code security, secrets management, and workload exposure are now coupled decisions, not separate controls.

This matters most in cloud-first environments where code, runtime, and data are tightly linked. A finding that looks severe on paper may be irrelevant if it never executes, while a smaller issue on a live request path can become the true blast-radius driver.

Key questions

Q: How should security teams govern AI-generated code in production environments?

A: Treat AI-generated code as untrusted until it passes the same controls you apply to external contributions. That means mandatory review for high-risk logic, automated SAST and SCA in CI, dependency provenance checks, and secret scanning in commits and history. If a control cannot prove the code is safe to deploy, it should block release.

Q: Why do AI-generated dependencies create more risk than normal dependency churn?

A: Because the model can invent a package name that looks legitimate but has no trusted history. Attackers can register that name and deliver malware through a slopsquatting attack. The risk is not just outdated software, but false trust in a dependency that was never intentionally chosen or reviewed.

Q: What do security teams get wrong about vulnerability severity in AI-assisted code?

A: They often assume the highest-severity finding should always be fixed first. In reality, a lower-severity issue on a live, internet-facing request path can be more urgent than a critical issue in unused code. Reachability and runtime exposure should decide priority, not the badge on the alert.

Q: How do organisations know if their AI code security controls are actually working?

A: Look for fewer reachable findings reaching production, faster revocation of committed secrets, and lower rates of unverified dependencies in pull requests. If scans are producing volume but not changing which issues are blocked before deployment, the programme is generating noise rather than risk reduction.

Technical breakdown

Insecure-by-default AI-generated code patterns

AI assistants optimise for plausible, runnable code, not for secure-by-construction logic. That creates recurring patterns such as missing input validation, weak authorization checks, and injection-prone string building. These flaws often evade simple linting because the syntax is correct and the defect sits in application logic. Static analysis still matters here, but only if it is tuned to recognise security-relevant data flows rather than drowning teams in generic findings. The core issue is that generated code inherits the statistical habits of training data, including insecure examples that humans would normally reject during design review.

Practical implication: require security review and SAST on AI-generated changes before merge, especially for authentication, authorization, and data-handling paths.
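To make "injection-prone string building" concrete, here is a minimal sketch using Python's standard sqlite3 module. The table, function names, and payload are illustrative assumptions, not taken from the source analysis. It contrasts a query assembled by string concatenation with the parameterized form a reviewer or SAST rule would expect.

```python
import sqlite3


def find_user_unsafe(conn, username):
    # Injection-prone: user input is concatenated into the SQL text.
    query = "SELECT id, name FROM users WHERE name = '" + username + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, username):
    # Parameterized: the value is bound by the driver and never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()


def make_demo_db():
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)]
    )
    return conn


conn = make_demo_db()
payload = "x' OR '1'='1"
unsafe_rows = find_user_unsafe(conn, payload)  # matches every row
safe_rows = find_user_safe(conn, payload)      # matches no rows
```

Both functions are syntactically valid and both "work" on normal input, which is why a linter alone would not separate them. The difference only shows up in how untrusted data flows into the query.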

Hallucinated dependencies and slopsquatting

One of the most distinctive AI code risks is the invented package name. A model can confidently suggest a plausible library that does not exist, and that opening can be exploited when an attacker registers the fake package on a public registry. This is slopsquatting, a supply-chain attack that turns model error into malware delivery. Dependency scanners therefore need to do more than match versions against advisories. They must verify that the package exists, is the intended artifact, and is actually invoked by the application, otherwise the signal remains too noisy to act on.

Practical implication: pin dependencies, verify package provenance, and reject unfamiliar imports suggested by AI until they are independently confirmed.

Reachability and code-to-cloud context

Traditional scanners rank findings by severity, but severity alone does not tell you whether a flaw can be reached in production. Reachability analysis asks whether the vulnerable code path is actually invoked, and code-to-cloud context extends that question to the running workload and its exposed data. That changes prioritisation from theoretical risk to operational blast radius. In cloud estates, this is the difference between a dormant library issue and an exploitable path on an internet-facing service. AI-assisted remediation is only useful when it can reduce that path from code to runtime rather than just generate a patch.

Practical implication: prioritise tools that connect source findings to runtime exposure so teams fix exploitable issues first.
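The prioritisation shift described above can be sketched as a graph traversal. This is a simplified illustration, not how any particular product works: the call graph, entry points, and finding records are hypothetical, and real tools build the graph from static or runtime analysis rather than a hand-written dictionary.

```python
from collections import deque

SEVERITY_RANK = {"critical": 3, "high": 2, "medium": 1, "low": 0}


def reachable_functions(call_graph, entry_points):
    # Breadth-first walk from the entry points over caller -> callee edges.
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def prioritise(findings, call_graph, entry_points):
    # Reachable findings sort first; severity only breaks ties within a group.
    live = reachable_functions(call_graph, entry_points)
    return sorted(
        findings,
        key=lambda f: (f["function"] in live, SEVERITY_RANK[f["severity"]]),
        reverse=True,
    )


# Hypothetical call graph: one live request handler and one unused export path.
call_graph = {
    "handle_request": ["parse_input", "render"],
    "parse_input": ["decode"],
    "legacy_export": ["old_crypto"],
}
findings = [
    {"id": "F1", "function": "old_crypto", "severity": "critical"},
    {"id": "F2", "function": "decode", "severity": "medium"},
]

ordered = [f["id"] for f in prioritise(findings, call_graph, ["handle_request"])]
```

In this toy example the medium-severity finding in `decode` outranks the critical finding in `old_crypto`, because only the former sits on a path from the request handler.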

Threat narrative

Attacker objective: The attacker wants to turn AI-generated code mistakes into executable supply-chain compromise or direct credential abuse against production systems.

Entry begins when an assistant introduces an invented or unverified dependency into source code, or when a developer copies a secret into generated code and commits it to a repository.
Escalation occurs when that dependency is published, installed, or the secret remains live long enough for an attacker to use it on a reachable build, API, or cloud path.
Impact follows when the vulnerable code reaches production or the exposed secret grants access to data, compute, or downstream services.

Emerald Whale breach — exposed Git config files led to 15K secrets stolen and 10K repo compromises.
New York Times breach — New York Times source code and credentials exposed via GitHub.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

AI code security is now an identity governance problem, not just an AppSec problem. AI-generated code does not merely introduce flaws. It creates new trust decisions about who or what authored a dependency, whether a secret is still live, and whether a workload should ever execute the code path that was generated. That pushes the topic directly into NHI governance because secrets, tokens, and workload identities are now part of the same control plane as source review. Practitioners should treat generated code as an identity-bearing artifact, not just a development convenience.

Reachability is the right prioritisation model because severity alone overstates the real attack surface. A high-severity flaw in unused code is an audit problem, but a medium-severity flaw on a live request path is a production risk. That distinction matters for cloud-native estates where code, identity, and runtime are coupled. Organisations that keep ranking findings by static severity will continue fixing the wrong things first. Practitioners should re-centre triage on whether the code is actually executed and exposed.

Slopsquatting is the clearest named concept in this category because it shows how model error becomes supply-chain abuse. Dependency trust was designed for a world where humans vet package names before adoption. That assumption fails when an assistant inserts a hallucinated name at machine speed and an attacker can weaponise it before review catches up. The implication is not just better scanning. It is a rethinking of how dependency trust is established when code authorship is partially synthetic.

Secrets management is still the weak link because generated code spreads secrets faster than revocation processes can move. Teams often assume a committed key will be found and rotated before harm occurs, but public exposure windows can be minutes while remediation often stretches into weeks. That gap is especially dangerous when AI tooling itself needs credentials to function. Practitioners should assume the secret lifecycle is now part of the software delivery lifecycle, not a separate hygiene task.

Code-to-cloud context is becoming the differentiator for cloud security programmes because it maps code defects to actual blast radius. The market is moving away from scanners that only describe defects toward platforms that can tell you which finding reaches a workload, which workload reaches data, and which issue therefore deserves first priority. That validates a more operational form of governance across AppSec, cloud security, and identity teams. Practitioners should evaluate tools on exposure linkage, not scan volume.
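The "exposure linkage" idea can be illustrated as a join between a source finding and an inventory of deployed workloads. The sketch below is hypothetical throughout: the `Workload` fields, service names, and data store names are assumptions made for the example, and a real platform would populate this inventory from cloud APIs rather than literals.

```python
from dataclasses import dataclass, field


@dataclass
class Workload:
    name: str
    service: str              # source service or repo deployed on this workload
    internet_facing: bool
    data_stores: list = field(default_factory=list)


def exposure_for(finding_service, workloads):
    # Link a finding's source service to the workloads that actually run it.
    hits = [w for w in workloads if w.service == finding_service]
    return {
        "deployed": bool(hits),
        "internet_facing": any(w.internet_facing for w in hits),
        "data_stores": sorted({d for w in hits for d in w.data_stores}),
    }


# Hypothetical inventory.
workloads = [
    Workload("api-prod", "checkout", True, ["orders-db"]),
    Workload("nightly-batch", "reports", False, ["warehouse"]),
]

checkout_exposure = exposure_for("checkout", workloads)
prototype_exposure = exposure_for("prototype", workloads)
```

A finding in `checkout` resolves to an internet-facing workload with access to a data store, while a finding in the undeployed `prototype` service resolves to no exposure at all. That contrast is the triage signal scan volume cannot provide.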

From our research:
A USENIX Security 2025 study of 576,000 AI-generated code samples found that hallucinated packages appeared in at least 5.2% of commercial-model outputs and 21.7% of open-source-model outputs, according to LLMjacking: How Attackers Hijack AI Using Compromised NHIs.
Another finding from the same research shows that more than 205,000 unique fake package names were observed, which explains why dependency trust now needs machine verification as well as human review.
For a broader view of how these patterns connect to runtime identity and secret abuse, see AI LLM hijack breach for an adjacent attack path built around compromised credentials.

What this signals

Slopsquatting pressure will keep rising: development teams that allow AI assistants to suggest dependencies without provenance checks are effectively outsourcing supply-chain trust to a model. That is a poor fit for secure delivery pipelines, especially where unverified packages can reach production through automated builds. Teams should pair dependency verification with policy gates in CI and source control.

The governance shift is toward exposure-linked triage, not broader scanning. Programmes that connect code findings to runtime workloads can reduce alert fatigue and focus remediation on the issue that can actually be exploited. That is where code security, cloud security, and identity governance start to converge in practice.

Security leaders should watch the boundary between code and identity more carefully. As assistants generate more code and secrets continue to move through repositories, the operational question becomes which controls can stop untrusted artefacts before they become trusted infrastructure.

For practitioners

Treat AI-generated code as untrusted input: Require review and security scanning for AI-generated changes before merge, especially in authentication, authorization, secrets, and data-handling code paths.
Verify dependency provenance before installation: Pin versions, confirm the package exists, and reject unfamiliar imports until they are independently validated against the source registry.
Prioritise reachable findings over raw severity: Use reachability analysis and runtime context to sort fixes by whether the vulnerable code actually executes in production.
Scan repositories and histories for live secrets: Inspect commits, branches, and history for API keys, tokens, and provider credentials, then revoke anything that may already be live.
Map security findings to exposed workloads: Choose tools that connect source code issues to the cloud workloads they can reach, so triage reflects operational blast radius rather than theoretical impact.
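The secret-scanning step in the list above can be sketched as a small pattern matcher. This is a minimal illustration, assuming only two patterns: the widely documented `AKIA` prefix format for AWS access key IDs and PEM private key headers. Production scanners use far larger rule sets plus entropy checks and live-credential verification. The sample key is the placeholder value AWS uses in its own documentation.

```python
import re

# Illustrative rule set only; real scanners ship many more patterns.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_text(text):
    # Return (line number, rule name) for every line that matches a rule.
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


sample = 'region = "us-east-1"\naws_key = "AKIAIOSFODNN7EXAMPLE"\n'
hits = scan_text(sample)
```

To cover history as well as the working tree, the same function could be run over diff text such as the output of `git log -p --all`. A match should trigger revocation of the credential, since removing it from the latest commit does not remove it from earlier ones.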

Key takeaways

AI code security is no longer just about finding bugs in source code, but about governing the trust chain that AI-assisted development creates.
Hallucinated dependencies, secrets in repositories, and reachability blind spots show that alert volume is a poor proxy for real risk.
The most defensible programmes prioritise provenance checks, secret revocation, and code-to-cloud context before release.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI-01	Generated code and secret exposure create direct NHI trust and inventory risk.
NIST CSF 2.0	PR.DS-6	Secrets in repos and histories map to data protection and leakage controls.
NIST Zero Trust (SP 800-207)	PR.AC	Reachability and runtime context align with verifying access only when needed.

Inventory generated-code dependencies and secrets, then verify provenance before release.

Key terms

Slopsquatting: Slopsquatting is a supply-chain attack that exploits hallucinated package names suggested by AI systems. An attacker registers the invented name in a public registry and waits for a developer or build pipeline to install it. The risk sits at the intersection of model error, dependency trust, and software delivery speed.
Reachability Analysis: Reachability analysis determines whether a vulnerable code path is actually executed in a running application. It reduces false urgency by separating theoretical defects from exploitable ones. In AI code security, it is the difference between a noisy alert list and a prioritized fix queue tied to production exposure.
Code-to-Cloud Context: Code-to-cloud context links a source code finding to the workload, service, and data it can reach in production. It lets security teams prioritise by real blast radius instead of abstract severity. For cloud estates, it is one of the clearest ways to turn scan results into operational decisions.
AI-Generated Code: AI-generated code is source code produced wholly or partly by an assistant rather than written line by line by a human developer. It can be useful and fast, but it should be treated as untrusted until reviewed because it may contain insecure patterns, invented dependencies, or embedded secrets.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity lifecycle are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or NHI governance in your organisation, it is worth exploring.

This post draws on content published by Orca Security: AI Code Security Solutions in 2026. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-06-24.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org