The State of Secrets Sprawl 2026 | GitGuardian Annual Report

Introduction: The Year “Vibe Coding” Broke the Dam

For the software industry, 2025 was the year the floodgates gave way. The meteoric rise of generative AI transformed “vibe coding” from a niche experimental curiosity into a mainstream industrial force, effectively collapsing the traditional barriers to entry for software creation. But this democratization of development came with a staggering, compounding price tag: 28.65 million new hardcoded secrets detected in public GitHub commits in a single year.

This is not a cumulative tally; it represents the sheer volume of new API keys, passwords, and certificates exposed in 2025 alone. It marks a 34% increase from the previous year—the largest single-year jump ever recorded by GitGuardian. While AI has granted us the power to build at a velocity previously unimaginable, it has simultaneously hallucinated a sense of security, creating a massive “identity debt” that most organizations are ill-equipped to pay. As we dive into the State of Secrets Sprawl 2026, it is clear that our creation velocity has officially outpaced our governance maturity.

The AI Infrastructure Leak Trap

The most striking trend in our latest data is the explosion of AI-related credentials, which surged by 81.5% year-over-year. As organizations rush to deploy “agentic AI” and complex RAG (Retrieval-Augmented Generation) architectures, the “AI stack” is becoming a primary source of secrets sprawl.

The risk, however, is not evenly distributed. While the core LLM providers (OpenAI, Anthropic) are relatively well-monitored, the surrounding infrastructure is leaking 5x faster. This “Infrastructure Leak Trap” is fueled by the complexity of connecting disparate services. We are seeing unprecedented surges in leaks for retrieval APIs and orchestration tools: Brave Search (+1,255%), Firecrawl (+796%), and Perplexity (+657%) have become high-growth hotspots for credential exposure.

Furthermore, as new providers emerge, there is an inevitable lag before security protections catch up. A prime example is DeepSeek, where we detected 113,000 new API keys in 2025. This arms race between new AI services and security guardrails helps explain why 8 of the top 10 fastest-growing types of leaked secrets are now tied directly to the AI ecosystem.

“8 of the top 10 fastest-growing types of leaked secrets YoY are tied to AI services.”

The “Claude Code” Penalty: AI Assistants are 2x More Likely to Leak

In 2025, the “AI penalty” became a measurable reality for engineering teams. Specifically, commits co-authored by Anthropic’s Claude Code leaked secrets at a rate of 3.2%, more than double the human-only baseline of 1.5%.

This elevated risk is driven by the distinctive shape of an “AI-generated change set.” Claude Code-assisted commits are consistently larger, often containing twice as many lines as human-only commits. This sheer volume of code provides more surface area for credentials to slip through. At its peak in August 2025, Claude Code-assisted commits reached 31 secrets per 1,000 commits, roughly 2.4x the human baseline.

The turning point only arrived in late September 2025 with the release of Claude Sonnet 4.5. This update triggered a downward trend in leak rates toward the human baseline, yet the underlying risk remains: the “human in the loop” is still the critical failure point. Developers under pressure often override safety suggestions or explicitly prompt AI to include sensitive information in local configurations for “speed.”

The Immortal Secret: Why 64% of Leaks Never Die

A leaked secret is not a short-lived mistake; it is a durable access path. Perhaps the most alarming finding in our four-year lookback is that 64% of valid secrets leaked in 2022 are still valid and exploitable today.

This “lifecycle negligence” shows that detection is only the first step. The real-world stakes were highlighted by the Shai-Hulud 2 supply chain attack, which revealed that secrets are not just lingering on developer laptops; they are embedded in the heart of our infrastructure. Analysis of the attack found that 59% of compromised machines were CI/CD runners rather than personal workstations. When a secret leaks into build infrastructure, it gives attackers the ability to manipulate workflows and move laterally across the software supply chain. Until revocation and rotation become automated and routine, we are leaving the keys to the kingdom in plain sight for years at a time.

“Until revocation and rotation become routine, owned, and automated, a leaked secret is not a short-lived mistake. It is a durable access path that can sit in plain sight for years.”
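
To make the idea of a "durable access path" concrete, the following is a minimal sketch of how a team might track how long each leaked secret stays usable. The `LeakedSecret` record, its field names, and the sample dates are hypothetical and are not taken from the report's dataset or from any GitGuardian API.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class LeakedSecret:
    # Hypothetical record: store a label for the secret, never the value itself.
    identifier: str
    detected_on: date
    revoked_on: Optional[date] = None  # None means the secret is still valid


def exposure_days(secret: LeakedSecret, today: date) -> int:
    """Days the secret was (or has so far been) usable after detection."""
    end = secret.revoked_on or today
    return (end - secret.detected_on).days


def still_valid(secrets: List[LeakedSecret]) -> List[LeakedSecret]:
    """Secrets that were detected but never revoked."""
    return [s for s in secrets if s.revoked_on is None]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        LeakedSecret("ci-registry-token", date(2022, 5, 1), date(2022, 5, 3)),
        LeakedSecret("legacy-db-password", date(2022, 6, 1)),
    ]
    today = date(2025, 6, 1)
    for secret in still_valid(sample):
        print(secret.identifier, exposure_days(secret, today), "days exposed")
```

A report like this is only useful if each still-valid entry has an owner who is responsible for revoking it; the sketch measures the gap but does not close it.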

The Private Repo Fallacy: Why Your Internal Perimeter is a Sieve

There is a dangerous, prevailing myth that internal repositories are “safe” because they are hidden from the public. The data says otherwise: internal repositories are roughly 6x more likely to contain secrets than public ones (32.2% vs. 5.6%).

Privacy is not a security control. Because developers feel a false sense of security within private systems, they are significantly more likely to hardcode high-value, “production-ready” credentials. This complacency extends beyond the codebase. 28% of incidents now originate entirely outside of code repositories—in Slack, Jira, and Confluence. Critically, these non-code leaks are 13% more likely to be categorized as critical than those found in code, creating a massive, unmonitored blind spot for executives who believe their security perimeter ends at GitHub.

MCP: The New Frontier of Exposed Credentials

The Model Context Protocol (MCP) emerged in 2025 as the standard for connecting LLMs to external data. However, the rush to adopt it led to a “normalization” of hardcoding; we detected 24,008 unique secrets exposed in MCP configuration files in its first year.

This sprawl is often encouraged by official documentation that suggests passing API keys as CLI arguments or storing them in JSON config files. The danger of this centralized exposure was illustrated by the Smithery.ai vulnerability, where a single path-traversal bug in an MCP registry exposed overprivileged tokens, granting arbitrary code execution across 3,000+ hosted servers.

To mitigate these risks in agentic workflows, organizations must adopt these best practices:

Use Environment Variables: Never store secrets in MCP config files; load them from environment variables or, better, a dedicated secrets manager.
Client Ownership: Clients, not servers, should own credentials and provide them at query time.
Human-in-the-Loop: Require manual approval for any MCP action touching production systems or deployment pipelines.
Version Control Exclusion: Strictly exclude MCP configuration directories from version control via .gitignore.
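
The first practice in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any official MCP tooling: the JSON layout (an `mcpServers` object with per-server `env` blocks), the `${VAR_NAME}` placeholder convention, and the key-name heuristic are all assumptions chosen for the example. A real scanner would also need to inspect command-line arguments and the secret values themselves.

```python
import json
import os
import re

# Assumed heuristic: key names that usually hold credentials.
SECRET_KEY_PATTERN = re.compile(r"(api[_-]?key|token|secret|password)", re.IGNORECASE)
# Assumed convention: "${VAR_NAME}" is an environment reference, not a literal value.
ENV_REF_PATTERN = re.compile(r"^\$\{([A-Za-z_][A-Za-z0-9_]*)\}$")


def find_hardcoded_secrets(node, path=""):
    """Return dotted paths whose key looks secret-like and whose value is a literal string."""
    findings = []
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}" if path else key
            if isinstance(value, str):
                if SECRET_KEY_PATTERN.search(key) and not ENV_REF_PATTERN.match(value):
                    findings.append(child_path)
            else:
                findings.extend(find_hardcoded_secrets(value, child_path))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            findings.extend(find_hardcoded_secrets(item, f"{path}[{index}]"))
    return findings


def resolve_env_refs(node, environ=None):
    """Return a copy of the config with "${VAR}" strings replaced from the environment.

    Raises KeyError if a referenced variable is not set, so a missing secret
    fails loudly instead of silently becoming an empty string.
    """
    environ = os.environ if environ is None else environ
    if isinstance(node, dict):
        return {key: resolve_env_refs(value, environ) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_env_refs(item, environ) for item in node]
    if isinstance(node, str):
        match = ENV_REF_PATTERN.match(node)
        if match:
            return environ[match.group(1)]
    return node


if __name__ == "__main__":
    # Made-up config for illustration; the key value below is not a real credential.
    config = json.loads("""
    {
      "mcpServers": {
        "search": {
          "command": "npx",
          "env": {
            "SEARCH_API_KEY": "sk-live-123",
            "OTHER_TOKEN": "${OTHER_TOKEN}"
          }
        }
      }
    }
    """)
    print(find_hardcoded_secrets(config))
    # ['mcpServers.search.env.SEARCH_API_KEY']
```

Run as a pre-commit check, a scan like this could block a config file that contains literal credentials, while `resolve_env_refs` lets the committed file carry only placeholders that are filled in at launch time.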

Conclusion: Is Your Creation Velocity Outpacing Your Identity Maturity?

The central challenge of the AI era is the widening gap between software creation and identity governance. AI is exponentially accelerating the creation of software, but it is not accelerating the governance of the Non-Human Identities (NHIs) that power it.

We are living in a world of “Identity Debt,” where service accounts, API keys, and agent tokens are created in seconds but persist for years. Organizations must move from a reactive posture—asking “Where are my leaked secrets?”—to a strategic one of NHI Governance. You must be able to answer:

What non-human identities exist in my environment?
Who owns them?
What data or resources can they actually access?
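
One way to make these three questions operational is to keep an explicit inventory and check it for gaps. The sketch below is a minimal illustration; the `NonHumanIdentity` record, its fields, and the sample entries are hypothetical, and a real inventory would be populated from cloud IAM, CI/CD, and secrets-manager exports rather than typed by hand.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NonHumanIdentity:
    # Hypothetical inventory record for a service account, API key, or agent token.
    name: str
    owner: Optional[str] = None                      # who owns it?
    scopes: List[str] = field(default_factory=list)  # what can it access?


def governance_gaps(identities: List[NonHumanIdentity]) -> Dict[str, List[str]]:
    """Map each identity name to the governance questions it cannot answer."""
    gaps: Dict[str, List[str]] = {}
    for identity in identities:
        problems = []
        if not identity.owner:
            problems.append("no owner")
        if not identity.scopes:
            problems.append("access not documented")
        if problems:
            gaps[identity.name] = problems
    return gaps


if __name__ == "__main__":
    # Made-up sample inventory for illustration only.
    inventory = [
        NonHumanIdentity("ci-deploy-token", "platform-team", ["deploy:prod"]),
        NonHumanIdentity("legacy-bot"),
    ]
    print(governance_gaps(inventory))
    # {'legacy-bot': ['no owner', 'access not documented']}
```

The first question, what identities exist, is answered by the inventory itself; the function only reports on the other two, so an identity missing from the list is invisible to it.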

If your organization cannot answer these questions, your AI adoption is currently outpacing your security posture. The goal for 2026 is not just finding leaks; it is achieving identity maturity in an ecosystem built by machines.
