The State of Secrets Sprawl 2024

GitGuardian

GitGuardian - The State of Secrets Sprawl 2024 Report

It is not a secret. Hard-coded credentials have long been a primary cause of security incidents in the software world. Yet, with the growing complexity of digital supply chains, secrets sprawl is the Achilles’ heel for organizations of all sizes and security postures.

The proliferation of 50 million new code repositories on GitHub, a 22% increase from last year, amplifies the risk of both accidental exposures and deliberate malicious acts. This reality underscores the vital need for companies to track and manage the exposure of their sensitive information. Too many remain vulnerable to breaches without awareness or means to mitigate them.

Overview

As the digital world expands and becomes more interconnected at an unprecedented rate, secrets sprawl has become a major security risk for applications and infrastructure. API keys, access tokens, and encryption keys are essential for secure access to software, cloud services, and DevOps processes. However, the widespread use of these secrets has led to massive exposures, commonly in unsecured or public repositories. This report examines these vulnerabilities and the techniques used for detection, prevention, and remediation.

Key Findings on Secrets Exposure in 2023

1 - Secrets Exposure in GitHub: GitGuardian has been at the forefront of identifying and reporting hard-coded secrets for the past four years. Over that period, the incidence of publicly exposed secrets has quadrupled, with 12.8 million occurrences detected on GitHub.com in the last year alone, a 28% increase from 2022. The growing number of secrets and credentials on GitHub is worrying, as GitHub is the most popular platform used by individuals and organizations.

2. Using Honeytokens as Decoy Traps: Honeytokens are decoy credentials that act as tripwires, alerting security teams when they are used by an unauthorized party. They can also serve as a proactive measure to detect potential breaches and improve the incident response process.

3. Adopting a Shift-Left Security Approach: Shift-left means moving security testing to the early stages of the Software Development Life Cycle (SDLC), so that developers can detect issues in the code before it is committed to a repository. This approach not only improves security posture but also reduces the cost and complexity of remediation.
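The shift-left idea above can be sketched as a small check that runs on text before it is committed. This is a minimal illustration only: the patterns, the `scan_text` function, and the `hook_exit_code` helper are assumptions made for this example, not GitGuardian's detectors or any tool's real interface.

```python
import re

# Illustrative patterns only (assumptions for this sketch, not a vendor's rules).
PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM private key header.
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    # A quoted value of 12+ characters assigned to a sensitive-looking name.
    "generic_assignment": re.compile(
        r"(?i)\b(?:api_key|secret|token|password)\b\s*[:=]\s*['\"]([^'\"]{12,})['\"]"
    ),
}


def scan_text(text):
    """Return (line_number, pattern_name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def hook_exit_code(text):
    """Return 1 (block the commit) if anything was found, else 0."""
    return 1 if scan_text(text) else 0
```

A pre-commit hook could feed the staged changes to `scan_text` and exit with the returned code, so that a suspicious line stops the commit before it reaches the repository. A dedicated scanner covers far more secret types than these three patterns.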

Conclusion

While detecting vulnerabilities is critical, the real challenge lies in remediation. Security, we believe, must be a shared responsibility across all stages of the Software Development Life Cycle (SDLC), not just the domain of specialized teams. Raising awareness about these seemingly minor lapses is essential for mitigating supply chain risks.

2 - Industry Leaks and Breaches:

• IT Sector: The IT sector accounts for 65.9% of all discovered breaches, which indicates a strong reliance on digital infrastructure and the increased use of cloud services.

• Education Sector: The education sector represents 20.1% of leaks, due to the frequent use of open-source platforms.

• Other: The remaining 14% is spread across many other industries, showing how prevalent the secrets sprawl problem is.

3 - Zombie Leaks: A striking 91.6% of secrets remain valid five days after the affected organization has been notified, showing a significant gap in remediation procedures. These ‘Zombie Leaks’ are a threat because even inactive repositories or deleted files might allow attackers to access sensitive data.

4. AI Integration: The spread of Generative AI and Large Language Models (LLMs) brings both benefits and drawbacks for secrets detection. While AI has the potential to improve detection accuracy, it also produces false positives and struggles to detect complex or highly specific secrets.

Secrets Sprawl Risk Factors

The report highlights several key risk factors that drive secrets sprawl and make it difficult for organizations to secure and manage sensitive data.

1. Hard-Coded Secrets and Public Exposures - Hard-coded credentials in code or configuration files remain a major risk factor. Once committed to a version control system, these credentials become visible to anyone with access to the repository, including potential attackers.

2. File Extensions and Leaks Probability - High-Risk File Extensions: Files with the .env extension are especially risky, with a 54% chance of exposing at least one secret.

3. Emerging Threats with AI - AI-Related Secret Leaks: With the increasing use of AI tools, the exposure of API keys for these tools has increased 1,212-fold in the last year. This increase reflects the heavy reliance on AI and the rapid spread of generative AI technologies across industries.
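To make the .env risk concrete, here is a minimal sketch that flags variables in .env-style text whose names suggest they hold a secret. The function name `find_env_secrets` and the keyword list are assumptions chosen for illustration; the sample values are made up and are not real credentials.

```python
import re

# Names containing these words are treated as sensitive (an assumption for this sketch).
SENSITIVE_NAME = re.compile(r"(?i)(key|token|secret|password)")


def find_env_secrets(env_text):
    """Return variable names in .env-style text that appear to hold a secret."""
    flagged = []
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        name = name.strip()
        if name.startswith("export "):
            name = name[len("export "):].strip()
        # Drop surrounding whitespace and quotes before checking for a value.
        value = value.strip().strip("'\"")
        if value and SENSITIVE_NAME.search(name):
            flagged.append(name)
    return flagged
```

Running it over a file that contains `OPENAI_API_KEY=sk-test-not-a-real-key` and an empty `DB_PASSWORD=` would flag only the first name, since an empty placeholder exposes nothing. A check like this is only a heuristic, and keeping .env files out of version control remains the simpler safeguard.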

Remediation and Prevention

The report shows that it is necessary not only to detect leaks, but also to implement immediate and efficient remediation to mitigate the risks associated with exposed secrets.

1. Remediation Challenges - Limited Revocation Rate: Only 2.6% of secrets are revoked within an hour of notification. In most cases, the delay in remediation gives attackers far more time to access and steal sensitive data.

2. AI and LLMs in Secrets Detection - Performance and Limitations: ChatGPT’s effectiveness at detecting secrets was tested, and the results showed that it failed to identify 15.2% of secrets and also generated false positives, especially when generic patterns were applied.
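One way to see why generic detection produces false positives is to look at a common generic heuristic, Shannon entropy. The sketch below is an illustration under stated assumptions: it is not how ChatGPT or GitGuardian detect secrets, and the `looks_random` helper with its length and threshold defaults is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(s):
    """Shannon entropy of a string, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def looks_random(s, min_length=20, threshold=4.0):
    """Generic heuristic: long, high-entropy strings are treated as possible secrets."""
    return len(s) >= min_length and shannon_entropy(s) >= threshold
```

A string such as `abcdefghijklmnopqrstuvwxyz012345` passes this check even though it is plainly not a credential, while a short real password would be missed. That trade-off is typical of generic patterns, which is why specific detectors are usually combined with them.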

The Importance of Awareness and Training

Awareness and training play a crucial role in eliminating secrets sprawl. The report emphasizes the importance of ongoing training to reduce the gap between developers who create secrets and security teams who protect them.

1. Practical Security Training: Interactive training methods, such as Capture the Flag (CTF) challenges and hands-on workshops, give developers practical experience in secrets management. By working through real-world cases and scenarios, organizations can build a stronger security posture and reduce friction between teams.

2. Blameless Culture: It is important to adopt a blameless culture, in which mistakes are treated as opportunities for improvement rather than sources of blame. This approach encourages individuals to take part in remediation and promotes continuous improvement.

Secrets Sprawl Case Studies

To demonstrate the severity of secrets sprawl, the report presents several critical incidents:

Toyota’s Data Breach: An API key accidentally exposed in a public GitHub repository led to a five-year data breach for Toyota, compromising the personal information of 296,019 users. This incident shows how severe the impact can be when an exposed secret goes undetected for a long time.

GitHub Supply Chain Attack: GitHub has always been a popular target for attackers. Recently, GitHub has seen an increase in supply chain attacks, with threat actors exploiting its infrastructure to spread malware and collect credentials. This points to the need for tighter controls and robust detection mechanisms.

Secrets Management Strategies

1. Implementing Automated Detection and Prevention: Real-time monitoring of repositories for exposed secrets makes incidents easier to detect and speeds up remediation. Integrating tools like GitGuardian’s GGShield CLI into CI/CD pipelines can help prevent leaks before they reach production.


