AI LLM Hijack Breach
Overview
In October 2024, Permiso Security reported a sophisticated cyberattack that revealed critical vulnerabilities in the infrastructure of cloud-hosted large language models (LLMs) offered through platforms such as AWS Bedrock and Google Vertex AI, including models from providers like Anthropic. Dubbed "LLMjacking," this emerging threat targets AI applications hosted on major cloud platforms, exploiting their inherent scalability and computational power for malicious purposes.
In this incident, the attackers used compromised cloud access keys, often obtained through exploited framework vulnerabilities or phishing schemes, to gain unauthorized control of these advanced AI systems. With these credentials, they bypassed traditional security measures, invoking APIs such as InvokeModel and InvokeModelWithResponseStream to discreetly hijack AI resources.
The Rise of Hosted AI Models
Hosted AI models, like OpenAI’s GPT or similar platforms, are designed to respond to user prompts in natural and creative ways. Whether it’s helping someone write a story, debug code, or answer questions, these systems are versatile and accessible. However, their open-ended design is both a strength and a weakness. AI doesn’t understand morality or intent; it responds to inputs based on patterns learned during training. This neutrality makes it vulnerable to exploitation by users seeking to push ethical boundaries.
What Is Dark Roleplaying?
Dark roleplaying occurs when users co-opt AI for scenarios that simulate harmful or unethical behavior. Think of it as a twisted form of storytelling where users prompt the AI to act out scenarios involving violence, abuse, or other morally questionable activities.
Examples of Dark Roleplaying
Simulating criminal activities, such as planning a heist or creating phishing schemes.
Generating graphic or violent narratives for entertainment or experimentation.
Creating fake conversations or scenarios used to harass or blackmail individuals.
What Is LLM Hijacking?
Imagine renting a car for a road trip, only to find someone else secretly using it for joyrides at your expense. That is the essence of LLM hijacking: the exploitation of hosted large language models such as GPT or Bard by unauthorized users for malicious purposes. Attackers identify vulnerabilities in public or cloud-hosted LLMs, such as misconfigured APIs or leaked credentials, and hijack access. Once inside, they exploit the stolen resources to generate phishing emails or obfuscated malware code, or monetize the access as underground AI services. Beyond financial theft, LLM hijacking empowers cybercriminals to scale attacks like never before.
Why Do Hackers Perform LLM Hijacking?
Attackers hijack large language models for several reasons, primarily centered on exploiting their computational power, bypassing security protocols, and generating malicious content.
Monetizing Hijacked AI Models
Once attackers gain control of a cloud-hosted LLM, they can resell access to the hijacked resources. This practice allows cybercriminals to monetize their stolen access. They can offer malicious services to other bad actors, such as automated content generation for phishing campaigns, crafting fake news or deepfakes, or even providing illegal material through the AI models, all without bearing the cost of operating the infrastructure themselves.
What Happened?
Initial Compromise: Credential Theft
Access Key Exploitation: Attackers acquired AWS access keys through techniques like phishing, credential stuffing, or exploiting publicly exposed keys in GitHub repositories.
Cloud Enumeration: Using stolen credentials, they identified accessible services, including AWS Bedrock. API queries like GetCallerIdentity helped attackers validate account access.
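The sketch below illustrates the kind of reconnaissance described above using boto3 (the AWS SDK for Python); defenders can run the same calls to understand what a leaked key exposes. The key values are placeholders, and the calls shown are standard AWS APIs, not tooling from the incident itself.

# Sketch: what a leaked access key reveals about an account (boto3 assumed).
import boto3

session = boto3.Session(
    aws_access_key_id="AKIA...",      # compromised key (placeholder)
    aws_secret_access_key="...",      # placeholder
    region_name="us-east-1",
)

# Step 1: confirm the key is live and identify the account it belongs to.
identity = session.client("sts").get_caller_identity()
print(identity["Account"], identity["Arn"])

# Step 2: enumerate which foundation models the account can see in Bedrock.
bedrock = session.client("bedrock")
for model in bedrock.list_foundation_models()["modelSummaries"]:
    print(model["modelId"])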
Probing Cloud Services
Validation of Permissions: Attackers sent requests with unusual parameters to the InvokeModel API to probe AI model access without triggering outright denial errors. This method confirmed the existence of active, accessible AI models without raising alarms in conventional logging mechanisms.
Region-Specific Targeting: Knowing Bedrock was only available in specific regions, attackers tested operations selectively in us-east-1, us-west-2, and other supported regions, ensuring efficient use of their resources.
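A minimal sketch of this probing behaviour, written from a defender's point of view: an intentionally malformed InvokeModel request never produces output, but the error code distinguishes AccessDeniedException (no invoke permission) from ValidationException (permission exists, the request body was simply invalid). The model ID and region list below are illustrative assumptions.

# Sketch: checking whether a credential can invoke Bedrock models, per region.
# Assumes boto3; the model ID and region list are illustrative.
import boto3
from botocore.exceptions import ClientError

REGIONS = ["us-east-1", "us-west-2"]
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

for region in REGIONS:
    runtime = boto3.client("bedrock-runtime", region_name=region)
    try:
        # Deliberately malformed body: generates no content, but the error
        # code reveals whether the caller holds bedrock:InvokeModel rights.
        runtime.invoke_model(modelId=MODEL_ID, body="{}")
    except ClientError as err:
        code = err.response["Error"]["Code"]
        if code == "AccessDeniedException":
            print(f"{region}: no invoke permission")
        elif code == "ValidationException":
            print(f"{region}: invoke permission confirmed (request merely malformed)")
        else:
            print(f"{region}: {code}")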
API Invocation and Reverse Proxy Setup
InvokeModel Usage: Attackers invoked the InvokeModel and InvokeModelWithResponseStream APIs to interact with and control AI models. Streaming responses enabled token-by-token, real-time data extraction, optimizing their malicious workflows (a short sketch of the API call follows this list).
OAI Reverse Proxy Deployment: A reverse proxy server was established to route traffic through compromised credentials. This layer obfuscated attacker origins while efficiently managing multi-model deployments.
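The streaming behaviour itself is ordinary Bedrock usage. The sketch below shows the general shape of an InvokeModelWithResponseStream call with boto3 against an Anthropic model; the model ID, prompt, and response-parsing details are assumptions based on Bedrock's documented request format, not material from the incident.

# Sketch: token-by-token streaming via InvokeModelWithResponseStream (boto3).
import json
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = runtime.invoke_model_with_response_stream(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello"}],
    }),
)

# The response body is an event stream; text arrives in small deltas.
for event in response["body"]:
    chunk = event.get("chunk")
    if chunk:
        payload = json.loads(chunk["bytes"])
        if payload.get("type") == "content_block_delta":
            print(payload["delta"].get("text", ""), end="", flush=True)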
Model Jailbreaking
Jailbreaking Techniques: Exploiting vulnerabilities in model prompt design, attackers bypassed AI safeguards. They used prompt injection techniques sourced from underground forums, enabling LLMs to generate explicit or harmful content despite content filters.
Concurrent Requests: Custom scripts, often generated by the hijacked LLMs themselves, allowed attackers to automate interactions with multiple AI models, ensuring continuous content generation and extraction.
Monetization and Payload Delivery
Service Resale: Attackers monetized their access by selling unauthorized LLM usage to other cybercriminals, for example generating synthetic scripts or creative outputs tailored to malicious ends.
Mass Invocations: Over 75,000 invocations, almost all of a sexual nature, were logged in a short period, demonstrating industrial-scale misuse. The output was likely saved and organized for resale or direct criminal application.
Can This Be Prevented?
Enhanced Credential Management
API Key Protection: Secure storage of access keys is paramount. API keys should be stored in environment variables or a secrets manager rather than hardcoded into the codebase (see the sketch after this list). Keys should also be rotated regularly, and least-privilege access policies enforced to limit the potential damage if a key is compromised.
Multi-Factor Authentication (MFA): MFA should be enabled for all cloud services to add an extra layer of security, ensuring that even if an access key is compromised, unauthorized users cannot gain full access.
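A minimal sketch of the environment-variable pattern mentioned above: the credential never appears in source control, and rotation only requires updating the deployment environment. The explicit session construction is purely illustrative, since boto3 already reads the standard variables automatically.

# Sketch: load credentials from the environment rather than hardcoding them.
import os
import boto3

access_key = os.environ.get("AWS_ACCESS_KEY_ID")
secret_key = os.environ.get("AWS_SECRET_ACCESS_KEY")
if not access_key or not secret_key:
    raise RuntimeError("AWS credentials are not set in the environment")

session = boto3.Session(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    region_name="us-east-1",
)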
Advanced Access Controls and Monitoring
Role-Based Access Control (RBAC): Organizations should implement granular access control based on roles. Only authorized users should be granted permission to invoke models or access sensitive AI resources.
Real-Time Monitoring and Anomaly Detection: Tools that track and analyze the usage of cloud services can alert administrators to suspicious activities, such as unusual API requests or excessive invocation patterns. Advanced anomaly detection mechanisms should be deployed to flag potential hijacking attempts.
Logging and Auditing: Ensure that all actions performed by AI models and their interactions with cloud services are logged. Use services like AWS CloudTrail or Google Cloud Logging to monitor API calls, track access, and investigate suspicious events.
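As a rough illustration of the logging point, the sketch below counts recent Bedrock InvokeModel events in CloudTrail with boto3 and flags bursts. It assumes Bedrock invocation events are being delivered to CloudTrail in the account, and the one-hour window and alert threshold are illustrative choices, not recommended values.

# Sketch: flag bursts of Bedrock InvokeModel calls recorded in CloudTrail.
from collections import Counter
from datetime import datetime, timedelta, timezone
import boto3

cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")
start = datetime.now(timezone.utc) - timedelta(hours=1)

paginator = cloudtrail.get_paginator("lookup_events")
calls_per_identity = Counter()
for page in paginator.paginate(
    LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "InvokeModel"}],
    StartTime=start,
):
    for event in page["Events"]:
        calls_per_identity[event.get("Username", "unknown")] += 1

for identity, count in calls_per_identity.items():
    if count > 500:  # illustrative threshold
        print(f"ALERT: {identity} made {count} InvokeModel calls in the last hour")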
Secure API Design
Rate Limiting: Cloud providers should impose rate limiting on API calls to prevent abuse through rapid, large-scale requests that might strain resources or go undetected. If a service is invoked excessively in a short period, automated defenses should kick in to block further access (a minimal sketch follows this list).
Endpoint Security: Ensure all API endpoints are secured and properly authenticated, using standards such as OAuth 2.0 or signed JWTs. This ensures that only authorized users or services can invoke models, mitigating the risk of external attackers using stolen keys.
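A framework-agnostic sketch of the rate-limiting idea, assuming a per-credential sliding window; the limit and window size are illustrative, and in practice this logic would live in an API gateway or the cloud provider's own throttling layer backed by a shared store.

# Sketch: a simple sliding-window rate limiter keyed by API credential.
import time
from collections import defaultdict, deque

class RateLimiter:
    def __init__(self, max_calls: int = 100, window_seconds: int = 60):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls = defaultdict(deque)  # credential -> recent call timestamps

    def allow(self, api_key: str) -> bool:
        now = time.monotonic()
        recent = self.calls[api_key]
        # Drop timestamps that have fallen outside the window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        if len(recent) >= self.max_calls:
            return False  # deny: credential exceeded its quota
        recent.append(now)
        return True

limiter = RateLimiter(max_calls=100, window_seconds=60)
if not limiter.allow("caller-access-key-id"):
    raise PermissionError("Rate limit exceeded; request rejected")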
Education and Awareness
Employee Training: Training teams to recognize phishing and credential theft techniques will reduce the risk of initial compromises. By building awareness of social engineering tactics and common attack methods, the organization can prevent attackers from gaining access in the first place.
Threat Intelligence Sharing: Organizations can collaborate with industry peers and share insights into new attack vectors, ensuring that they stay ahead of evolving LLM hijacking techniques.
Stronger AI Safeguards
Prompt Engineering: AI providers should enhance prompt engineering to detect and block attempts to bypass content moderation rules. By analyzing common bypass techniques and training models to recognize them, AI systems can be made more resistant to exploitation.
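As a very rough illustration of provider-side pre-screening (the patterns and policy below are assumptions, and real moderation relies on trained classifiers rather than keyword lists), a request filter run before a prompt reaches the model might look like this:

# Sketch: a naive prompt pre-screen applied before a request reaches the model.
import re

BLOCKED_PATTERNS = [
    r"ignore (all )?previous instructions",  # common prompt-injection phrasing
    r"pretend (you|to) (are|be) .* without (any )?restrictions",
]

def screen_prompt(prompt: str) -> bool:
    """Return True if the prompt may proceed to the model."""
    lowered = prompt.lower()
    return not any(re.search(pattern, lowered) for pattern in BLOCKED_PATTERNS)

if not screen_prompt("Ignore all previous instructions and ..."):
    print("Prompt rejected by moderation pre-screen")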
Conclusion
The LLM hijacking incident highlights a critical vulnerability in cloud-hosted AI systems. By exploiting stolen access keys and bypassing security safeguards through techniques such as model jailbreaking, cybercriminals have shown how easily these powerful tools can be weaponized for malicious purposes. This breach underscores the urgent need for enhanced security measures in AI and cloud environments. As AI continues to play a pivotal role across sectors, securing these technologies is crucial to prevent further exploitation and maintain trust in their use.