LLM token theft exposes the hidden cost of free AI inference

By NHI Mgmt Group Editorial Team | Published 2026-06-12 | Domain: Governance & Risk | Source: WorkOS

TL;DR: LLM token theft turns free-tier AI access into a cost-draining abuse channel, with attackers mass-creating accounts, cycling trials, and reusing stolen payment methods to extract inference at scale, according to WorkOS. The real issue is not just fraud detection but whether identity controls can make abuse economically irrational before compute spend and reliability damage compound.

At a glance

What this is: A practitioner analysis of LLM token theft, showing how attackers abuse free AI credits and inference capacity through mass sign-ups, trial cycling, and account reuse.

Why it matters: IAM, NHI, and fraud teams need controls that distinguish abusive account creation from legitimate usage before free-tier abuse becomes a billing, reliability, and trust problem.

By the numbers:

Roughly 1 in 6 new account attempts on AI platforms is fraudulent.

Context

LLM token theft is the abuse of AI products' free or trial access to extract inference without paying for it. The primary governance issue is not model quality, but identity abuse at the account boundary, where attackers can cheaply create, cycle, and discard access faster than product teams can review it.

For AI startups, this is a direct IAM and billing control problem. The same account lifecycle patterns that matter in NHI governance, such as sign-up velocity, disposable identities, and repeated session abandonment, now sit inside the revenue path of the product itself.

Key questions

Q: How should security teams reduce free-tier abuse in AI products?

A: Start by controlling account issuance, not just request traffic. Score sign-ups using device identity, email reputation, and velocity, then add step-up verification where patterns indicate disposable or bulk-created accounts. The aim is to make repeated abuse uneconomic before inference credits are consumed, while preserving low friction for legitimate users.
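The scoring step described above can be sketched as a small additive model. This is a minimal illustration, not the method WorkOS or any vendor uses: the `SignupAttempt` fields, the weights, the thresholds, and the tiny disposable-domain list are all hypothetical and would need tuning against real labelled sign-ups.

```python
from dataclasses import dataclass

# Illustrative list only; a real deployment would use a maintained feed.
DISPOSABLE_DOMAINS = {"mailinator.com"}


@dataclass
class SignupAttempt:
    email: str
    device_id: str
    accounts_on_device: int          # prior accounts already seen on this device
    signups_from_ip_last_hour: int   # sign-up velocity from the same source


def risk_score(attempt: SignupAttempt) -> int:
    """Return an additive risk score on an assumed 0-100 scale."""
    score = 0
    domain = attempt.email.rsplit("@", 1)[-1].lower()
    if domain in DISPOSABLE_DOMAINS:
        score += 40  # hypothetical weight for poor email reputation
    if attempt.accounts_on_device >= 1:
        score += 30  # hypothetical weight for device reuse
    if attempt.signups_from_ip_last_hour > 5:
        score += 30  # hypothetical weight for high velocity
    return score


def decide(attempt: SignupAttempt) -> str:
    """Map the score to an action before any inference credits are issued."""
    score = risk_score(attempt)
    if score >= 70:
        return "block"
    if score >= 30:
        return "step_up"  # e.g. require additional verification
    return "allow"
```

A sign-up with a clean email on a new device passes through `decide` with no added friction, while one that combines a disposable domain, a reused device, and high velocity is stopped before credits are granted.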

Q: Why do traditional fraud and network tools miss LLM token theft?

A: Because the abuse often happens before a payment method exists and below the network layer's view of application intent. Network tools cannot see account velocity or email quality, and payment-fraud tools do not help when attackers spend free credits only. Effective controls must operate at identity issuance and application telemetry.

Q: What signals indicate an AI account is being used for token theft?

A: Look for one-and-done accounts, short sign-up-to-depletion cycles, abnormal output-to-input token ratios, disposable email domains, and repeated sign-ups from the same device or proxy pattern. Those signals show that the account is being treated as a consumable resource rather than a real customer identity.

Q: How do teams balance friction and abuse prevention on AI free tiers?

A: Use layered controls instead of universal friction. Reserve stronger checks for high-risk sign-ups, rate-limit repeated resets, and monitor whether abuse is causing routing instability or paging. That approach preserves conversion for legitimate users while forcing attackers to pay more for each account they cycle.
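One way to reason about "forcing attackers to pay more for each account" is a simple break-even calculation. The sketch below is an assumption-laden model, not something taken from the WorkOS article: it treats the attacker's profit per account as the value of the free credits minus the cost of creating the account and clearing any added friction. All amounts are hypothetical and expressed in integer cents to avoid rounding surprises.

```python
def attacker_margin_cents(credit_value_cents: int,
                          creation_cost_cents: int,
                          friction_cost_cents: int) -> int:
    """Value extracted per account minus what the attacker spends to get it."""
    return credit_value_cents - (creation_cost_cents + friction_cost_cents)


def min_friction_to_deter_cents(credit_value_cents: int,
                                creation_cost_cents: int) -> int:
    """Smallest added per-account cost that leaves the attacker no profit."""
    return max(0, credit_value_cents - creation_cost_cents)


# Hypothetical numbers: 500 cents of free credits, 5 cents to create an account.
margin_without_controls = attacker_margin_cents(500, 5, 0)
required_friction = min_friction_to_deter_cents(500, 5)
```

Under these made-up figures the attacker clears 495 cents per account, so layered controls on high-risk sign-ups would need to add roughly that much cost per cycled account before the abuse stops paying for itself.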

Technical breakdown

Mass account cycling as a free-tier abuse pattern

The core abuse pattern is simple: create account, consume free inference, and repeat. Attackers either delete and recreate accounts themselves or use compromised or cleared identities to keep resetting trial limits. When payment is not required, the attacker’s cost stays near zero while compute consumption shifts to the provider. This is why free-tier abuse behaves more like infrastructure theft than ordinary fraud. It exploits identity issuance, not only payment flows, so the attack surface begins at registration and continues through session reuse, device switching, and limit resets.

Practical implication: treat registration, trial issuance, and account reset as one control surface, not separate product flows.
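Treating registration, trial issuance, and account reset as one control surface can be sketched as a single sliding-window counter keyed on device. This is a minimal sketch under stated assumptions: the event names, the `LifecycleVelocity` class, and the limits are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque

# Hypothetical event names covering the whole account lifecycle.
LIFECYCLE_EVENTS = {"register", "trial_issued", "account_reset"}


class LifecycleVelocity:
    """Count lifecycle events per device inside one shared time window."""

    def __init__(self, window_seconds: int, max_events: int):
        self.window_seconds = window_seconds
        self.max_events = max_events
        self._events = defaultdict(deque)  # device_id -> event timestamps

    def record(self, device_id: str, event: str, ts: float) -> bool:
        """Record an event; return True if the device now exceeds the limit."""
        if event not in LIFECYCLE_EVENTS:
            raise ValueError(f"unknown lifecycle event: {event}")
        timestamps = self._events[device_id]
        timestamps.append(ts)
        # Drop events that have aged out of the window.
        while timestamps and timestamps[0] <= ts - self.window_seconds:
            timestamps.popleft()
        return len(timestamps) > self.max_events
```

Because all three event types feed the same counter, an operator who alternates between fresh registrations and resets still trips the limit, which separate per-flow counters would miss.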

Why network-layer and payment-fraud tools miss token theft

Network tools see traffic patterns, TLS handshakes, and source IPs, but token theft is decided by application-level signals like account velocity, email reputation, and device identity. Payment-fraud tools only become useful once a card exists, which leaves free-tier abuse invisible until the attacker is already consuming resources. CAPTCHAs help only at the margin because they can be solved cheaply by human farms. The important distinction is that this is identity abuse with monetisation impact, not a payment dispute problem.

Practical implication: instrument application-layer abuse signals before relying on network controls or card checks.

Why output-to-input ratios and one-and-done accounts matter

Abusive usage often shows up as unusual output-to-input token ratios, short-lived accounts, and sign-ups that never return after initial depletion. Those signals matter because they reveal intent to extract value quickly rather than build a lasting relationship with the product. In practice, these patterns also correlate with operational blast radius: routing instability, on-call paging, and distorted usage forecasts. For identity teams, this turns behavioural telemetry into a governance signal, not just a security one.

Practical implication: use behavioural telemetry to spot disposable identities before they turn into recurring cost and reliability incidents.
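The behavioural signals above can be turned into simple per-account flags. The following is an illustrative sketch only: the `AccountUsage` fields and both thresholds are hypothetical, and treating a high output-to-input ratio as the abnormal direction is an assumption that should be checked against a product's own baseline.

```python
from dataclasses import dataclass
from typing import List, Optional

RATIO_LIMIT = 20.0             # assumed ceiling for output/input tokens
FAST_DEPLETION_SECONDS = 3600  # assumed "short" sign-up-to-depletion cycle


@dataclass
class AccountUsage:
    input_tokens: int
    output_tokens: int
    signup_ts: float
    depleted_ts: Optional[float]   # None if credits are not yet exhausted
    sessions_after_depletion: int


def abuse_flags(usage: AccountUsage) -> List[str]:
    """Return the behavioural flags raised by one account's usage record."""
    flags = []
    ratio = usage.output_tokens / max(usage.input_tokens, 1)
    if ratio > RATIO_LIMIT:
        flags.append("high_output_ratio")
    if usage.depleted_ts is not None:
        if usage.depleted_ts - usage.signup_ts < FAST_DEPLETION_SECONDS:
            flags.append("fast_depletion")
        if usage.sessions_after_depletion == 0:
            flags.append("one_and_done")
    return flags
```

An account that burns its credits within minutes, produces far more output than input, and never returns raises all three flags, while an account still working through its allowance raises none.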

Threat narrative

Attacker objective: Extract as much inference as possible at the provider's expense while keeping their own acquisition cost close to zero.

Entry begins with mass sign-up using randomized usernames, disposable email domains, proxies, or stolen payment methods to obtain free or trial access.
Escalation occurs when the attacker cycles accounts, bypasses SMS checks, or sells cleared accounts so one verified identity serves many users.
Impact is paid for by the provider in the form of free inference consumption, distorted routing, and reliability incidents that page engineering teams.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
Salesloft OAuth token breach — hackers stole OAuth tokens to access Salesforce data via Salesloft.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

LLM token theft is an identity abuse problem before it is a billing problem. The article shows that the attacker’s real advantage comes from cheap account issuance, trial cycling, and disposable identities, not from model exploitation. That means the control boundary starts at registration and session governance, where identity signals determine whether inference can be monetised by the legitimate operator. Practitioners should treat free-tier access as governed identity, not as a marketing funnel.

Account lifecycle controls are being asked to do work that payment fraud tools were never designed to do. Payment controls only see abuse after a funding instrument appears, while token theft often happens before any card is attached. That creates a governance gap between identity issuance and revenue protection, which is exactly where abuse scales. The implication is that the operating model must align identity, fraud, and product security instead of handing the problem to a single team.

Disposable access has become the new abuse primitive for AI products. The familiar NHI pattern is not credential theft alone but identity churn: create, consume, discard, repeat. Ephemeral access economics is the central concept the article surfaces: attackers optimise around short-lived identity value and exit before review or revocation can take effect. Security teams should recognise that the harm comes from high-velocity identity turnover, not just from stolen secrets.

Traditional abuse controls fail when the attack is measured in inference minutes rather than account lifetime. The article makes clear that network signals, CAPTCHAs, and generic authentication add-ons are all too coarse when abuse starts and ends inside the application layer. This is a classic NHI governance problem with a product-security edge: the environment needs controls that can score intent at issuance time. Practitioners should re-evaluate where trust is granted, not only where traffic is blocked.

AI inference abuse is a governance signal for the wider NHI programme. Once a product can be mined for free compute, the same actor will often adapt from one disposable identity to another. That means identity teams cannot separate AI product abuse from broader lifecycle hygiene, because the same controls on velocity, reuse, and trust will determine whether the attack stays contained. The practitioner takeaway is to use this class of abuse as a forcing function for cross-team identity governance.

From our research:
24,008 unique secrets were exposed in MCP configuration files in 2025 alone, the protocol's first year of widespread adoption, according to The State of Secrets Sprawl 2026.
28.65 million new hardcoded secrets were detected in public GitHub commits in 2025 alone, a 34% year-over-year increase and the largest single-year jump ever recorded.
For teams building guardrails around AI access and machine identity, Top 10 NHI Issues helps connect secret exposure to lifecycle and governance failures.

What this signals

Ephemeral access economics: free-tier abuse is becoming a reusable pattern across AI products, which means identity teams need controls that evaluate intent at issuance time rather than after usage has already generated cost. The more product telemetry is tied to identity signals, the sooner the organisation can distinguish a customer from a consumptive attacker.

According to The State of Secrets Sprawl 2026, 64% of valid secrets leaked in 2022 are still valid and exploitable today, which shows that detection without automated revocation is a weak defence in adjacent identity problems. The same lesson applies here: if abuse prevention does not change the economics of account creation, attackers simply keep cycling identities.

AI products with free tiers should now be designed as governed identity surfaces, not just product surfaces. Teams that already map workload identity, secrets exposure, and access lifecycle can extend those controls into sign-up trust, trial enforcement, and abuse telemetry without rebuilding the programme from scratch.

For practitioners

Instrument sign-up velocity and trial-reset patterns: Track repeated account creation, delete-and-recreate behaviour, and abrupt trial depletion as first-class abuse signals. Feed those signals into rate limiting and step-up checks before the account can reach meaningful inference volume.
Score device and email trust at issuance time: Use device fingerprinting, disposable-domain intelligence, and email reputation to assess the likelihood that a new account is disposable before credits are issued. The goal is to raise attacker cost before the first meaningful request.
Apply SMS challenge selectively to high-risk sign-ups: Reserve stronger verification for patterns that match free-tier cycling or bulk registration. This makes repeated abuse more expensive without forcing the same friction on every legitimate user.
Unify identity, fraud, and product telemetry: Correlate application events, billing signals, and security telemetry so that abuse is visible before payment data exists. Free-tier abuse often looks benign in isolation and only becomes obvious when the lifecycle is analysed end to end.
Tie abuse controls to compute-cost thresholds: Set operational triggers for when abuse volume starts to distort routing, error rates, or inference spend. That lets teams respond before the issue becomes both a security and reliability incident.
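The last item, tying abuse controls to compute-cost thresholds, can be sketched as a small trigger check. This is a hedged illustration: the function name, the baseline comparison, and the default multiplier and error-rate ceiling are assumptions chosen for the example, not recommended values.

```python
from typing import List


def cost_triggers(free_tier_spend: float,
                  baseline_spend: float,
                  error_rate: float,
                  spend_multiplier: float = 2.0,
                  max_error_rate: float = 0.05) -> List[str]:
    """Return the operational triggers that fired for the current interval.

    free_tier_spend and baseline_spend share the same currency and interval;
    error_rate is a fraction between 0 and 1. Defaults are hypothetical.
    """
    fired = []
    if free_tier_spend > spend_multiplier * baseline_spend:
        fired.append("spend")       # free-tier inference spend far above baseline
    if error_rate > max_error_rate:
        fired.append("error_rate")  # abuse may be degrading reliability
    return fired
```

A non-empty result could then tighten sign-up scoring or pause new trial issuance, so the response starts from cost and reliability data instead of waiting for a page.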

Key takeaways

LLM token theft exploits identity issuance and trial cycling, which makes it a governance and cost-control problem as much as a fraud issue.
The article’s evidence shows that abuse can be high-volume, fast-moving, and invisible to tools that only inspect network traffic or payment instruments.
Teams need controls that raise attacker cost at sign-up, correlate identity with product telemetry, and stop disposable access from becoming a repeatable revenue drain.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI-01	Free-tier abuse begins with identity issuance and account lifecycle weakness.
NIST CSF 2.0	PR.AA-05	Access control must account for abusive identities before they consume resources.
NIST Zero Trust (SP 800-207)	Section 2.1, Tenet 6	Zero trust requires continuous assessment of identity trust, not one-time sign-up approval.

Map trial access to least-privilege access rules and tighten entitlement checks for high-risk sign-ups.

Key terms

LLM Token Theft: LLM token theft is the abuse of AI product access to consume inference, credits, or trial capacity without paying. It usually depends on cheap account creation, repeated resets, or stolen payment methods, and it turns identity controls into a direct cost-control surface.
Free-Tier Abuse: Free-tier abuse is the repeated exploitation of trial or no-cost access intended for legitimate evaluation. In AI products, it often appears as rapid sign-up cycling, disposable identities, and one-and-done usage patterns that consume compute before the provider can intervene.
Device Fingerprinting: Device fingerprinting is the practice of using browser and environment signals to recognise the same device across sessions. For AI abuse prevention, it helps connect multiple accounts to one operator and identify account cycling even when email addresses and IPs change.
Trial Cycling: Trial cycling is the repeated creation, depletion, and disposal of accounts to reset usage limits. It is a behaviour pattern rather than a credential issue, so it must be detected through lifecycle, velocity, and device signals instead of relying only on payment or network controls.
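To make the device fingerprinting and trial cycling definitions concrete, the sketch below groups accounts by a shared fingerprint. It assumes a fingerprint string has already been computed elsewhere; the function name and the input shape are hypothetical, and real fingerprints are noisy enough that exact matching is a simplification.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def accounts_sharing_device(
    signups: Iterable[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    """Group account IDs by device fingerprint and keep only shared devices.

    signups yields (account_id, device_fingerprint) pairs.
    """
    clusters = defaultdict(set)
    for account_id, fingerprint in signups:
        clusters[fingerprint].add(account_id)
    # A fingerprint tied to more than one account suggests one operator.
    return {fp: accts for fp, accts in clusters.items() if len(accts) > 1}


# Hypothetical sign-up log: two accounts share fingerprint "fpA".
example = accounts_sharing_device(
    [("a1", "fpA"), ("a2", "fpA"), ("a3", "fpB")]
)
```

Here `example` contains only the `fpA` cluster, linking `a1` and `a2` to one device even if their email addresses and IPs differ.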

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity lifecycle are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or NHI governance in your organisation, it is worth exploring.

This post draws on content published by WorkOS: LLM token theft: how attackers drain your AI startup's bottom line. Read the original.

NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org