Web scraping is exposing retailers to bot-driven revenue loss

By NHI Mgmt Group Editorial Team | Published 2025-09-24 | Domain: Governance & Risk | Source: Arkose Labs

TL;DR: Web scraping is now a major retail abuse pattern, with Arkose Labs citing QVC’s reported $2 million in lost sales, server crashes, and downtime, while its 2025 threat actor analysis ranks retail as the fourth most targeted industry by bad bots. The underlying problem is not just data theft but the way automated abuse distorts operations, analytics, and customer trust.

At a glance

What this is: Arkose Labs’ analysis of how website scraping and bad bots cost retailers traffic, revenue, and operational stability.

Why it matters: Identity and access teams increasingly have to govern machine-driven abuse at the edge of the customer experience, not just protect logins and internal credentials.


Context

Website scraping is the automated collection of product, pricing, inventory, or content data from public web properties. In retail, that activity becomes an availability and governance issue when bots consume server resources, distort analytics, and erode customer trust as well as commercial margins.

For IAM and security teams, the harder problem is that scraping sits alongside credential stuffing, account creation abuse, and proxy-based impersonation. These behaviours do not only target content; they exploit identity-adjacent controls at the edge of the customer journey, where conventional bot checks and rate limits often fail.

The article’s QVC example is a useful warning because the impact was operational, financial, and reputational at once. That is typical of mature scraping campaigns, which rarely stay confined to a single control failure.

Key questions

Q: How should retailers reduce the risk of website scraping without hurting customer experience?

A: Use layered bot controls that combine behavioural analytics, device and session context, and adaptive challenges. The goal is to distinguish legitimate shoppers from automated collectors in real time, then apply friction only when request patterns show automation. That approach reduces false positives and keeps conversion loss lower than broad blocking tactics.

Q: Why do static anti-bot controls fail against modern scraping campaigns?

A: Static controls assume the attacker stays visible in one place long enough to be blocked. Modern scrapers rotate IPs, vary request timing, and mimic human browsing, which lets them slip past simple rate limits and generic CAPTCHAs. Effective defence needs context-aware detection across traffic patterns, sessions, and accounts.
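To make the IP-rotation point concrete, here is a minimal Python sketch comparing a per-IP request counter with one keyed on a client fingerprint. The fingerprint field is a hypothetical stand-in for whatever device or session identifier a bot-management layer derives, and the limit and traffic are illustrative assumptions, not figures from the Arkose Labs analysis.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    ip: str
    fingerprint: str  # hypothetical device/session fingerprint from the bot layer
    path: str


def over_limit(requests: list[Request], key: str, limit: int) -> set[str]:
    """Return the values of `key` ("ip" or "fingerprint") seen more than `limit` times."""
    counts = Counter(getattr(r, key) for r in requests)
    return {value for value, n in counts.items() if n > limit}


# Illustrative traffic: one scraper rotating across 50 proxy IPs (4 requests each,
# 200 in total) and one shopper making 5 requests from a single IP.
scraper = [
    Request(ip=f"203.0.113.{i % 50}", fingerprint="fp-scraper", path=f"/product/{i}")
    for i in range(200)
]
shopper = [Request(ip="198.51.100.7", fingerprint="fp-shopper", path="/cart") for _ in range(5)]
traffic = scraper + shopper

print(over_limit(traffic, "ip", limit=20))           # set(): every IP stays under the limit
print(over_limit(traffic, "fingerprint", limit=20))  # {'fp-scraper'}
```

Fingerprints can be spoofed too, so swapping one static key for another only moves the problem; the point is that aggregating across a richer identity than the source IP exposes distributed collection that a per-IP limit never sees.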

Q: What should security teams do when scraping starts affecting analytics and conversion data?

A: Treat the problem as both a security and business integrity issue. Validate whether bot traffic is inflating page views, session duration, or product interest, then isolate those signals before they reach merchandising and planning teams. Otherwise, organisations may optimise around machine-generated noise rather than real customer demand.
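One inexpensive way to start that validation is to look for products whose page views climb while add-to-cart activity does not follow. The sketch below assumes per-product view and add-to-cart counts can be exported alongside a historical baseline rate; the function name, the 25% drop factor, and the 1,000-view floor are illustrative assumptions, and a flag is a prompt to investigate, not proof of bot traffic.

```python
def contaminated_products(
    stats: dict[str, tuple[int, int]],
    baseline_rate: float,
    drop_factor: float = 0.25,
    min_views: int = 1000,
) -> list[str]:
    """Flag products whose add-to-cart rate fell below drop_factor * baseline_rate.

    `stats` maps a product ID to (page_views, add_to_carts).
    """
    flagged = []
    for product, (views, carts) in stats.items():
        if views < min_views:
            continue  # too little traffic to judge reliably
        if carts / views < drop_factor * baseline_rate:
            flagged.append(product)
    return sorted(flagged)


# Illustrative weekly figures, not real data.
weekly = {
    "sku-100": (12_000, 600),  # 5.0% add-to-cart, in line with the baseline
    "sku-200": (90_000, 450),  # 0.5%: views surged without matching demand
    "sku-300": (400, 1),       # below min_views, not evaluated
}
print(contaminated_products(weekly, baseline_rate=0.05))  # ['sku-200']
```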

Q: What frameworks are relevant for governing web scraping and bot abuse?

A: Zero Trust (NIST SP 800-207) and NIST Cybersecurity Framework (CSF) 2.0 principles are relevant because they emphasise continuous verification, telemetry, and response based on observed behaviour. Where logins or account creation are involved, teams should also align anti-bot controls with identity and access signals, so automated abuse is handled as a trust problem, not just a traffic problem.

Technical breakdown

How scraping bots evade basic detection

Modern scraping tools rarely rely on a single request pattern. They distribute traffic across proxies, rotate IP addresses, vary user agents, and mimic browsing sequences so that volume looks more like normal retail traffic than a direct attack. Some campaigns also create accounts or attempt logins to pass content gates and appear legitimate. The technical problem is that these signals are individually weak but collectively meaningful. If defenders only inspect one control layer, such as IP reputation or CAPTCHA response, the bot can often shift tactics without losing access to the target data.

Practical implication: correlate session behaviour, device trust, velocity, and account creation signals rather than relying on one blocking control.
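A minimal sketch of that correlation step, assuming each session already carries a few derived signals: individually weak indicators are normalised and combined into one score between 0 and 1. The field names, saturation points, and weights are hypothetical; a production system would learn or tune them against labelled traffic rather than hard-code them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionSignals:
    requests_per_minute: float
    distinct_ips: int                 # IP addresses seen within one session
    device_trusted: bool              # e.g. a consistent, previously seen device
    account_age_min: Optional[float]  # minutes since account creation; None if anonymous


# Hypothetical weights (they sum to 1.0); real deployments would tune these.
WEIGHTS = {"velocity": 0.35, "ip_churn": 0.25, "untrusted_device": 0.20, "fresh_account": 0.20}


def risk_score(s: SessionSignals) -> float:
    """Combine individually weak signals into a single score between 0 and 1."""
    velocity = min(s.requests_per_minute / 120, 1.0)     # saturates at 120 requests/min
    ip_churn = min(max(s.distinct_ips - 1, 0) / 5, 1.0)  # 6+ IPs in one session -> 1.0
    untrusted = 0.0 if s.device_trusted else 1.0
    fresh = 1.0 if s.account_age_min is not None and s.account_age_min < 10 else 0.0
    return (
        WEIGHTS["velocity"] * velocity
        + WEIGHTS["ip_churn"] * ip_churn
        + WEIGHTS["untrusted_device"] * untrusted
        + WEIGHTS["fresh_account"] * fresh
    )


shopper = SessionSignals(requests_per_minute=6, distinct_ips=1, device_trusted=True, account_age_min=None)
scraper = SessionSignals(requests_per_minute=90, distinct_ips=4, device_trusted=False, account_age_min=3)
print(f"{risk_score(shopper):.2f}")  # 0.02
print(f"{risk_score(scraper):.2f}")  # 0.81
```

Because no single weight exceeds 0.35, no one signal can push a session into the high-risk range on its own; the score only climbs when several indicators agree, which is the behaviour the paragraph above argues for.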

Why scraping hurts operations, not just data confidentiality

Scraping is often treated as content theft, but high-volume automation also becomes an infrastructure load problem. Repeated page requests and API calls can saturate web tiers, slow response times, and trigger intermittent outages. That means the business impact includes abandoned carts, wasted marketing spend, and degraded conversion rates, even when no sensitive internal system is breached. In retail environments, analytics contamination adds another layer of harm because bot activity can inflate traffic, session duration, and product interest, pushing commercial decisions off course.

Practical implication: treat bot traffic as an availability and data-quality threat, not only an intellectual property problem.
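The analytics side of that harm is easy to underestimate, so here is a small sketch that computes headline engagement metrics with and without sessions already flagged by bot detection. The `Session` record and the sample data are illustrative assumptions; the point is that a handful of automated sessions can dominate the totals.

```python
from statistics import mean
from typing import NamedTuple


class Session(NamedTuple):
    session_id: str
    page_views: int
    duration_s: int
    flagged_as_bot: bool  # assumed to be set upstream by the bot-detection layer


def summarise(sessions: list[Session], exclude_bots: bool) -> dict[str, float]:
    """Compute headline engagement metrics, optionally dropping flagged sessions."""
    kept = [s for s in sessions if not (exclude_bots and s.flagged_as_bot)]
    return {
        "sessions": len(kept),
        "page_views": sum(s.page_views for s in kept),
        "avg_duration_s": mean(s.duration_s for s in kept) if kept else 0.0,
    }


# Illustrative data: three shoppers and two scraper sessions.
sessions = [
    Session("s1", 6, 240, False),
    Session("s2", 4, 180, False),
    Session("s3", 9, 420, False),
    Session("b1", 300, 1800, True),
    Session("b2", 280, 1750, True),
]
raw = summarise(sessions, exclude_bots=False)
clean = summarise(sessions, exclude_bots=True)
print(raw)
print(clean)
```

With this sample, the raw view reports 599 page views and a mean session length of 878 seconds; excluding the two flagged sessions leaves 19 page views and 280 seconds. Any merchandising decision built on the raw figures would be planning against scraper behaviour, not shopper demand.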

Why layered anti-bot controls outperform static blocks

Static defences such as IP blocking, simple rate limiting, and generic CAPTCHA challenges are easy for advanced scraping operations to work around. A stronger model uses risk scoring, adaptive friction, and telemetry across web, API, and mobile channels so the response can change as the attacker changes tactics. That does not mean every request should be challenged. It means the control plane has to evaluate request context and escalation risk in real time, then distinguish between legitimate shoppers and automated collection behaviour with enough precision to avoid harming conversion.

Practical implication: build response tiers that escalate only when the request pattern shows automation, not just high traffic.
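A minimal sketch of such response tiers, assuming an upstream risk score between 0 and 1 and a count of failed challenges per session. The tier names and thresholds are hypothetical. Raw traffic volume is deliberately not an input: a flash sale can be busy without being automated.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    MONITOR = "monitor"
    CHALLENGE = "challenge"
    BLOCK = "block"


def respond(risk: float, failed_challenges: int = 0) -> Action:
    """Map a session's risk score and challenge history to a response tier.

    Thresholds are illustrative assumptions, not recommended values.
    """
    if failed_challenges >= 2:
        return Action.BLOCK      # repeated failures: treat as confirmed automation
    if risk >= 0.7:
        return Action.CHALLENGE  # add friction only for high-risk sessions
    if risk >= 0.4:
        return Action.MONITOR    # log and watch closely, no customer-visible friction
    return Action.ALLOW


print([respond(r).value for r in (0.1, 0.5, 0.8)])  # ['allow', 'monitor', 'challenge']
print(respond(0.8, failed_challenges=2).value)      # block
```

Keeping a monitor-only tier between allow and challenge is one way to gather evidence on borderline sessions without adding friction that could cost legitimate conversions.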

NHI Mgmt Group analysis

Website scraping is now an identity-adjacent abuse problem, not a web-only nuisance. The article shows that bot campaigns create revenue loss, uptime issues, and distorted analytics in one motion. That matters because the control failure is not limited to content protection: it sits at the boundary where customer identity, session trust, and request legitimacy overlap. Practitioners should treat scraping as part of the broader machine-driven abuse surface, not a standalone web concern.

Static bot controls create a false sense of containment. IP blocking, rate limits, and basic CAPTCHAs all assume the attacker stays in one observable pattern long enough to be stopped. The article’s description of rotating IPs, mimicry, and account abuse shows why that assumption is brittle. The practical conclusion is that defenders need continuous behavioural context, not one-time challenge logic.

Scraping contamination can be as damaging to decision-making as to infrastructure. When bots inflate traffic and session metrics, merchandising, acquisition, and conversion analysis all become less reliable. That is a governance issue, not just a detection issue, because business teams may optimise against machine-generated noise. Security leaders should expect bot activity to be measured in both loss and misdirection.

Retail scraping is a preview of how non-human abuse scales across digital business models. The same mechanics that steal price data can be reused for fraud, credential attacks, or synthetic account generation. As machine-driven activity increases, the line between bot management, fraud defence, and identity governance keeps narrowing. Practitioners should align those functions before attackers force the merge.

From our research:
Only 44% of developers are reported to follow security best practices for secrets management, exposing a significant developer behaviour gap, according to The State of Secrets in AppSec.
The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities.
For related identity and machine-abuse context, read Top 10 NHI Issues to see how unmanaged machine behaviours compound governance gaps.

What this signals

Bot management is becoming part of identity governance because abuse now flows through accounts, sessions, and behavioural trust signals. Retail teams that separate fraud, IAM, and web protection will keep missing the same attacker patterns in different forms. A stronger operating model links request telemetry, account creation, and access behaviour so the organisation can distinguish customers from automation before revenue and analytics are distorted.

Website scraping is a useful reminder that machine activity can damage both security and decision quality. When bot traffic rewrites the shape of your analytics, commercial teams are no longer planning against real demand. That makes detection accuracy a business control, not only a security metric.

Retailers should expect bot abuse to keep merging with credential attacks and synthetic identity activity. The control posture that stops scraper infrastructure today will also inform how teams handle account abuse tomorrow. Cross-functional ownership between security, fraud, and digital operations is now the pragmatic baseline.

For practitioners

Instrument behavioural bot detection across channels: Correlate request velocity, session consistency, device signals, and navigation patterns across web, API, and mobile surfaces so scraping cannot hide behind one control boundary.
Escalate friction only on high-risk traffic: Use adaptive challenges for suspicious sessions while allowing low-risk shoppers through quickly, so anti-bot enforcement protects revenue without suppressing legitimate conversion.
Monitor analytics for bot contamination: Review traffic, session duration, and conversion anomalies for signs that automated visits are distorting merchandising and demand planning decisions.
Align legal and technical response paths: Pair cease-and-desist and terms-of-service enforcement with operational detection so known scraping activity can be identified, documented, and blocked consistently.
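Timing is one of the navigation signals the first item mentions. As a minimal sketch, assuming per-session request timestamps (in seconds, sorted ascending) are available: scripted clients often request pages at near-constant intervals, so a low coefficient of variation in the gaps between requests is a useful input to a risk score. The function name and any cut-off applied to its output are assumptions, and advanced bots add random jitter, so this should remain one weak signal among several.

```python
from statistics import mean, pstdev


def timing_regularity(timestamps: list[float]) -> float:
    """Coefficient of variation of inter-request gaps; lower means more machine-like.

    Expects timestamps sorted ascending. Returns infinity when there are too few
    requests to judge, and 0.0 for a burst with no gaps at all.
    """
    if len(timestamps) < 3:
        return float("inf")
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    avg_gap = mean(gaps)
    return pstdev(gaps) / avg_gap if avg_gap > 0 else 0.0


bot = [i * 2.0 for i in range(20)]                 # one request every 2 seconds
human = [0.0, 3.1, 9.8, 11.2, 25.0, 31.5, 60.2]    # irregular browsing pauses
print(timing_regularity(bot))                      # 0.0
print(round(timing_regularity(human), 2))
```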

Key takeaways

Website scraping now creates a compound risk of revenue loss, service degradation, and contaminated analytics.
The article’s QVC example shows that bot abuse can translate into immediate financial harm, not just nuisance traffic.
Retailers need adaptive, behaviour-based controls that distinguish legitimate customer activity from automated collection at runtime.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

NIST CSF 2.0 and NIST Zero Trust Architecture (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
NIST CSF 2.0	PR.AA-05 (formerly PR.AC-4 in CSF 1.1)	Access and session trust matter when bots mimic legitimate users.
NIST Zero Trust (SP 800-207)	Zero trust tenets	Continuous verification fits adaptive bot detection and response.
NIST CSF 2.0	DE.CM-01	Monitoring for potentially adverse events is central to spotting scraping campaigns.

Build telemetry that surfaces abnormal request patterns before business impact grows.
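A minimal sketch of that kind of telemetry, assuming request counts are already aggregated per fixed interval (for example, per minute per endpoint): an exponentially weighted moving average and variance give an adaptive baseline, and intervals far above it are flagged. The function name, smoothing factor, three-standard-deviation threshold, and warm-up length are illustrative assumptions. This catches volume spikes only; low-and-slow scrapers that stay under the baseline still need the behavioural signals discussed in the technical breakdown.

```python
def ewma_alerts(
    counts: list[float], alpha: float = 0.3, k: float = 3.0, warmup: int = 5
) -> list[int]:
    """Return indices of intervals whose count exceeds baseline + k * std.

    Baseline mean and variance are exponentially weighted, so they follow
    gradual shifts (such as daily cycles) but are not dragged up by spikes.
    """
    if not counts:
        return []
    alerts = []
    baseline, variance = counts[0], 0.0
    for i, x in enumerate(counts[1:], start=1):
        if i >= warmup and x > baseline + k * variance ** 0.5:
            alerts.append(i)
            continue  # keep the anomalous interval out of the baseline
        diff = x - baseline
        incr = alpha * diff
        baseline += incr
        variance = (1 - alpha) * (variance + diff * incr)
    return alerts


per_minute = [100, 104, 98, 101, 99, 103, 97, 102, 450, 100]  # illustrative counts
print(ewma_alerts(per_minute))  # [8]
```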

Key terms

Website Scraping: Automated collection of public website content at scale, usually to harvest pricing, product, inventory, or other commercially valuable data. In security terms, scraping becomes a governance issue when it creates service load, distorts analytics, or crosses into account abuse and identity-adjacent controls.
Bad Bot Traffic: Non-human request traffic that is designed to imitate legitimate users while pursuing automation-driven goals such as scraping, credential attacks, or account creation abuse. The key risk is not only volume, but the way the traffic blends into normal customer behaviour and bypasses simplistic controls.
Adaptive Challenge: A response mechanism that increases friction only when a session or request shows elevated risk. It relies on telemetry, behavioural scoring, and context so that legitimate users are not unnecessarily blocked while automation faces stronger verification or challenge steps.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity lifecycle are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or NHI governance in your organisation, it is worth exploring.

This post draws on content published by Arkose Labs: "Website Scraping: The Hidden Threat Bleeding Retailers Dry".

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-09-24.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org