New research has revealed the scale of the growing problem of web scraping across some of the world’s biggest websites.
Smartproxy’s report of the most scraped websites of 2024 claims social media pages make up more than one-quarter (27%) of the most scraped sites.
During 2023 and the first three months of 2024, bots were most interested in search engines like Google (42%); however, social media accounts and community forums collectively accounted for around one-third (34%) of observed scraping instances.
Google is the most scraped website
While the figures are alarming, many of the most scraped sites are thankfully not targets for personal data mining, with search engines and ecommerce leading the way.
“This trend showcases the critical need for real-time search data across various sectors, including the ever-growing AI field, where data plays a crucial role in training AI models,” Smartproxy CEO Vytautas Savickas said.
“Additionally, eCommerce platforms contribute to a large portion of most scraped targets, reflecting the industry’s push for competitive intelligence needed for dynamic pricing strategies.”
Ecommerce sites, which make up around one-fifth (18%) of scraping requests, represent a growing segment. Smartproxy noted that shopping trends are emerging, and with consumers seeking more competitive prices, real-time data has become increasingly important.
The report also details ecommerce scraping peaks, with shopping periods like Black Friday (+64%), Christmas (+46%) and Amazon Prime Day (+22%) all seeing considerable spikes.
“Businesses intensify their scraping efforts during these times to capture the value of data generated by the rush of online shoppers seeking discounts and special offers,” Savickas added.