Web Scraping News: Comprehensive Review for February 2023



Legal updates and new tools available in February for the web scraping industry

Hi everyone, and welcome back to The Web Scraping Club! This post is our monthly review of what happened in the web scraping industry in February.

Legal updates

There’s no monthly recap without news about legal actions involving web scraping and privacy violations.


Meta vs Voyager Labs

The first news item of the month is Meta suing the Israeli firm Voyager Labs for creating fake Facebook accounts that were used to collect large amounts of personal data. This data was then aggregated with other data points and sold to law enforcement “agencies tasked with public safety.” Meta states that this behavior violates Facebook’s Terms of Service and, in my opinion, there’s almost no doubt about it.

The full article can be found here on Ars Technica.

Meta vs Bright Data

In the Meta vs Bright Data case, which started several months ago, there have been some updates in recent weeks.

Meta is suing Bright Data over its scraping activity on Facebook, while Bright Data insists it was scraping public data from Facebook and therefore was not breaking any terms of service.

While the case was being discussed in recent weeks, a curious detail emerged from the case documents: Meta had been a Bright Data customer for six years, requesting data to improve its ad system.

Full article here.

This post is sponsored by Smartproxy, the premium proxy and web scraping infrastructure focused on the best price, ease of use, and performance.


All The Web Scraping Club readers can save 10% on every purchase with the discount code WEBSCRAPINGCLUB10.

Video of the month

Staying on the legal side of web scraping, one of the most interesting videos I found this month was made by William Whitman, an attorney who recently launched his YouTube channel and also covers web scraping.

Tech updates

Oxylabs released its web unblocker

Oxylabs has unveiled its new Web Unblocker, an AI-powered tool for bypassing anti-bot solutions. It selects, rotates, and evaluates the most suitable proxies for a specific site to achieve the highest possible success rate with the lowest response time. The system also picks the right combination of headers, cookies, browser attributes, JavaScript fingerprints, and proxies to appear as a real user, avoiding CAPTCHAs and bypassing the target website’s blocks.


Liked the article? Subscribe for free to The Web Scraping Club to receive a new one in your inbox twice a week.


Here’s an image that summarizes how it works.

[Image: how the Oxylabs Web Unblocker works]
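To make the proxy-selection part of that description more concrete, here is a toy sketch of the general idea. It is not Oxylabs’ implementation, whose internals are not described here. It keeps per-site success and latency statistics for each proxy, usually picks the best-scoring one, and occasionally tries a random one (an epsilon-greedy strategy). The class names, the scoring formula (smoothed success rate divided by average latency), and the proxy names are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ProxyStats:
    """Running counters for one (site, proxy) pair (hypothetical bookkeeping)."""
    successes: int = 0
    failures: int = 0
    total_latency: float = 0.0  # seconds, summed over successful requests only

    def success_rate(self) -> float:
        # Laplace smoothing: an untested proxy starts at 0.5 instead of 0 or 1.
        return (self.successes + 1) / (self.successes + self.failures + 2)

    def avg_latency(self) -> float:
        # Assumed 1-second latency until a measurement exists;
        # the floor avoids division by zero in score().
        if self.successes == 0:
            return 1.0
        return max(self.total_latency / self.successes, 1e-3)


class ProxyPicker:
    """Hypothetical epsilon-greedy proxy selector, tracked separately per target site."""

    def __init__(self, proxies, epsilon=0.1, seed=None):
        self.proxies = list(proxies)
        self.epsilon = epsilon  # probability of exploring a random proxy
        self.rng = random.Random(seed)
        self.stats = defaultdict(ProxyStats)  # key: (site, proxy)

    def score(self, site, proxy):
        s = self.stats[(site, proxy)]
        # Favor proxies that succeed often and respond quickly on this site.
        return s.success_rate() / s.avg_latency()

    def pick(self, site):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.proxies)  # explore
        return max(self.proxies, key=lambda p: self.score(site, p))  # exploit

    def report(self, site, proxy, ok, latency):
        """Record the outcome of one request made through `proxy` to `site`."""
        s = self.stats[(site, proxy)]
        if ok:
            s.successes += 1
            s.total_latency += latency
        else:
            s.failures += 1
```

For example, after five blocked requests through `proxy-a` and five successful 0.5-second requests through `proxy-b` on `example.com`, `picker.pick("example.com")` with `epsilon=0.0` returns `"proxy-b"`, while a site with no history still gets the first proxy in the list. A real service would also weigh the headers, cookies, and fingerprints mentioned above, which this sketch leaves out.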

New Chrome headless released

A few days ago, Antoine Vastel of DataDome analyzed what will happen when the new headless mode of Chrome is rolled out and used more broadly.
In short, Chrome running in the new headless mode will have a fingerprint much closer to that of headful Chrome, and this will affect anti-bot techniques.

As Vastel says: “The new headless Chrome browser fingerprint is way more realistic than the first/old version of headless Chrome. Depending on the sophistication of your detection engine, it’s going to make it easier for bot developers to bypass detection, particularly detection based on browser fingerprinting signals. As written in Chromium’s code, ‘The new headless mode is Chrome browser running without any visible UI.’ Thus, a lot of subtle differences that used to exist between the old headless Chrome and a genuine headful Chrome don’t exist anymore.”

This is an interesting topic to follow for anyone involved in web scraping. You can find the full article here.
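If you want to compare the two modes yourself, Chrome exposes the new headless mode through the `--headless=new` command-line switch. The helper below is a minimal sketch that builds such a command line and optionally runs it. The function names are hypothetical, and the default binary name `google-chrome` is an assumption, since the executable name varies by platform and installation.

```python
import subprocess
from typing import List


def headless_chrome_command(url: str, chrome_binary: str = "google-chrome",
                            new_mode: bool = True) -> List[str]:
    """Build (but do not run) a command that prints the rendered DOM of `url`.

    `chrome_binary` is an assumption: adjust it to your platform's Chrome executable.
    """
    # "--headless=new" explicitly requests the new headless mode; a bare
    # "--headless" selected the old implementation in Chrome releases of early 2023.
    headless_flag = "--headless=new" if new_mode else "--headless"
    # "--dump-dom" makes Chrome print the serialized DOM to stdout and exit.
    return [chrome_binary, headless_flag, "--dump-dom", url]


def dump_dom(url: str, **kwargs) -> str:
    """Run the command above and return Chrome's stdout (requires Chrome installed)."""
    result = subprocess.run(
        headless_chrome_command(url, **kwargs),
        capture_output=True,
        text=True,
        timeout=60,
        check=True,
    )
    return result.stdout
```

For example, running `dump_dom(url)` and `dump_dom(url, new_mode=False)` against a fingerprinting test page of your choice lets you see whether the page reports the two modes differently.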

The most-read article of the month

The most-read article of the month is the Anti-detect Anti-bot matrix, a post where we tested several techniques against different anti-bot solutions to find out which configuration works best against each of them.

This concludes my selection of posts and articles for February. If I’ve missed something important, please let me know in the comments or on our Discord server.

