Hands On #1: Bright Data Unblocker vs. Cloudflare – An In-depth Analysis



Is Bright Data Web Unlocker capable of defeating Datadome and other anti-bot solutions?

Hi everyone, this is a new series of posts from The Web Scraping Club, where I try out products related to web scraping and write a short review of each. I hope this helps you evaluate products before spending money and time testing them yourself. Feel free to write to me at pier@thewebscraping.club with any feedback, or if you want me to test other products or solutions.

These Hands On episodes are not sponsored, and the opinions expressed are my own, backed by quantitative tests that vary with the kind of product I’m testing. There might be some affiliate links in the article, which help The Web Scraping Club stay free and able to test even paid solutions.


In this first issue, we’ll test the Bright Data Web Unlocker, a proxy-style API that promises to let us scrape even the most difficult and heavily protected websites.

What is Bright Data Web Unlocker

Digging a bit deeper into Bright Data’s website, we can better understand how this works. Directly from the product page:

Limits requests per IP

Manage IP usage rates so you don’t ask for a suspicious amount of data from any one IP

Emulates a real user

Automated user emulation including: starting on the target’s homepage, clicking their links, & making human mouse movements

Imitates the right devices

Web Unlocker emulates the right devices that servers expect to see

Calibrates referrer header

Makes sure the target website sees that you are landing on their page from a popular website

Identifies honeypots

Honeypots are links that sites use to expose your crawlers. Automatically detect them and avoid their trap

Sets intervals between requests

Automated delays are randomly set between requests

All these features can be summed up with the following picture.

[Image: Bright Data Web Unlocker]

It seems like a good solution, and an easy one to integrate into our scrapers, since it’s basically like adding a proxy to them.
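To make "adding a proxy" concrete, here is a minimal sketch using only Python's standard library. The host, port, and credentials below are placeholders invented for illustration, not real Bright Data values; the actual ones come from your own zone settings.

```python
import urllib.request
from urllib.parse import quote


def build_proxy_url(username: str, password: str, host: str, port: int) -> str:
    """Build an authenticated HTTP proxy URL, URL-encoding the credentials."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"http://{user}:{pwd}@{host}:{port}"


def build_opener_with_proxy(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that routes both HTTP and HTTPS traffic through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder values (assumptions), NOT real endpoints or credentials.
PROXY_URL = build_proxy_url("my-zone-user", "my-zone-password", "proxy.example.com", 8000)
opener = build_opener_with_proxy(PROXY_URL)

# Usage (performs a real network call, so it is left commented out):
# with opener.open("https://example.com", timeout=30) as response:
#     print(response.status)
```

In a Scrapy spider the same idea is usually expressed by setting `request.meta["proxy"]` to the proxy URL, either by hand or through a proxy middleware.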


Liked the article? Subscribe for free to The Web Scraping Club to receive twice a week a new one in your inbox.


Our testing methodology

To test this kind of product, I’ve developed a plain Scrapy spider that retrieves 10 pages from each of 5 websites, one website per anti-bot solution tested (Datadome, Cloudflare, Kasada, F5, PerimeterX). For every page, it returns the HTTP status code, a string extracted from the page (needed to check that the page loaded correctly), the website name, and the anti-bot name.

The base scraper cannot correctly retrieve any of the records, and this will be our benchmark result.

As a result of the test, we’ll assign a score from 0 to 100 based on how many URLs are retrieved correctly across two runs, one from a local environment and one from a server. A score of 100 means that the anti-bot was bypassed for every input URL in both runs, while our starting scraper has a score of 0, since it could not get past an anti-bot for any of the records.
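One way to read this scoring rule is as the share of correctly retrieved URLs over both runs, scaled to 100. The sketch below assumes that reading; the function name and the default of 50 URLs per run (ten per website) are choices made for this example.

```python
def unlocker_score(local_ok: int, server_ok: int, urls_per_run: int = 50) -> float:
    """Share of URLs retrieved correctly across both runs, on a 0-100 scale."""
    if urls_per_run <= 0:
        raise ValueError("urls_per_run must be positive")
    for ok in (local_ok, server_ok):
        if not 0 <= ok <= urls_per_run:
            raise ValueError("successes must be between 0 and urls_per_run")
    return 100 * (local_ok + server_ok) / (2 * urls_per_run)


print(unlocker_score(0, 0))    # the base scraper, blocked everywhere: 0.0
print(unlocker_score(50, 50))  # every URL retrieved in both runs: 100.0
```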

You can find the code of the test scraper in our GitHub repository, which is open to all our readers.

Preparing for the test

Create an account on Bright Data

First of all, you need to create an account on Bright Data’s website. After you receive the credentials, you need to set up a payment method and top up your account with some credits in order to create a new proxy zone.

To do so, navigate to My proxies and select Web Unlocker.

[Image: Bright Data Proxies]

For our tests, after choosing the proxy name and setting up a password, we whitelisted the domains of the websites we were going to test.

[Image: Our Bright Data zone]

In particular, we have whitelisted:

  • harrods.com for testing Cloudflare. At the moment, this website has one of the strictest policies I’ve seen among Cloudflare-protected websites.

  • neimanmarcus.com for testing PerimeterX.

  • canadagoose.com for testing Kasada.

  • nordstrom.com for testing F5 (formerly known as Shape).

  • footlocker.it for testing Datadome.

Setting up the Scrapy scraper

As mentioned earlier, I’ve manually chosen fifty URLs, ten per website, as the benchmark and the input for our Scrapy spider.


The scraper basically returns the anti-bot and website names given as input, the HTTP status code of the request, and a field populated by an XPath selector, so we can be sure that we reached the product page and were not blocked by some challenge.
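The success check this implies can be sketched as follows. The class, field names, and sample values are assumptions made for illustration and are not taken from the actual spider in the repository.

```python
from dataclasses import dataclass


@dataclass
class ScrapedRecord:
    antibot: str      # anti-bot name, given as input
    website: str      # website name, given as input
    status_code: int  # HTTP status code of the response
    check_field: str  # text extracted by the XPath selector ("" if nothing matched)


def is_retrieved_correctly(record: ScrapedRecord) -> bool:
    """Count a record only if the request succeeded AND the check field is populated."""
    return record.status_code == 200 and bool(record.check_field.strip())


# Made-up sample records covering the three cases of interest.
blocked = ScrapedRecord("Cloudflare", "harrods.com", 403, "")
challenge_page = ScrapedRecord("F5", "nordstrom.com", 200, "")
product_page = ScrapedRecord("Datadome", "footlocker.it", 200, "Some product name")

for rec in (blocked, challenge_page, product_page):
    print(rec.website, is_retrieved_correctly(rec))
```

Checking the extracted field as well as the status code matters because a challenge page can come back with a 200 status and still contain no product data.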

I’ve added the advanced-scrapy-proxies package to the scraper; it is my own version of the random Scrapy proxy package, with some more features.

In any case, you can find the full code of the scraper in the free GitHub repository of The Web Scraping Club.

First run: no Web Unlocker on local environment

After disabling proxy management in the scraper’s settings.py file, I started a first run in which I expected every request to fail.

[Image: No proxy setup]

PROXY_MODE = -1 in the advanced-scrapy-proxies package means that the scraper runs without a proxy.

The results after the first run are the following.

[Image: First run results]

Basically, every website returned errors except Nordstrom, which returned a 200 status code but without showing the product we asked for.

Second run: using the Web Unlocker on local environment

Now let’s activate the Web Unlocker and see what happens.

[Image: With Bright Data Web Unlocker]

Here are the results.

[Image: Bright Data Web Unlocker results]

There’s a significant improvement in the success rate, with only four records, all from Nordstrom, not returned. Even in these cases, running the scraper again retrieves the missing records, so we can say that the challenge is passed. Without any headful solution like Playwright or Selenium, we were able to bypass the most common anti-bot software.
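The "run it again for the failed records" step can be sketched as a small retry loop. This is an illustrative helper, not part of the test scraper: `fetch` stands for any function that returns True when a URL was retrieved correctly, and the attempt limit is an assumption.

```python
from typing import Callable, Dict, Iterable


def fetch_with_retries(
    urls: Iterable[str],
    fetch: Callable[[str], bool],
    max_attempts: int = 2,
) -> Dict[str, bool]:
    """Try every URL, then retry only the ones that failed, up to max_attempts."""
    results = {url: False for url in urls}
    pending = list(results)
    for _ in range(max_attempts):
        if not pending:
            break
        still_failing = []
        for url in pending:
            if fetch(url):
                results[url] = True
            else:
                still_failing.append(url)
        pending = still_failing
    return results


# Demo with a fake fetcher: one URL fails on its first attempt only.
attempts: Dict[str, int] = {}


def flaky_fetch(url: str) -> bool:
    attempts[url] = attempts.get(url, 0) + 1
    return not (url.endswith("/flaky") and attempts[url] == 1)


demo = fetch_with_retries(["https://example.com/ok", "https://example.com/flaky"], flaky_fetch)
print(demo)
```

Retrying only the failed URLs keeps the extra traffic, and therefore the extra cost, limited to the records that actually need it.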

Third run: using the Web Unlocker on cloud environment

Let’s use the same settings, but this time we’re going to run the scraper from a virtual machine in a cloud environment. We’re running this third test to see whether the server’s device fingerprint leaks to the target websites.

[Image: With Bright Data Web Unlocker]

Here are the results, pretty similar to the ones we got in a local environment.

[Image: Web Unlocker results on cloud]

Again we have some 502 codes, on F5 and on Kasada, for a total of 8 failures out of 50 results. They can probably be fixed with a second try, since the Web Unlocker managed to retrieve other URLs protected by the same solutions.

Final remarks

I need to be honest: Bright Data’s Web Unlocker surprised me in a positive way. I had already tried it some years ago and was unimpressed, but now I have to admit it solves many headaches.

Pros

  • Easy implementation in any scraper

  • Solves most of the anti-bot challenges and solutions (at least from what I’ve seen from this small test)

Cons

  • Pricing per GB is quite expensive (though on the same level as comparable solutions from competitors), and this makes it unusable for low-budget or large-scope projects. This can be mitigated by using CPM billing (priced in units of 1,000 requests), where you are charged only for successful requests and not for failed ones.
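To compare the two billing models for a given project, a rough estimate like the one below can help. It is only a sketch: the function names are invented for this example, and the prices, request counts, and page size are placeholder numbers, not Bright Data's actual rates.

```python
def cost_by_traffic(traffic_gb: float, price_per_gb: float) -> float:
    """Cost when billed on the volume of data transferred."""
    return traffic_gb * price_per_gb


def cost_by_cpm(successful_requests: int, price_per_1000: float) -> float:
    """Cost when billed per 1,000 successful requests (failed ones are free)."""
    return successful_requests / 1000 * price_per_1000


# Placeholder inputs (assumptions): replace them with your own volumes and quotes.
successful_requests = 92_000
avg_page_mb = 0.5
traffic_gb = successful_requests * avg_page_mb / 1024

print(f"per GB: {cost_by_traffic(traffic_gb, price_per_gb=10.0):.2f}")
print(f"CPM:    {cost_by_cpm(successful_requests, price_per_1000=3.0):.2f}")
```

Which model is cheaper depends mostly on the average page size: heavy pages favor CPM billing, light pages favor per-GB billing.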

Rating

Since the Bright Data Web Unlocker retrieved 46 out of 50 URLs in the local run and 42 out of 50 in the cloud run, its final score is 88/100.

