Hands On #5: Oxylabs Web Unblocker – In-Depth Testing and Expert Review



If there’s one trend in the web scraping industry in 2023, it’s unblockers. Almost every month a new one is released and, while this makes life easier for professionals, it also makes choosing the right one for your use case more and more difficult. For this month’s episode of the Hands-on series, we’re testing the Oxylabs Web Unblocker.

What is the Oxylabs Web Unblocker

Like other companies in recent months, Oxylabs has launched its Web Unblocker: a super API that you use as a single proxy URL and that also provides Javascript rendering, rotating IPs, and session management.

Let’s see how powerful it is!

Oxylabs Web Unblocker features
Oxylabs Web Unblocker

Our testing methodology

As we did with the other “unblocker” APIs, we’ll use a plain Scrapy spider that retrieves 10 pages from each of 5 different websites, one for each anti-bot solution tested (Datadome, Cloudflare, Kasada, F5, PerimeterX). For each page, it returns the HTTP status code, a string from the page (needed to check whether the page loaded correctly), and the website and anti-bot names.

The base scraper is unable to retrieve any record correctly, so the baseline score is 0.

As a result of the test, we’ll assign a score from 0 to 100, depending on how many URLs are retrieved correctly over two runs, one from a local environment and the other from a server. With 50 URLs per run, that makes 100 attempts in total, so each correctly retrieved URL is worth one point. A score of 100 means that the anti-bot was bypassed for every input URL in both runs.
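The scoring rule above boils down to a simple ratio. Here is a minimal sketch, assuming each run produces one boolean per input URL (True when the page was retrieved correctly); the function name and data layout are hypothetical and not taken from our test repository.

```python
def unblocker_score(runs: list[list[bool]]) -> float:
    """Share of correctly retrieved URLs across all runs, scaled to 0-100.

    Each run is a list with one boolean per input URL (True = page retrieved correctly).
    """
    total = sum(len(run) for run in runs)
    if total == 0:
        raise ValueError("no results to score")
    successes = sum(sum(run) for run in runs)  # True counts as 1, False as 0
    return 100 * successes / total


# Two runs (local and server) of 50 URLs each: 10 pages x 5 anti-bot solutions.
baseline = [[False] * 50, [False] * 50]
print(unblocker_score(baseline))  # 0.0
```

Because the two runs add up to exactly 100 attempts, the score is simply the number of correctly retrieved URLs.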

You can find the code of the test scraper in our GitHub repository open to all our readers.


Liked the article? Subscribe for free to The Web Scraping Club to receive twice a week a new one in your inbox.


Preparing for the test

First of all, you need to create an account on Oxylabs’ website and then look for the Web Unblocker plans, which range from 15 down to 11 USD per GB, depending on the plan size. Following this link, you can get an extra 35% discount on the Unblocker and other Oxylabs services.


After changing your password, you can browse the documentation, which includes integration examples in different programming languages, and a usage dashboard. In my opinion, it’s one of the best recap dashboards I’ve seen, thanks to a small detail you don’t often find.


It breaks down traffic usage by domain, which is very useful when you use the same proxy URL for multiple websites. I would also add a breakdown of costs per website, but it’s already very helpful as it is.
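Since the dashboard already reports traffic per domain, the per-website cost breakdown can be estimated on your side. A minimal sketch, assuming flat per-GB billing at the 15 USD/GB entry price and decimal gigabytes (check how your plan actually bills); the domain names and byte counts are made up for illustration.

```python
PRICE_PER_GB_USD = 15.0  # smallest plan price quoted in this article
BYTES_PER_GB = 1_000_000_000  # assumption: decimal GB; verify against your billing


def cost_per_domain(
    traffic_bytes: dict[str, int], price_per_gb: float = PRICE_PER_GB_USD
) -> dict[str, float]:
    """Estimate the cost of each domain's traffic at a flat per-GB price."""
    return {
        domain: round(used / BYTES_PER_GB * price_per_gb, 2)
        for domain, used in traffic_bytes.items()
    }


# Hypothetical traffic figures copied from the per-domain dashboard view.
usage = {"example-shop.com": 2_500_000_000, "example-news.com": 400_000_000}
print(cost_per_domain(usage))
```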

Setting up the Scrapy scraper

As mentioned before, I’ve manually chosen fifty URLs, ten per website, as the benchmark and input for our Scrapy spider.


The scraper returns the anti-bot and website names given in input, the HTTP status code of the response, and a field populated with an XPath selector, to make sure we reached the product page and weren’t blocked by a challenge.

There’s no particular configuration to apply to the scraper, other than setting up the proxy in the settings.py file.
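Scrapy’s built-in HttpProxyMiddleware takes the proxy from each request’s `meta['proxy']` key (or from the usual proxy environment variables) rather than from a dedicated setting, so one way to wire in the Unblocker is a small downloader middleware enabled from settings.py. This is a sketch, not the code in our repository: the class name, the `UNBLOCKER_PROXY_URL` setting, the module path, and the placeholder credentials are hypothetical, and the real host and port should be taken from the Oxylabs documentation.

```python
import os


class UnblockerProxyMiddleware:
    """Hypothetical downloader middleware that sends every request through the Unblocker.

    Scrapy's built-in HttpProxyMiddleware later reads request.meta["proxy"] and, when
    the URL embeds USERNAME:PASSWORD, turns it into a Proxy-Authorization header.
    """

    def __init__(self, proxy_url=None):
        # Fall back to an environment variable (hypothetical name) if no URL is given.
        self.proxy_url = proxy_url or os.environ.get("UNBLOCKER_PROXY_URL")
        if not self.proxy_url:
            raise ValueError("set UNBLOCKER_PROXY_URL to the Unblocker proxy URL")

    @classmethod
    def from_crawler(cls, crawler):
        # Read the hypothetical UNBLOCKER_PROXY_URL custom setting from settings.py.
        return cls(crawler.settings.get("UNBLOCKER_PROXY_URL"))

    def process_request(self, request, spider):
        # Keep any proxy already set on the request; otherwise use the Unblocker.
        request.meta.setdefault("proxy", self.proxy_url)
        return None  # continue processing this request normally


# settings.py (hypothetical project path and placeholder credentials):
# UNBLOCKER_PROXY_URL = "http://USERNAME:PASSWORD@HOST:PORT"
# DOWNLOADER_MIDDLEWARES = {
#     # 350 < 750, so this runs before the built-in HttpProxyMiddleware.
#     "myproject.middlewares.UnblockerProxyMiddleware": 350,
# }
```

Using `setdefault` keeps the middleware out of the way when a specific request already carries its own proxy.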

First run: no Oxylabs Web Unblocker

With this run, we set the baseline by running the Scrapy spider without the Web Unblocker.

As expected, the first run returned the following results.

First run results

Every website returned errors except Nordstrom, which returned status code 200 but without the product page we requested.
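Nordstrom’s case shows why the status code alone isn’t a reliable signal: a page without the product can still come back with a 200. A minimal sketch of the check, assuming a hypothetical `is_correct` helper that receives the status code and the string extracted by the XPath selector (in Scrapy, `response.xpath(...).get()` returns None when nothing matches).

```python
from typing import Optional


def is_correct(status: int, extracted: Optional[str]) -> bool:
    """A page counts as retrieved only with status 200 AND the expected element present."""
    return status == 200 and bool(extracted and extracted.strip())


# Hypothetical results: (website, status code, text extracted via XPath)
results = [
    ("site-a", 403, None),            # blocked by the anti-bot
    ("site-b", 200, None),            # status 200, but no product on the page
    ("site-c", 200, "Product name"),  # actual product page
]
for website, status, extracted in results:
    print(website, is_correct(status, extracted))
```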

Second run: using the Web Unblocker with only raw HTML requested

As we have already seen with other similar products, we can run the scraper without enabling Javascript rendering, which makes it faster, or enable it only when needed.

In this second test, we leave it disabled. Here are the results.

Second run results

Surprisingly, we already got a great result even without Javascript rendering, with almost all requests succeeding. Only a few requests to the Cloudflare- and PerimeterX-protected websites returned errors, but with some retries handled in the scraper, we can get results from them too.
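Retries like these can be handled by Scrapy’s built-in RetryMiddleware through a few settings. The fragment below is a sketch for settings.py; the values are assumptions rather than the configuration used in our test, and adding 403 (a code often returned by anti-bot blocks) to Scrapy’s default retry codes is our own choice.

```python
# settings.py fragment (values are assumptions, not our test configuration)
RETRY_ENABLED = True
RETRY_TIMES = 3  # extra attempts per request, on top of the first download
# Scrapy's default retry codes plus 403, a common anti-bot block status
RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408, 429, 403]
```

These settings only react to status codes: a 200 page without the expected content would need an explicit check in the callback.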

Third run: using Web Unblocker with Javascript rendering

To enable Javascript rendering, according to the documentation, it’s enough to add a custom header to the request.

yield Request(url, callback=self.test_url, meta={'website':website, 'antibot':antibot.strip()}, headers={'X-Oxylabs-Render': 'html'}, dont_filter=True)

In this case, it’s redundant, since the unblocker already passed all the anti-bot challenges without it, but we ran these tests anyway for completeness.
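Since rendering makes requests slower, one option is to send the header only to the websites that actually need it. A minimal sketch, assuming a hypothetical `JS_RENDER_SITES` set maintained by hand; only the `X-Oxylabs-Render: html` header comes from the Oxylabs documentation.

```python
# Hypothetical set of websites whose product pages need JavaScript to show the content.
JS_RENDER_SITES = {"site-needing-js"}


def unblocker_headers(website: str) -> dict[str, str]:
    """Return the extra headers to send with a request for `website`."""
    if website in JS_RENDER_SITES:
        # Ask the Unblocker to execute JavaScript before returning the HTML.
        return {"X-Oxylabs-Render": "html"}
    return {}  # raw HTML only: faster, no rendering
```

Inside the spider, the request would then pass `headers=unblocker_headers(website)` instead of the fixed header dictionary.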

Third run results

Unsurprisingly, the results are great in this case too, with only two URLs needing a retry to get the correct response.

Final remarks

Oxylabs’ Web Unblocker is one of the newest super APIs to arrive on the market this year. By now, we’re quite familiar with this kind of product, and the list of features is the usual one:

  • Embedded proxy rotation
  • Javascript rendering
  • Session handling
  • Proxy-like integration

What really matters for the end user, then, are two things: price and effectiveness. In both respects, Oxylabs’ Web Unblocker is one of the best choices.

Pros

  • Very effective against all the tested anti-bot solutions
  • Competitive pricing: the smallest plan is priced at 15 USD per GB
  • Effective dashboard, with traffic split per domain
  • Javascript rendering included in the basic price
  • Free trial

Cons

  • Nothing relevant

Rating

As we have seen, Oxylabs’ Web Unblocker performed very well, with only a few hiccups on some URLs. Since it resolved 96 URLs out of 100 on the first try, its final score is 96/100. If you want to test it yourself and get a 35% discount on the Unblocker and other services, you can follow this link.

