Hands-On #4: Smartproxy Site Unblocker – A Detailed Review & Analysis



Over the past few weeks, Smartproxy has added a brand new API to its services, called Site Unblocker, which promises to make our lives easier when it comes to web scraping. Let’s see together how it works and whether it passes our anti-bot tests.

What is Site Unblocker

Following the latest trend in the industry, Smartproxy launched its “super API”, called Site Unblocker. With a single API call, your scrapers get:

  • proxy management and rotation
  • geo-targeting options
  • JavaScript rendering
  • fingerprint management
  • session management

Let’s stress-test it to find out its potential!

[Image: Smartproxy Site Unblocker]

Our testing methodology

As we did with the other “unblocker” APIs, we’ll use a plain Scrapy spider that retrieves 10 pages from each of 5 different websites, one for each anti-bot solution tested (Datadome, Cloudflare, Kasada, F5, PerimeterX). For every URL, it returns the HTTP status code, a string from the page (needed to check that the page was loaded correctly), the website name, and the anti-bot name.

The base scraper is unable to retrieve any record correctly, so the benchmark is 0.

As a result of the test, we’ll assign a score from 0 to 100, depending on how many URLs are retrieved correctly over two runs, one from a local environment and one from a server. A score of 100 means that the anti-bot was bypassed for every input URL in both runs, while our starting scraper has a score of 0, since it could not get past an anti-bot for any of the records.
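
To make the scoring concrete, here is a minimal sketch of how such a score could be computed. The exact formula is an assumption on my side (the share of correctly retrieved URLs over both runs, scaled to 100), and the function name and the sample data are hypothetical.

```python
def anti_bot_score(results):
    """Return a 0-100 score: the share of (run, URL) attempts retrieved correctly.

    `results` maps a run name to a list of booleans, one per URL, where True
    means the page was loaded correctly (not merely that a response came back).
    Assumption: every attempt weighs the same, whatever the run or website.
    """
    attempts = [ok for run in results.values() for ok in run]
    if not attempts:
        return 0
    return round(100 * sum(attempts) / len(attempts))


# Hypothetical data: 10 URLs per run, 7 retrieved locally and 8 from the server.
example = {
    "local": [True] * 7 + [False] * 3,
    "server": [True] * 8 + [False] * 2,
}
print(anti_bot_score(example))  # 75
```

With this definition, a scraper that is blocked on every URL in both runs gets 0, and one that gets through everywhere gets 100.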

You can find the code of the test scraper in our GitHub repository, open to all our readers.

This post is sponsored by Smartproxy, the premium proxy and web scraping infrastructure focused on the best price, ease of use, and performance.

All The Web Scraping Club readers can save 10% on every purchase by using the discount code WEBSCRAPINGCLUB10.

Preparing for the test

First of all, you need to create an account on Smartproxy’s website and then look for the Site Unblocker plans. Since this is the Smartproxy Month, you’ll get 50% off every plan by using the code SPECIALCLUB.

[Image: Smartproxy Site Unblocker menu]

After the plan is activated, you can start playing around on the website to test some URLs manually.

Clicking on “Advanced parameters” gives you more details and options to customize your request and shows, at the bottom, how the request changes accordingly.

Setting up the Scrapy scraper

As said before, I’ve manually chosen fifty URLs, ten per website, as a benchmark and input for our Scrapy spider.

For each URL, the scraper returns the anti-bot and website names given as input, the status code of the request, and a field populated with an XPath selector, to make sure that we reached the product page and were not blocked by a challenge.
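
As a rough illustration of that check, here is a sketch of how one output record could be assembled and validated. The field names and the helper are hypothetical, not the ones used in the repository, and the XPath extraction itself is left to Scrapy: the function only looks at its result.

```python
def build_item(antibot, website, status, extracted):
    """Assemble one output record and flag whether the page really loaded.

    `extracted` is whatever the XPath selector returned (a string or None).
    """
    value = (extracted or "").strip()
    return {
        "antibot": antibot,
        "website": website,
        "status": status,
        "field": value,
        # A 200 with an empty field usually means a challenge or an empty page.
        "loaded": status == 200 and bool(value),
    }


# Placeholder values: a 200 response where the selector found nothing.
print(build_item("SomeAntiBot", "example.com", 200, None)["loaded"])  # False
```

The point of the extra field is that the status code alone is not enough: a page can answer 200 and still contain no product data.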

There’s no particular configuration to apply to the scraper, only the proxy to set in the settings.py file.
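
As a hedged sketch of what that proxy setup can look like: Scrapy’s built-in HttpProxyMiddleware picks up a proxy from `request.meta["proxy"]`, so a constant defined once can be reused by every request. The host, port, credentials, and constant name below are placeholders, not Smartproxy’s real values.

```python
from urllib.parse import quote


def proxy_url(username, password, host, port, scheme="http"):
    """Build a proxy URL, URL-encoding the credentials so that characters
    like '@' or ':' in a password do not break the URL."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


# Placeholder values: take the real endpoint and credentials from your dashboard.
UNBLOCKER_PROXY = proxy_url("USERNAME", "p@ss:word", "unblocker.example.com", 60000)
print(UNBLOCKER_PROXY)
# http://USERNAME:p%40ss%3Aword@unblocker.example.com:60000
```

In a spider, the value would then be passed with each request, for example `scrapy.Request(url, meta={"proxy": UNBLOCKER_PROXY})`.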


Liked the article? Subscribe for free to The Web Scraping Club to receive twice a week a new one in your inbox.


In any case, you can find the full code of the scraper in the free GitHub repository of The Web Scraping Club.

First run: without Smartproxy’s Site Unblocker

With this run, we’re setting the baseline, so we’re running the Scrapy spider without the Site Unblocker.

As expected, the results after the first run are the following.

[Image: First run results]

Basically, every website returned errors except Nordstrom, which returned a 200 status code but without showing the product we requested.

Second run: using the Site Unblocker with only raw HTML requested

In this run, we’re trying the Site Unblocker without any additional headers to see if this is enough to bypass all the anti-bot challenges.

With all the setup done in the settings.py file, the scraper itself is pretty basic.

Here are the results.

The solution works 100% of the time against Datadome, Cloudflare, and PerimeterX. Against F5 the results are mixed, but that can be solved with some retries. Kasada, on the other hand, seems impossible to bypass for now.
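
The retries mentioned for F5 could look like the sketch below. It is not the code used in the test: `fetch` is a hypothetical callable standing in for the real request, and a page counts as valid only when it returns a 200 and the extracted field is not empty.

```python
import time


def fetch_with_retries(fetch, url, max_attempts=3, delay=0.0):
    """Call `fetch(url)` until the page is valid or the attempts run out.

    `fetch` must return a (status, extracted_field) tuple.
    Returns (status, extracted_field, attempts_used).
    """
    status, field = None, None
    for attempt in range(1, max_attempts + 1):
        status, field = fetch(url)
        if status == 200 and field:
            return status, field, attempt
        # Wait before the next try, but not after the last one.
        if attempt < max_attempts and delay:
            time.sleep(delay)
    return status, field, max_attempts


# Simulated responses: blocked, then an empty 200, then the real page.
responses = iter([(403, ""), (200, ""), (200, "Product name")])
print(fetch_with_retries(lambda u: next(responses), "https://example.com/p/1"))
# (200, 'Product name', 3)
```

Inside Scrapy, the built-in RetryMiddleware (settings `RETRY_TIMES` and `RETRY_HTTP_CODES`) already retries on status codes, but it does not look at the page content, so a 200 with an empty product field would still need a check like this one.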

[Image: Second run results]

Third run: using Site Unblocker with JavaScript rendering

To enable browser rendering, the only thing to change is the request parameters.
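
As an illustration only, this kind of switch is typically an extra header sent with the request. The header name `X-SU-Headless` and its `html` value in the sketch below are assumptions on my side, not taken from this post, so check Smartproxy’s documentation for the exact ones. The helper itself is hypothetical too.

```python
def unblocker_headers(render_js=False, extra=None, render_header="X-SU-Headless"):
    """Return the extra request headers for one call through the unblocker.

    Assumption: rendering is toggled by a header; `render_header` and the
    "html" value are placeholders to be checked against the provider's docs.
    """
    headers = dict(extra or {})  # copy, so the caller's dict is not modified
    if render_js:
        headers[render_header] = "html"
    return headers


print(unblocker_headers())                # {}
print(unblocker_headers(render_js=True))  # {'X-SU-Headless': 'html'}
```

Keeping the toggle in one helper means the second and third runs differ by a single argument, while the rest of the spider stays the same.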

Here are the results, similar to the previous run.

[Image: Third run results]

Final remarks

Smartproxy’s Site Unblocker has just been released and has room for improvement, but it’s already performing well. Except for Kasada, all major anti-bots are easily bypassed with this solution.

Pros

  • Integration with Scrapy is straightforward
  • 100% success rate against most of the best-known anti-bots
  • By tweaking headers, you can use advanced options like sticky sessions, IP geotargeting, or custom cookies.
  • The price of $12 per GB is one of the lowest for this kind of solution

Cons

  • Kasada is currently not supported
  • There’s no pay-per-usage plan, only subscriptions, although they start really low, at $28 per month for two GB.

Rating

The Smartproxy Site Unblocker is a brand new solution, and the sins of its youth can be forgiven. It’s not that easy to bypass all the best-known anti-bot solutions right from the start, and the tweaks for scraping Kasada could come in the next few months.

For today, its score is 80/100, a good result! If you’re interested, you can try it on Smartproxy’s website and, by using the code SPECIALCLUB, get 50% off.

