A real-world use case of a simple scraper that does not get blocked by Datadome
What is Datadome?
Datadome is one of the key players in the anti-bot software industry, and Bot Protection is its flagship product.
As stated on their website, the solution is used on several important websites such as FootLocker, Rakuten, and even Reddit, as you can see from the picture below.
So, sooner or later in your life as a web scraper, you’ll surely face a website protected by this technology.
I recently needed to update a scraper that used to get past Datadome, so it’s a good time to write down the process that allowed me to scrape the data from this website.
Are you looking for a Birkin?
If you know what a Birkin is, you have probably guessed that the website in question is Hermes.com. For everyone else, a Birkin is one of the most iconic bags crafted by the Maison Hermes, and it costs as much as a supercar (and no, it’s not sold online anyway).
From a quick analysis of the network tab of the browser, we can see that by browsing the products in every category, we call an internal API that shows the product details we need.
Let’s start with the basic stuff
Let’s start with our standard Scrapy spider and see if we can get into the website after adjusting the DEFAULT_REQUEST_HEADERS setting so that our requests look more like a browser’s.
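As a rough idea of what that adjustment can look like, here is a minimal sketch of the relevant part of a Scrapy settings.py. DEFAULT_REQUEST_HEADERS and USER_AGENT are standard Scrapy settings, but the header values below are illustrative assumptions modeled on a desktop browser, not the exact ones used for this scraper.

```python
# Fragment of a Scrapy project's settings.py.
# All values are illustrative assumptions, not the original scraper's headers.

# User agent sent with every request (applied by Scrapy's UserAgentMiddleware).
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)

# Headers added to every request that does not already set them
# (applied by Scrapy's DefaultHeadersMiddleware).
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Upgrade-Insecure-Requests": "1",
}
```

Copying the headers your own browser sends, as shown in the network tab, is usually a better starting point than any generic list.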
And soon we can see Datadome at work: the spider gets redirected and locked out of the website, whereas in the browser the same redirect simply leads to the home page.
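To spot this situation from inside a scraper instead of reading logs, a small heuristic can help. The sketch below is only an assumption about what a block looks like: the function name looks_blocked is hypothetical, the 403 check and the captcha-delivery.com host are commonly seen with Datadome but may not match what this website returns, and the path comparison simply flags any redirect away from the page we asked for.

```python
from urllib.parse import urlsplit

# Assumption: Datadome challenge pages are often served from this domain.
CAPTCHA_DOMAIN = "captcha-delivery.com"


def looks_blocked(status: int, requested_url: str, final_url: str) -> bool:
    """Heuristically decide whether a response looks like an anti-bot block."""
    final = urlsplit(final_url)
    # Case 1: we ended up on a challenge page hosted on the captcha domain.
    if (final.hostname or "").endswith(CAPTCHA_DOMAIN):
        return True
    # Case 2: the server refused the request outright.
    if status == 403:
        return True
    # Case 3: we were redirected away from the path we requested
    # (trailing slashes are ignored in the comparison).
    requested_path = urlsplit(requested_url).path.rstrip("/")
    return final.path.rstrip("/") != requested_path
```

In a Scrapy callback, response.status and response.url give the final status and URL, and when a redirect was followed the originally requested URL is the first entry of response.meta["redirect_urls"].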
A different path
Hermes does not have an official app, but I can use the same procedure explained in the first post of The Lab to see how the website behaves when accessed from a mobile device.
The full article is available only to paying users of the newsletter.