Kasada’s Journey: From Parka Desire to “Error 429” Insights



Why do we get a 429 error when browsing some websites?

In this post of The Web Scraping Club, we’ll see why some websites throw a 429 error the first time we load them, before starting to work.

TL;DR: it’s because of Kasada’s anti-bot solution.

So you want to buy a parka?

Let’s say you’re looking for a nice parka for the coming winter season. You go to your favorite e-commerce site and, in the developer tools’ Network tab, you see a red line when loading the home page: error 429, Too Many Requests. Kinda strange, isn’t it?
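You can reproduce this outside the browser with a single plain request (a minimal sketch in Python; the URL is a placeholder, not the actual brand’s website):

```python
# Minimal sketch: one plain GET against the home page.
import requests

response = requests.get("https://www.example-parka-brand.com/")  # hypothetical URL
print(response.status_code)    # 429 on the very first request, so no real
                               # rate limit was actually exceeded
print(dict(response.headers))  # the anti-bot usually leaves traces here
```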

[Screenshot: the Network tab showing a 429 (Too Many Requests) response while loading the home page]

Having a look at the network tab, an unusual header catches my eye.

[Screenshot: the response headers, including an unusual one]

An invisible guest

This is the typical response given by an anti-bot solution called Kasada, one of the newest players in the field.

The main difference between Kasada and other software is stated on their website: “Kasada assumes all requests are guilty until proven innocent by inspecting traffic for automation with a process that’s invisible to humans.”

And again: “Automated threats are detected on the first request, without ever letting requests into your infrastructure, including new bots never seen before.”

So every first request to the website is treated as if it came from a bot, and only if the browser solves a sort of magic trick is it then allowed through to the website. Interesting.




This is a shift in the paradigm of anti-bot software. 

Most of the other solutions follow two roads:

  • Check for red flags in the requests, such as anomalies in the browser fingerprint

  • Monitor the user’s behavior and block them if it looks suspicious.

Kasada, instead, does something like “block everything unless there’s evidence that the requests do not come from a bot”.

Blocking all bots: is it a good thing for SEO?

It’s certainly a strong statement and a strong limitation: restricting bots can also mean limiting the website’s SEO and the legitimate tools used by the company itself. For a public e-commerce site, it makes sense on some parts of the website, such as when adding items to the cart, but locking down the whole website can lead to awful results.

The brand’s website performs as expected on Google and DuckDuckGo, sitting at the top of the organic results when searched for.

[Screenshots: Google and DuckDuckGo search results, with the brand’s website in first position]

On Bing, however, it is only fourth, behind another brand with a similar name.

[Screenshot: Bing search results, with the brand’s website in fourth position]

The most concerning thing is that on Baidu the Chinese version of the website is completely missing from the first page of results, leaving one of the most important markets for luxury goods completely unserved.

[Screenshot: Baidu search results, with the website missing from the first page]

Challenge accepted

Being a nerd passionate about web scraping, I take this website as a challenge.

I know from my readings that Kasada is one of the toughest anti-bot solutions on the market, due to the peculiarity described above, but I won’t give up without even trying.

I brought out the artillery for this battle and started directly with the best weapon in my armory: Playwright + the Stealth plugin, using Chrome instead of Chromium to simulate a real person browsing the website… and I failed miserably.
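For context, the setup looked roughly like the sketch below, assuming the playwright-stealth Python package; the URL is a placeholder, not the actual brand’s website:

```python
# A sketch of the setup described above
# (pip install playwright playwright-stealth).
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

with sync_playwright() as p:
    # channel="chrome" launches the locally installed Google Chrome
    # instead of Playwright's bundled Chromium
    browser = p.chromium.launch(channel="chrome", headless=False)
    page = browser.new_page()
    stealth_sync(page)  # patch the most common automation fingerprints
    page.goto("https://www.example-parka-brand.com/")  # hypothetical URL
    print(page.content())  # with Kasada in front, this stays blank
    browser.close()
```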

[Screenshot: a blank page returned to the Playwright + Stealth setup]

I tried and tried and tried again, with several changes of configuration, and the result was always the same: a blank page.

The folks at Kasada really did a great job of blocking bots, I must admit.

But just when I was about to give up and go cry in a dark corner of my room with my broken ego, the SEO issue of this website came to my mind again, and I decided to have a look at its robots.txt file.

[Screenshot: the website’s robots.txt file, with a rule allowing “Screaming Frog SEO Spider”]

So, there’s this “Screaming Frog SEO Spider” allowed to roam all around the website, while my Chrome setup is not?
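A quick way to confirm what the screenshot shows, using only the standard library (a sketch; the URL is a placeholder, and I’m assuming the rule names the “Screaming Frog SEO Spider” token):

```python
# Parse the site's robots.txt and check what the rule allows.
from urllib import robotparser

rp = robotparser.RobotFileParser("https://www.example-parka-brand.com/robots.txt")
rp.read()
# True if a "Screaming Frog SEO Spider" user-agent rule allows the root path
print(rp.can_fetch("Screaming Frog SEO Spider", "/"))
```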

Having a look at this Screaming Frog tool, it’s a desktop application that lets the user download the HTML of websites using, guess what? A headless version of Chromium.

But since it’s a desktop application, it is very unlikely that this rule is enforced only for a certain range of IPs, so why don’t we try changing our User Agent to Screaming Frog’s? Something like the sketch below.
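A sketch of the same Playwright setup, this time presenting Screaming Frog’s User-Agent; the exact version string is hypothetical, and the URL is again a placeholder:

```python
from playwright.sync_api import sync_playwright

SCREAMING_FROG_UA = "Screaming Frog SEO Spider/18.1"  # hypothetical version string

with sync_playwright() as p:
    browser = p.chromium.launch(channel="chrome", headless=True)
    # new_context lets us override the User-Agent for every request
    context = browser.new_context(user_agent=SCREAMING_FROG_UA)
    page = context.new_page()
    page.goto("https://www.example-parka-brand.com/")  # hypothetical URL
    print(page.title())  # this time the page actually renders
    browser.close()
```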

[Screenshot: the page loading correctly with the Screaming Frog User Agent]

Bingo!

This of course does not mean I’ve found a way to “hack” the Kasada anti-bot, but that, in this particular case, the solution is misconfigured.

It was probably so strict that it also blocked some of the company’s internal processes, so a back door was left open, giving bots a way in.

Is fighting bots so aggressively worth the risk?

We don’t know many details: how many bots were attacking the website before Kasada was implemented, the cost of the solution, or the sales lost because of this SEO gap.

Kasada is surely doing a great job; maybe it was so good in this case that some back doors were needed so as not to interfere with the operation of the website.

Generally speaking, when it comes to fraud, DDoS attacks, or other harmful activity, bot fighting is of course a must. And it’s also difficult to differentiate between bots that just want to scrape some prices from the website and the dangerous ones.

If every e-commerce company agreed to give access to its publicly available data (the product catalog, for example), maybe via an API, most of the bots that serve the market intelligence industry (like ours at databoutique.com) would stop scraping the website, and anti-bot solutions could be targeted more efficiently against fraud and DDoS attacks. Maybe it would help, just my 2 cents.


Liked the article? Subscribe for free to The Web Scraping Club to receive a new one in your inbox twice a week.