The Lab #7: Decoding PerimeterX – Strategies for Successful Web Scraping



Is scraping websites protected by PerimeterX as difficult as it seems?


What is PerimeterX?

PerimeterX is one of the most well-known anti-bot solutions, used by some of the top-tier websites on the net. The company recently merged with Human Security, another player in the anti-bot industry, more focused on fraud prevention and account abuse.

How to detect the PerimeterX anti-bot solution?

If we analyze the tech stack of the target website with Wappalyzer, PerimeterX shows up in the security section with a good degree of precision. Detecting it by inspecting the network tab in the developer tools is also pretty easy: when it is active, you will see PerimeterX setting cookies whose names start with _px (for example _pxvid or _px3) as soon as the first page of the website loads.
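If you prefer to automate the check, here is a minimal sketch, assuming Playwright with Chromium is installed, that loads the home page in a real browser and lists any cookies starting with the _px prefix. The target URL is only an example.

```python
# Minimal detection sketch: open the first page with Playwright and list the
# cookies whose name starts with "_px" (the prefix used by PerimeterX).
# The URL is only an example of a protected website.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()
    page.goto("https://www.neimanmarcus.com/", wait_until="domcontentloaded")

    px_cookies = [c["name"] for c in context.cookies() if c["name"].startswith("_px")]
    if px_cookies:
        print("PerimeterX detected, cookies set:", px_cookies)
    else:
        print("No _px* cookies found on the first page load")

    browser.close()
```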


Liked the article? Subscribe for free to The Web Scraping Club to receive a new one in your inbox twice a week.


The Human Challenge

The Human Challenge is PerimeterX's trademark when it comes to anti-bot challenges. Instead of throwing a CAPTCHA or a reCAPTCHA at the user, their anti-bot solution shows a big button that a human must keep pressed with the mouse until the challenge is solved.


In a 2020 interview published on the company website, Gad Bornstein, a product manager at PerimeterX at the time (he later moved to Meta), explained that this peculiar solution has several advantages for both website users and owners.

It’s 5x faster for humans to solve compared to other solutions, and this leads to a 10-15x lower abandonment rate with Human Challenge compared to reCAPTCHA.

Another interesting topic in this interview is how the solution works:

Bot Defender also works in real time, so every time a user gets a new page, we calculate their behavior, path, fingerprints and all those machine learning models. Then we get a score that defines whether you’re a human or a bot.

And if you are categorized as a bot, the Human Challenge triggers. This means that scrapers not only need to look like humans (consistent fingerprints) but also to act like humans (plausible browsing behavior).
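To give an idea of what acting like a human can mean in practice, here is a hedged Playwright sketch that adds random pauses, wandering mouse movements, and gradual scrolling between actions. The URL, coordinates, and delays are arbitrary placeholders, not values tested against PerimeterX.

```python
# A sketch of human-like behaviour with Playwright: random pauses, wandering
# mouse movements and gradual scrolling. All values are arbitrary examples.
import random
import time

from playwright.sync_api import sync_playwright


def human_pause(min_s=0.8, max_s=2.5):
    # Humans never act at machine speed: wait a random time between actions.
    time.sleep(random.uniform(min_s, max_s))


with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)  # headful browsers leak fewer automation hints
    page = browser.new_context().new_page()

    page.goto("https://www.example.com/", wait_until="domcontentloaded")
    human_pause()

    # Move the mouse through a few random points instead of jumping to a target.
    for _ in range(5):
        page.mouse.move(
            random.randint(100, 800),
            random.randint(100, 600),
            steps=random.randint(10, 30),
        )
        human_pause(0.2, 0.6)

    # Scroll down in steps, like a person skimming a listing page.
    for _ in range(3):
        page.mouse.wheel(0, random.randint(300, 700))
        human_pause()

    browser.close()
```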

Real-world examples

Finding a test website for this article has not been easy: the websites I knew were using PerimeterX now pair it with Cloudflare's bot management, which would have affected our tests.

We’ll use neimanmarcus.com as the target website, but before we start coding, I’m sharing with you this good article about PerimeterX by Zenrows. It explains in detail how the solution works and describes a hypothetical approach for reverse engineering it.

Despite being very interested in understanding what happens under the hood, I would not implement this kind of solution in my scrapers: the algorithm can change often, and every change means restarting the reverse-engineering process. The most durable method is instead to simulate a real user, using as few resources as possible.
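As a starting point for that approach, here is a minimal sketch of a real-user setup: a headful Chromium session launched with Playwright, with a realistic viewport, locale, and timezone, routed through a residential proxy. The proxy endpoint and credentials are hypothetical placeholders.

```python
# A minimal "look like a real user" setup: headful Chromium, realistic viewport,
# locale and timezone, routed through a residential proxy.
# The proxy endpoint and credentials below are hypothetical placeholders.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=False,
        proxy={
            "server": "http://proxy.example.com:8000",
            "username": "USER",
            "password": "PASS",
        },
    )
    context = browser.new_context(
        viewport={"width": 1366, "height": 768},
        locale="en-US",
        timezone_id="America/New_York",
    )
    page = context.new_page()
    page.goto("https://www.neimanmarcus.com/", wait_until="domcontentloaded")
    print(page.title())
    browser.close()
```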

The full article is available only to paying users of the newsletter.
You can read this and the other paid articles of The Lab after subscribing.




