The Lab #6: Bypassing Bans – Modifying Scrapy Ciphers Against TLS Fingerprinting



In other words: fake it until you scrape it


As you surely know, the most advanced anti-bot solutions act on different levels:

  • at a behavioral level, they check how the scraper acts and try to distinguish a bot from a human.

  • at a browser level, they try to distinguish a genuine browser from an automated one, looking for inconsistencies in the setup.

  • at an HTTP level, they try to identify the device configuration to detect suspicious setups.

On our Discord server, the discussion focused on this last case, so today we’ll try to explain how this detection is achieved via TLS fingerprinting and what we can do as a countermeasure in our scrapers.

Understanding TLS Fingerprinting

TLS fingerprinting is a passive (or server-side) fingerprinting technique used by servers to identify the configuration of the clients connecting to them.

The fingerprints are created from the cipher suites the client advertises while the TLS connection with the server is being established.

To better understand how this technique works, let’s borrow the image from this Cloudflare blog post.


When a client connects to a server, the first interaction happens at the TCP level. It’s called the three-way handshake, in which the client and server confirm their willingness and availability to connect.

  • The client sends a SYN packet to ask the server whether it is available for a new connection.

  • If the server is available, it replies with a SYN/ACK packet to the client.

  • The client then replies with an ACK packet and the connection is established. From now on, the two can exchange data.
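The three steps above are handled by the operating system, so from Python we only see the result. As a minimal sketch (the throwaway loopback listener and the function name `loopback_handshake` are assumptions made for this example, not part of any scraper), we can open a connection and check that data flows once the handshake is done:

```python
import socket


def loopback_handshake() -> bytes:
    """Connect to a throwaway local listener and send a few bytes over it."""
    # Hypothetical local server: bind to an ephemeral port on the loopback interface.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        host, port = server.getsockname()

        # create_connection() returns only after the kernel has completed
        # the SYN, SYN/ACK, ACK exchange with the listening socket.
        with socket.create_connection((host, port), timeout=5) as client:
            conn, _addr = server.accept()
            with conn:
                # The connection is established: the two sides can exchange data.
                client.sendall(b"ping")
                return conn.recv(4)


if __name__ == "__main__":
    print(loopback_handshake())
```

Nothing in this exchange identifies the client software yet; that only starts with the TLS messages that follow.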

Without going into too much detail about the full TLS protocol, we’ll now focus on what happens after the TCP connection is established.




The “Client Hello” message, the first one the client sends once the TCP handshake is complete, carries the data needed for fingerprinting. It includes the TLS versions the client supports, the cipher suites it supports, and a string of random bytes known as the “client random.”

The key point is that the cipher list differs from client to client: a Chrome connection advertises a different list of cipher suites than a Safari or a Scrapy one, even when all three are sent from the same machine.
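To get a feeling for what a Python client would put in that message, here is a small sketch that uses only the standard `ssl` module to list the cipher suites enabled in a default context, in preference order. The function name `offered_ciphers` is made up for this example, and the output depends on the local Python and OpenSSL build, so treat it as an approximation of what actually goes on the wire rather than a capture of it.

```python
import ssl


def offered_ciphers():
    """Return (hex code, OpenSSL name) pairs for the default client context."""
    ctx = ssl.create_default_context()
    pairs = []
    for cipher in ctx.get_ciphers():
        # The low 16 bits of OpenSSL's internal id are the two-byte code
        # sent on the wire (for example 0xC030).
        code = cipher["id"] & 0xFFFF
        pairs.append((f"{code:04X}", cipher["name"]))
    return pairs


if __name__ == "__main__":
    for code, name in offered_ciphers():
        print(f"[{code}]\t{name}")
```

Note that OpenSSL uses its own naming for the older suites (for example `ECDHE-RSA-AES256-GCM-SHA384`), so the names may not match the IANA ones character by character, while the hex codes do.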

Here are the ciphers of a connection made to google.com with Chrome from a Mac laptop.

	[8A8A]	Unrecognized cipher - See https://www.iana.org/assignments/tls-parameters/
	[1301]	TLS_AES_128_GCM_SHA256
	[1302]	TLS_AES_256_GCM_SHA384
	[1303]	TLS_CHACHA20_POLY1305_SHA256
	[C02B]	TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
	[C02F]	TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
	[C02C]	TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
	[C030]	TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	[CCA9]	TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
	[CCA8]	TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
	[C013]	TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
	[C014]	TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
	[009C]	TLS_RSA_WITH_AES_128_GCM_SHA256
	[009D]	TLS_RSA_WITH_AES_256_GCM_SHA384
	[002F]	TLS_RSA_WITH_AES_128_CBC_SHA
	[0035]	TLS_RSA_WITH_AES_256_CBC_SHA

Safari:

	[2A2A]	Unrecognized cipher - See https://www.iana.org/assignments/tls-parameters/
	[1301]	TLS_AES_128_GCM_SHA256
	[1302]	TLS_AES_256_GCM_SHA384
	[1303]	TLS_CHACHA20_POLY1305_SHA256
	[C02C]	TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
	[C02B]	TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
	[CCA9]	TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
	[C030]	TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	[C02F]	TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
	[CCA8]	TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
	[C00A]	TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
	[C009]	TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
	[C014]	TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
	[C013]	TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
	[009D]	TLS_RSA_WITH_AES_256_GCM_SHA384
	[009C]	TLS_RSA_WITH_AES_128_GCM_SHA256
	[0035]	TLS_RSA_WITH_AES_256_CBC_SHA
	[002F]	TLS_RSA_WITH_AES_128_CBC_SHA
	[C008]	TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA
	[C012]	TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA
	[000A]	SSL_RSA_WITH_3DES_EDE_SHA

Scrapy:

	[1302]	TLS_AES_256_GCM_SHA384
	[1303]	TLS_CHACHA20_POLY1305_SHA256
	[1301]	TLS_AES_128_GCM_SHA256
	[C02C]	TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
	[C030]	TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	[009F]	TLS_DHE_RSA_WITH_AES_256_GCM_SHA384
	[CCA9]	TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
	[CCA8]	TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
	[CCAA]	TLS_DHE_RSA_WITH_CHACHA20_POLY1305_SHA256
	[C02B]	TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
	[C02F]	TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
	[009E]	TLS_DHE_RSA_WITH_AES_128_GCM_SHA256
	[C024]	TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384
	[C028]	TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384
	[006B]	TLS_DHE_RSA_WITH_AES_256_CBC_SHA256
	[C023]	TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256
	[C027]	TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
	[0067]	TLS_DHE_RSA_WITH_AES_128_CBC_SHA256
	[C00A]	TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
	[C014]	TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
	[0039]	TLS_DHE_RSA_WITH_AES_256_CBC_SHA
	[C009]	TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
	[C013]	TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
	[0033]	TLS_DHE_RSA_WITH_AES_128_CBC_SHA
	[009D]	TLS_RSA_WITH_AES_256_GCM_SHA384
	[009C]	TLS_RSA_WITH_AES_128_GCM_SHA256
	[003D]	TLS_RSA_WITH_AES_256_CBC_SHA256
	[003C]	TLS_RSA_WITH_AES_128_CBC_SHA256
	[0035]	TLS_RSA_WITH_AES_256_CBC_SHA
	[002F]	TLS_RSA_WITH_AES_128_CBC_SHA
	[00FF]	TLS_EMPTY_RENEGOTIATION_INFO_SCSV

They all differ in the order and number of ciphers. This means that the server, using these ciphers and some other parameters sent by the client, gets an idea of my client’s setup as soon as I try to connect, and can use this data to create fingerprints and block suspicious ones.


This great LWT Hiker blog post, the source of the cipher lists above, goes into more detail and also covers two of the best-known fingerprinting algorithms in use today, JA3 and TS1.
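To make the idea concrete, here is a deliberately simplified sketch of how a cipher list can be turned into a fingerprint. It borrows two conventions from JA3 (decimal values joined by dashes and hashed with MD5, with GREASE placeholders skipped), but it hashes only the cipher field, while a real JA3 string also includes the TLS version, extensions, elliptic curves, and point formats. The function names are made up for this example, and the two lists are the Chrome and Safari codes shown earlier.

```python
import hashlib

CHROME = ["8A8A", "1301", "1302", "1303", "C02B", "C02F", "C02C", "C030",
          "CCA9", "CCA8", "C013", "C014", "009C", "009D", "002F", "0035"]

SAFARI = ["2A2A", "1301", "1302", "1303", "C02C", "C02B", "CCA9", "C030",
          "C02F", "CCA8", "C00A", "C009", "C014", "C013", "009D", "009C",
          "0035", "002F", "C008", "C012", "000A"]


def is_grease(code: int) -> bool:
    """GREASE placeholders are 0x0A0A, 0x1A1A, ... 0xFAFA and change per connection."""
    return (code & 0x0F0F) == 0x0A0A and (code >> 8) == (code & 0xFF)


def cipher_digest(hex_codes) -> str:
    """Cipher-only, JA3-style digest (NOT a complete JA3 fingerprint)."""
    values = [int(code, 16) for code in hex_codes]
    # Skip GREASE values, otherwise the same browser would hash differently each time.
    field = "-".join(str(value) for value in values if not is_grease(value))
    return hashlib.md5(field.encode("ascii")).hexdigest()


if __name__ == "__main__":
    print("Chrome:", cipher_digest(CHROME))
    print("Safari:", cipher_digest(SAFARI))
```

The `[8A8A]` and `[2A2A]` entries reported as unrecognized in the lists above fall in this GREASE range, which is why the sketch drops them before hashing. Because the digest depends on both the content and the order of the list, the two browsers end up with different values even though they share most suites.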

Countermeasures
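As the title suggests, one lever on the Scrapy side is the cipher list itself. Scrapy exposes a `DOWNLOADER_CLIENT_TLS_CIPHERS` setting that accepts a string in OpenSSL cipher-list format. The sketch below is our own illustration, under the assumption that mirroring the pre-TLS 1.3 part of the Chrome list shown earlier is the goal; the variable `BROWSER_LIKE_CIPHERS` and the helper `tls12_order` are hypothetical names, and the helper only uses the standard `ssl` module to check how the local OpenSSL build interprets the string.

```python
import ssl

# Hypothetical cipher string: OpenSSL names for the Chrome suites C02B ... 0035
# listed earlier, in the same order. Illustrative only.
BROWSER_LIKE_CIPHERS = ":".join([
    "ECDHE-ECDSA-AES128-GCM-SHA256",
    "ECDHE-RSA-AES128-GCM-SHA256",
    "ECDHE-ECDSA-AES256-GCM-SHA384",
    "ECDHE-RSA-AES256-GCM-SHA384",
    "ECDHE-ECDSA-CHACHA20-POLY1305",
    "ECDHE-RSA-CHACHA20-POLY1305",
    "ECDHE-RSA-AES128-SHA",
    "ECDHE-RSA-AES256-SHA",
    "AES128-GCM-SHA256",
    "AES256-GCM-SHA384",
    "AES128-SHA",
    "AES256-SHA",
])

# This assignment is what would go in a Scrapy project's settings.py.
DOWNLOADER_CLIENT_TLS_CIPHERS = BROWSER_LIKE_CIPHERS


def tls12_order(cipher_string):
    """Return the non-TLS 1.3 suites the local OpenSSL enables for this string."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.set_ciphers(cipher_string)  # raises ssl.SSLError if nothing can be selected
    return [c["name"] for c in ctx.get_ciphers() if c["protocol"] != "TLSv1.3"]


if __name__ == "__main__":
    print(tls12_order(DOWNLOADER_CLIENT_TLS_CIPHERS))
```

Two caveats apply. In Python's `ssl` module the TLS 1.3 suites cannot be changed through the cipher string, and we assume the same limitation applies to this setting. The cipher list is also only one of the parameters that feed a fingerprint, so changing it alone may not be enough to look like a browser.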

The full article is available only to paying users of the newsletter.
You can read this and other The Lab paid articles after subscribing

