March 2023: Key Takeaways from the Web Scraping World

New tech and ethics consortium in the web scraping industry

Welcome to the web scraping news recap for the month of March, let’s see what happened.

The Ethical Web Data Collection Initiative

Source: Official website

This month opened up with all the major players in the web scraping industry teaming up to form the Ethical Web Data Collection Initiative.

grayscale photography of boys raising hand
March 2023: Key Takeaways from the Web Scraping World 11

Its founding members (Zyte, Smartproxy, Rayobyte, and Oxylabs to name a few of them), are setting standard best practices and ethical rules for the web data extraction industry in several areas:

  • Legality: Web Data Collectors will only scrape and use tools to scrape data within the confines of legislation and regulation from relevant jurisdictions, and will strive to keep up to date with judicial cases that may affect the legal boundaries of the work.

  • Ethics: Companies engaged in collecting web data will follow a strict ethical framework, including ethical principles related to four main pillars: websites, customers, proxies and data.

  • Ecosystem management: Web Data Collectors are in a symbiotic relationship with the free and open internet ecosystem and strive to engage it collaboratively in an open and communicative manner

  • Social responsibility: Web data should never be used to the detriment of society, and members pledge to support and collaborate with civil society and governmental organizations for societal benefit.

If you want to know more about these principles and what they practically mean, you can download their doc “Ethical Web Data Collection Initiative (EWDCI) Principles 1.0”

This post is sponsored by Proxyempire, your trusted proxy partner. Sponsorships help keep The Web Scraping Club Free and it’s a way to give back to the readers some value.

https%3A%2F%2Fsubstack post e780 422f 8f6b

In this case, for all The Web Scraping Club Readers, using the discount code TWSC10 you can save 10% OFF for every purchase.

Tech updates

Apify new python SDK

Apify released its brand new Python SDK, allowing users to write scrapers and actors in Python.

This is great news for all the pythonistas like me and I will soon write a post about it. It’s an SDK compatible with all the major packages for web scraping like Requests or HTTPX, Beautiful Soup, Playwright, Selenium, or Scrapy.

If you’re new to Apify, it is a cloud platform that helps you build reliable web scrapers, fast, and automate anything you can do manually in a web browser.

Actors instead are serverless cloud programs running on the Apify platform that can easily crawl websites with millions of pages but also perform arbitrary computing jobs such as sending emails or data transformations

Liked the article? Subscribe for free to The Web Scraping Club to receive twice a week a new one in your inbox.

To complete the circle, Actors can be monetized on the Apify marketplace, where you can publish them and make people actually pay for their usage.

Apify marketplace
March 2023: Key Takeaways from the Web Scraping World 12

Bright Data Scraping Browser

Some weeks ago also Bright Data released a new product called Scraping Browser: basically it is a managed browser session designed to bypass anti-bot challenges.

As you can read from the documentation, you can use it in your Puppeteer or Playwright projects and call the Scraping Browser just like a normal proxy, after you logged in to Bright Data’s website and create the proper zone.

As we all know, setting up the infrastructure for some scraping with Playwright can be painful, since we need to try which browser works against that particular website we’re working on and also which configuration for the chosen browser.

af53ce10 3ba1 404a a49e
March 2023: Key Takeaways from the Web Scraping World 13

With Bright Data’s scraping browser, we should be able to do not worry about the infrastructure and its setup but we could only call an API to solve our issues.

5a1b099b 0408 47d3 9795
March 2023: Key Takeaways from the Web Scraping World 14

Of course, it has a cost, as with everything in our life when we pay someone to fix some issue we don’t want to do by ourselves, and depending on your budget and time constraints it could be more convenient to make or buy.

Anyway, soon in this newsletter we’ll see how this tool will perform against the most famous anti-bot solutions, just like we did on the previous Hands-On article on the Web Unlocker.

Video of the month

As we have talked about monetizing your scrapers on the Apify Marketplace, here’s a great video that briefly covers all the aspects, from scraper creation to publishing, using no-code tools.

The most-read article of the month

The most successful article of the month is the first Hands-On article where I use Bright Data Web Unlocker to tackle the most famous anti-bot solutions.

This concluded my selection of posts and articles for March, if I’ve missed something important please let me know in the comments or in our Discord server.

Liked the article? Subscribe for free to The Web Scraping Club to receive twice a week a new one in your inbox.