Scraping E-commerce Websites 101: Extracting Valuable Data



How to approach scraping e-commerce websites before you start coding.

In this post, we’ll step away from the technical aspects of web scraping for a moment to better understand what it means to scrape an e-commerce website and which aspects should be weighed when approaching such a project.

I’ll draw on several years of experience scraping e-commerce websites, but I’d be glad to hear more from you in the comments and in the polls inside this article.

How many types of data are inside an e-commerce website?


When I think of scraping an e-commerce website, I immediately think about prices, because that’s what I’ve scraped for years.

But this is only a small part of the data that we can extract from it and, depending on the industry where the target website is operating, some types of data are more important than others.

This post is sponsored by Smartproxy, the premium proxy and web scraping infrastructure focused on the best price, ease of use, and performance.


For all readers of The Web Scraping Club, the discount code WEBSCRAPINGCLUB10 saves 10% on every purchase.

Here’s a list of what can be scraped from e-commerce websites (and I’m pretty sure it isn’t exhaustive):

  • prices and promotions, of course, to discover pricing strategies for brands and websites

  • product features, useful when you need to discover trends and patterns in newly launched products or in competitors’ offerings

  • reviews, to understand the sentiment of the buyers

  • availability, to check sellers’ inventory for products or their variants

  • positioning, to understand whether a brand or a product is given a prominent spot on a website

  • distribution, to see which brands are sold across a set of websites and which are missing
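
To make the list above more concrete, here is a minimal sketch of a record that could hold these data types for one scraped product. The schema is hypothetical: `ProductRecord`, its field names, and the `brand_distribution` helper are assumptions for illustration, not a standard format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProductRecord:
    """One scraped product observation (hypothetical schema)."""
    website: str
    brand: str
    product_name: str
    # Prices and promotions
    full_price: Optional[float] = None
    sale_price: Optional[float] = None
    currency: Optional[str] = None
    # Product features, e.g. {"color": "blue", "material": "cotton"}
    features: dict = field(default_factory=dict)
    # Reviews
    review_count: Optional[int] = None
    average_rating: Optional[float] = None
    # Availability
    in_stock: Optional[bool] = None
    # Positioning: 1-based rank of the product on the listing page
    list_position: Optional[int] = None

    def discount_pct(self) -> Optional[float]:
        """Return the promotion as a percentage, or None if prices are missing."""
        if not self.full_price or self.sale_price is None:
            return None
        return round(100 * (1 - self.sale_price / self.full_price), 2)


def brand_distribution(records):
    """Distribution: map each website to the set of brands found on it."""
    brands_by_site = {}
    for record in records:
        brands_by_site.setdefault(record.website, set()).add(record.brand)
    return brands_by_site
```

Most fields are optional because, depending on the website and the pages visited, a scraper will often fill in only part of the record.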

Is it legal to scrape e-commerce websites?

Yes, as long as we respect the general rules for ethical web scraping.

  • Do not harm the target website’s functionality

  • Do not save copyrighted data or items

  • Do not scrape anything behind a log-in

  • Do not break Terms of Service that you must accept explicitly (clickwrap)

  • Use an API whenever possible to reduce the load on the target website

  • Do not interfere with the target website’s business and operations.
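
One practical way to avoid harming the target website is to space out requests. The sketch below is an assumption about how that could be done, not a rule taken from any specific site: `PoliteThrottle` is a hypothetical helper, and the right delay depends on the target.

```python
import time


class PoliteThrottle:
    """Enforce a minimum delay between consecutive requests to one website."""

    def __init__(self, min_delay_seconds, clock=time.monotonic, sleep=time.sleep):
        # clock and sleep are injectable so the behavior can be tested
        # without actually waiting.
        self.min_delay = min_delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last_request = None

    def wait(self):
        """Sleep until min_delay has passed since the previous call.

        Returns the number of seconds slept (0.0 if no wait was needed).
        """
        waited = 0.0
        if self._last_request is not None:
            remaining = self.min_delay - (self._clock() - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last_request = self._clock()
        return waited
```

Calling `throttle.wait()` right before every request keeps the scraper below one request per `min_delay_seconds`, whatever the speed of the rest of the code.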


Liked the article? Subscribe for free to The Web Scraping Club to receive twice a week a new one in your inbox.


A short glossary

E-commerce websites in different industries have much the same structure:

  • A product list page, usually abbreviated as PLP, where you find the different products that match a certain filter. It’s like a bookshelf in a library.

  • A product detail page, usually abbreviated as PDP, where you find all the details of a single product, like a single book inside the library.


Depending on the type of data you need to extract, the scraper may need to visit only the product list pages or also each product detail page.
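
The choice between PLP-only and PLP plus PDP can be sketched as a small planning function. Which fields appear on which page type varies from website to website, so `PLP_FIELDS` and `PDP_FIELDS` below are placeholder assumptions to adjust for each target, and `pages_to_visit` is a hypothetical helper.

```python
# Assumption: fields visible on each page type. Adjust per target website.
PLP_FIELDS = {"name", "price", "promotion", "list_position"}
PDP_FIELDS = (PLP_FIELDS - {"list_position"}) | {"features", "reviews", "availability"}


def pages_to_visit(needed_fields):
    """Return the page types a scraper must visit to collect needed_fields."""
    needed = set(needed_fields)
    unknown = needed - (PLP_FIELDS | PDP_FIELDS)
    if unknown:
        raise ValueError(f"Fields not mapped to any page type: {sorted(unknown)}")
    if needed <= PLP_FIELDS:
        # Everything is on the listing, so detail pages can be skipped.
        return ["PLP"]
    # The PLP is still visited first, to discover the product URLs.
    return ["PLP", "PDP"]
```

Skipping the detail pages when the listing already has everything is usually the biggest saving, since a single PLP can replace dozens of PDP requests.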

Another classification, useful when looking at e-commerce data from a business perspective, groups websites by their distribution model.

  • First, there are the so-called monobrand websites, typically operated by brands to sell their own products, for example, lululemon.com. There is only one seller and one brand on such a website.

  • Then there are multibrand websites, like Footlocker.com, where one seller offers multiple brands to its customers.

  • Then we have marketplaces, like Amazon.com, where multiple sellers sell multiple brands to the customers.
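
The three definitions above boil down to two counts, sellers and brands. A minimal sketch, where `classify_website` is a hypothetical helper and the handling of the multi-seller, single-brand edge case is my own assumption:

```python
def classify_website(seller_count, brand_count):
    """Classify a website by distribution model from its seller and brand counts."""
    if seller_count < 1 or brand_count < 1:
        raise ValueError("A website needs at least one seller and one brand")
    if seller_count > 1:
        # Assumption: any site with several sellers is treated as a
        # marketplace, even in the rare case of a single brand.
        return "marketplace"
    if brand_count > 1:
        return "multibrand"
    return "monobrand"
```

For example, `classify_website(1, 1)` gives `"monobrand"`, while a single seller carrying many brands gives `"multibrand"`.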

Each of these website types poses different challenges when you need to scrape them.

Monobrand websites typically have few items to scrape, but they are highly localized across regions. This means you can expect a different website from Chanel in China than in Europe, for example.

Multibrand websites have more items to collect and sometimes run a strong anti-bot solution to prevent fraud or discourage web scraping.

Marketplaces are usually huge, and the main challenge is splitting the execution wisely so that you get all the data you need without going broke. They may also have different product page layouts and structures depending on the seller.

Wrap up

Scraping e-commerce websites can provide valuable insights for businesses and individuals alike.

However, it’s crucial to consider the peculiarities and challenges associated with scraping each type of e-commerce website.

By understanding these nuances, you can design a more effective and efficient web scraping strategy, ultimately making the most of the data extracted from these websites.

