Decoding the Kallax Index: Insights into Scraping IKEA

Scraping the IKEA website to track a product’s price globally

Today we’ll see what it means to scrape a popular e-commerce website across different countries and what insights we can get by doing so. For this task, we’ll get the data from the popular furniture retailer IKEA, which has physical stores in many countries.

The Kallax Index

If you’re at least a bit interested in economics, you’ve surely heard about the Big Mac Index by The Economist.

Invented in 1986, it’s a simplified way to understand whether currencies have a “fair” exchange rate, using the theory of purchasing-power parity: in the long term, a Big Mac should cost the same everywhere.

As an example, if a Big Mac costs 1 dollar in the US and 4 yuan in China, the expected exchange rate is 1:4; if the market rate is 1:6 instead, the yuan is undervalued.

But what’s true for a Big Mac isn’t true for most of the retail world. Prices for the same item vary considerably from country to country, depending on the production site location, the logistics costs to the retail point and then to the final customer, taxation and import/export duties, and currency exchange rates.
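The arithmetic behind this example can be sketched in a few lines of Python. The function names below are my own, purely for illustration, and the numbers are the ones from the example above, not real prices.

```python
def implied_rate(local_price: float, us_price: float) -> float:
    """Exchange rate (local units per dollar) that would equalize the two prices."""
    return local_price / us_price


def valuation_vs_usd(local_price: float, us_price: float, market_rate: float) -> float:
    """Relative gap between the implied and the market rate.

    A negative value means the local currency is undervalued against the dollar.
    """
    return implied_rate(local_price, us_price) / market_rate - 1


# Numbers from the example: 1 dollar in the US, 4 yuan in China, market rate 1:6.
ppp = implied_rate(4.0, 1.0)
gap = valuation_vs_usd(4.0, 1.0, 6.0)
print(f"implied rate: {ppp:.2f}, valuation: {gap:.1%}")
# prints "implied rate: 4.00, valuation: -33.3%"
```

With a market rate of 6 yuan per dollar against an implied rate of 4, the yuan comes out roughly a third below its "fair" value.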

Let’s take as an example another global brand, IKEA, with stores in 61 countries. While its products are designed mainly in Sweden, they are manufactured mainly in China, South Asia, and Eastern Europe, and then need to be transported to retail locations around the globe, which affects the final costs.

Liked the article? Subscribe for free to The Web Scraping Club to receive twice a week a new one in your inbox.

To measure how big this impact is and do some web scraping at the same time, let’s take one of IKEA’s bestsellers, available in every country: the Kallax bookshelf.

Scraping Ikea Kallax

Mapping the sites

Our first task (and the most boring one) is to map all the versions of the IKEA website available for each country. Unfortunately, there isn’t a page listing all the country localizations of the website. We need to manually check how each one works and whether the Kallax is in its catalog.

And then the doors of the multi-country retail hell opened.

Hell of product codes

Different product codes

To be sure we compare apples with apples, the Kallax model we’re looking for on each website is the white one, 77x77 cm, which is listed with the product code 20275814 (or 202.758.14 in the punctuated format).
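Since the same code shows up both with and without dots, it helps to normalize it before comparing. Here is a minimal sketch; the helper names are hypothetical, and the 8-digit length and 3.3.2 grouping are assumptions inferred only from the example code above.

```python
import re


def normalize_code(raw: str) -> str:
    """Strip every non-digit so that '202.758.14' and '20275814' compare equal."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 8:  # assumption: product codes are always 8 digits long
        raise ValueError(f"unexpected product code: {raw!r}")
    return digits


def dotted_code(raw: str) -> str:
    """Render a code in the punctuated format, e.g. '202.758.14'."""
    digits = normalize_code(raw)
    return f"{digits[:3]}.{digits[3:6]}.{digits[6:]}"


print(normalize_code("202.758.14"))  # prints "20275814"
print(dotted_code("20275814"))       # prints "202.758.14"
```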


Having decided on the product and found its code, we should be able to find it easily in every country.

Well, in almost every country: I soon discovered that in Japan and other Eastern countries there’s no product with the code 20275814, but the exact same Kallax variation exists under the code 70351886.


Luckily, there are only these two product codes and they are mutually exclusive: there’s no product 70351886 in the US and no product 20275814 in Japan. So, without worrying about picking the wrong item, we can programmatically look for both codes in every country to be sure we’re not missing anything.

China also has a third code, 90471717, and we’ll use it only for its website.

Different website versions

Once I understood which product codes to ask for, I looked at how the website works in each country, to get the price information we need in the most efficient way.

For most countries, a simple API call is enough. Using the internal product search API, which I found via the Network tab of Chrome’s developer tools, we can easily get the product price, its currency, and some other data we don’t actually need.

But for 14 of the 61 countries this doesn’t work: their websites run on a different tech stack, so the same API isn’t available.

  • Bulgaria, Cyprus, and Greece use one variant of the website, without API

  • Hong Kong, Taiwan, and Indonesia use a second variant, which sends a POST request to an Algolia endpoint to gather data.

  • Estonia, Latvia, Lithuania, and Iceland use a third variant

  • Puerto Rico and Santo Domingo use a fourth variant

  • Turkey and China each have their own website version

In my experience scraping e-commerce sites, this situation is quite common and has several causes. For Greater China, Japan, and occasionally South Korea, retailers usually assign e-commerce development to local teams, which understand the local UX conventions, culture, and society better than a Western team would.

There are also countries where e-commerce launched earlier and the site is lagging behind on an older software version. Or, on the contrary, the most recent e-commerce version has a new stack that is tested in just a few countries before being rolled out to the others.

In our case, since we’re not focusing on extracting all the data from all the countries but we need only one product per website, we’ll create the scraper for the most common version and then manually fill the gaps for the other countries.

Creating the scraper

The scraper itself is quite basic: we need to call the search API endpoint for 46 countries.

The endpoint is the following: 

and it returns a JSON with the product details and its price. To rotate this request across the different countries, we need the two-letter country code and the related locale.

I’ll then create a file to use as input for these two values, one row per country. Then, inside the scraper, we’ll query both the product codes we’re looking for and parse the JSON response.

I’ve developed the scraper using Scrapy, but since its features are so basic, it could probably be done with a few simple curl requests. You can find the code on our GitHub repository for free readers.
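To give an idea of how the input file drives the requests, here is a rough sketch using only the standard library. It is not the code from the repository: the URL template is a made-up placeholder on a non-existent host, and the column names and sample rows are assumptions chosen only to show the shape of the file.

```python
import csv
import io
from urllib.parse import quote

# Placeholder only: NOT the real endpoint, just a stand-in with the same inputs.
URL_TEMPLATE = "https://example.invalid/{country}/{locale}/search?q={code}"

# The two product codes we look for in every country.
PRODUCT_CODES = ("20275814", "70351886")

# Hypothetical input file: one row per country, with its code and locale.
SAMPLE_INPUT = "country,locale\nus,en\nit,it\njp,ja\n"


def build_requests(input_file, codes=PRODUCT_CODES):
    """Yield one request description per (country, product code) pair."""
    for row in csv.DictReader(input_file):
        country = row["country"].strip().lower()
        locale = row["locale"].strip().lower()
        for code in codes:
            yield {
                "country": country,
                "locale": locale,
                "code": code,
                "url": URL_TEMPLATE.format(
                    country=quote(country), locale=quote(locale), code=quote(code)
                ),
            }


requests_to_send = list(build_requests(io.StringIO(SAMPLE_INPUT)))
print(len(requests_to_send))  # 3 countries x 2 codes, prints "6"
```

In a Scrapy spider, the same loop would typically live in `start_requests`, yielding one request per dictionary, with the country passed along so the parser knows which market each price belongs to.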

Visualizing the outcome

I’ve copied the result into an open Google Spreadsheet and integrated the results with the data from the missing countries.

We were lucky that the API response already contained the currency ISO code; otherwise, another data quality and standardization step would have been needed before comparing the results.

The Google Spreadsheet contains the formula for the currency exchange, but usually this is a data enrichment activity done in the database after the data is scraped and loaded.

Thanks to both these elements, we can get some insights immediately, and one thing is absolutely stunning: the same item can cost anywhere from 25 to 130 USD around the globe.

Even allowing for the markup applied to local prices to cover logistics and taxation costs, such gaps are usually a symptom of something to adjust. Generally speaking, it may be that a currency has devalued or appreciated too much and the local price has not been adjusted in time.
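Outside a spreadsheet, the same enrichment step could look like the sketch below. The exchange rates are invented for illustration and are not real market rates; in practice they would come from a rates table loaded next to the scraped data.

```python
from decimal import Decimal

# Made-up rates (USD per one unit of currency), for illustration only.
USD_PER_UNIT = {
    "USD": Decimal("1"),
    "EUR": Decimal("1.10"),
    "JPY": Decimal("0.0070"),
}


def to_usd(amount, currency: str, rates=USD_PER_UNIT) -> Decimal:
    """Convert a local price to USD, keyed on the ISO currency code."""
    code = currency.strip().upper()
    if code not in rates:
        raise KeyError(f"no exchange rate for currency {code!r}")
    # Go through str() so floats like 49.99 don't bring binary noise along.
    return (Decimal(str(amount)) * rates[code]).quantize(Decimal("0.01"))


print(to_usd("50", "eur"))   # prints "55.00"
print(to_usd(10000, "JPY"))  # prints "70.00"
```

Using `Decimal` instead of floats is a design choice that keeps prices exact to the cent, which matters once you start comparing or aggregating them.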
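Once every price is in USD, spotting the gaps takes a few lines. In this sketch the country labels and the middle value are invented; only the 25 and 130 USD extremes come from the results described here.

```python
from statistics import median


def price_spread(prices_usd: dict[str, float]) -> dict:
    """Summarize how far apart the cheapest and most expensive markets are."""
    cheapest = min(prices_usd, key=prices_usd.get)
    most_expensive = max(prices_usd, key=prices_usd.get)
    return {
        "cheapest": cheapest,
        "most_expensive": most_expensive,
        "ratio": prices_usd[most_expensive] / prices_usd[cheapest],
        "median": median(prices_usd.values()),
    }


# Placeholder markets: only the two extremes match the figures in the text.
sample = {"country_a": 25.0, "country_b": 60.0, "country_c": 130.0}
summary = price_spread(sample)
print(f"{summary['most_expensive']} costs {summary['ratio']:.1f}x {summary['cheapest']}")
# prints "country_c costs 5.2x country_a"
```

A ratio above 5 between two markets for the same bookshelf is the kind of number that should send someone to check whether a local price list is out of date.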

Final remarks

In this post, we’ve seen what it means to scrape prices from a multi-country perspective.

Technically speaking, it introduces a new order of difficulties:

  • Several variants for the same website

  • Different standards between the different websites (like the product code)

  • Data quality and standardization needed to compare apples with apples

From the business side, we have seen the price differences for only one product on one website in 60 countries. But in my working experience at Re Analytics, I’ve seen how important it is for our customers to look at this pricing data across hundreds of websites and millions of products. With a bigger picture, managers can decide with much more data and rely less on gut feeling.

If you’re into pricing themes and you’re always looking for new data, you may be interested in our new project. It’s free and still in private beta, but feel free to sign up.
