Interview #8: Conversations with Fabiano Sileo on Web Scraping

Fabiano Sileo

Hi Fabiano, thanks for joining us at The Web Scraping Club, I’m really happy to have you here, since I’m sure we’ll touch on some points we usually don’t cover in this newsletter.

Before going on, please tell us a bit about yourself and your career, both in companies and as a content creator for professionals interested in Business Intelligence.

Hi Pierluigi, thank you for this interview, I am a keen subscriber to this newsletter.

I started my career as a Business Intelligence consultant in a small consulting company, a sort of boutique working with big consulting companies (such as Deloitte, KPMG, Reply, etc.).

I worked in particular on SAP's BI tool (SAP BW), and in this role I realized that data is an asset for all companies, but only if it is analyzed and used well.

I worked as a consultant for about 6 years, but I realized that I was missing something because I only saw the data from a technical point of view. I mean, it was important to me that the KPI was correct, regardless of whether it was worth 10, 100 or 1000. I was missing the part of analyzing the data in order to make decisions (or help to do so), which in my opinion is the real reason why data should be analyzed.

That is why 2 years ago I joined the financial planning and reporting team at Verisure, with the ambition to help the team with data strategy and, at the same time, to use data in a better way and see the effects of these data-driven analyses.

In the meantime, I have always liked to teach what I knew or learned, and therefore I started a blog that contained very technical guides. But then I realized how important it is not only to talk “to the technical people” but also to spread the word more widely, and so I started to create content on LinkedIn and on my podcast, with the aim of narrating a journey that starts from the digitization of companies, passes through business intelligence and finally arrives at predictive analytics and artificial intelligence.

This post is sponsored by Proxyempire, your trusted proxy partner. Sponsorships help keep The Web Scraping Club Free and it’s a way to give back to the readers some value.


In this case, all The Web Scraping Club readers can save 10% on every purchase by using the discount code TWSC10.

I started my career in Business Intelligence too, several years ago, and I still cannot find a definition that describes it properly. What is Business Intelligence for you, and why is it important for any company?

What a beautiful and complex question. Let’s say that in 60 podcast episodes and a few hundred posts on LinkedIn, I have only begun to answer it, but I will try to summarise my idea in a few lines here.

Let’s start with the assumption that all companies have data. If they all have it, then simply having data is not enough to say that it has value. We often hear, however, that data is the new oil.

So there must be something that turns a commodity (the data that all companies have) into value, indeed into a real asset.

That something is the ability to use data to make better decisions. But in order to make data-driven decisions, there needs to be work that turns the raw data, records stored in databases that have no value on their own, into information and knowledge.

This something is business intelligence, which aims to:

  1. relate data to each other by turning them into information (invoice data has no value, but knowing which customers are the best by analyzing the invoice history has a lot of value)
  2. eliminate all the noise created in point 1, leaving only the information of real interest to those who have to make the decision.

Business intelligence describes how a company is performing, traditionally relying on internal data, and this is a must-have for any company. On top of that, with the economy rapidly digitalizing and becoming measurable from the web, a huge opportunity comes from web data, which gives access to data never seen before. In your experience, have you seen many companies in Italy using web data?

To be honest, I believe that it is a possibility that is still little used, since in small and medium-sized companies especially it is still difficult to approach ‘classic’ business intelligence, the kind that exploits internal data (due to various problems such as low data literacy, data silos, etc.). I believe that web scraping can represent a huge opportunity for improvement, because the data extracted from the company’s systems can only tell you about the inside of the company, not about the context and the market in which it operates.

Let me give an example to explain this concept. Imagine you are an estate agency. You can use BI (and possibly machine learning) to monitor your results through specific KPIs such as the average time it takes to sell a house, the average margin, etc. But if the context changes (a market decline or an explosion in demand), you cannot see it from this data: a change in the KPIs you are analyzing could be caused either by a difference in business performance or by a change in context.

If, on the other hand, you had the data of the target market, and therefore all the house advertisements on the internet in the city, you could count them over time, monitor the average time an advertisement spends online before the sale is made, and thus have two very important types of information:

  • The general context in which you move
  • A benchmark against which to compare your performance
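
As a rough illustration of the estate agency example above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `LISTINGS` records, their fields and the function names are hypothetical, and the time an advertisement stays online is used only as a proxy for the time to sale (an ad can disappear for other reasons).

```python
from datetime import date
from statistics import mean

# Hypothetical scraped advertisements: (listing_id, first_seen, last_seen).
# last_seen is None while the advertisement is still online.
LISTINGS = [
    ("a1", date(2023, 1, 5), date(2023, 2, 4)),
    ("a2", date(2023, 1, 20), date(2023, 3, 21)),
    ("a3", date(2023, 2, 10), None),
    ("a4", date(2023, 3, 1), date(2023, 3, 31)),
]


def active_listings(listings, day):
    """Count the advertisements that were online on the given day (context)."""
    return sum(
        1
        for _, first, last in listings
        if first <= day and (last is None or last >= day)
    )


def average_days_online(listings):
    """Average days between first and last sighting, for removed ads only."""
    durations = [
        (last - first).days for _, first, last in listings if last is not None
    ]
    return mean(durations) if durations else None


def compare_to_market(own_days_to_sell, listings):
    """Own KPI minus the market benchmark, in days (positive means slower)."""
    benchmark = average_days_online(listings)
    return None if benchmark is None else own_days_to_sell - benchmark


if __name__ == "__main__":
    print("Ads online on 2023-03-15:", active_listings(LISTINGS, date(2023, 3, 15)))
    print("Market average days online:", average_days_online(LISTINGS))
    print("Agency vs market (days):", compare_to_market(55, LISTINGS))
```

Counting active advertisements on different dates gives a view of the general context, while the difference between the agency's own average time to sell and the market average gives the benchmark.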

Liked the article? Subscribe for free to The Web Scraping Club to receive twice a week a new one in your inbox.

Do you find that data projects are still hard to sell? When I started as a consultant in Business Intelligence, it was difficult for companies to understand why they should start these projects. Unlike marketing campaigns, where you spend $100 and can measure the return, the ROI of data projects is not so easy to measure, especially for web data projects, which are still not mainstream or widely adopted.

You ask me a complex question for several reasons, the main one being that I am probably ill-qualified to answer it, in the sense that I personally do not sell data projects. However, I can give you my point of view. In order to ‘convince’ someone to invest in such a project you cannot only talk about ROI because, as you say, it is difficult to measure especially in the short term. You have to broaden the conversation by looking at:

  • Time saved
  • Errors avoided
  • Possible incremental gains
  • Improved knowledge

Linked to the previous question, probably the key to selling a data project is to highlight in the data visualization the key metrics that matter the most to the end users, with a clear and easy-to-get dashboard. Which tools are you using for it and what’s the process you use to create one?

I believe dataviz is an indispensable tool for data literacy, as it is the stage where we take the data, remove the noise (i.e. irrelevant information), and show the decision maker only the information that is really needed to make the decision.

I use Power BI a lot because I think it is the most comprehensive tool: it allows you to do ETL and data modeling, write code and get beautiful visualizations.

The process I usually follow consists of:

  • Analyze the phenomenon to be visualized
  • Make a list of the main metrics
  • Delete those that are useful but not fundamental
  • Draw a mockup (by hand or in PowerPoint)
  • Rank the metrics in order to understand how much emphasis to give to each one
  • Reason about which chart to use to tell the story the data tell (is it a trend? Then I can choose between a bar chart or a line graph, for example. Is it about showing the % share of a category on the total? Then I can consider bar charts or pie charts, but only if done well, etc.)
  • Implement the solution

The journey of data from sources, the web, or company databases, to the final users is long and has different steps in between. Do you have any tool and process to suggest for data cleaning, enriching, and monitoring the data pipeline?

It depends very much on the project you are implementing. For example, if we are talking about ML, there are a number of MLOps tools you can use. In general, I think the data warehouse (DWH) is still the main place for data storage and for all the operations of relating data, enriching data, etc.

Our usual last question. Did anything funny happen in the early days of your career?

I vividly remember the early days of working as a business intelligence consultant because of a funny fact. During university I hated programming, and the Python exam was a nightmare (because of an unengaging professor and my mentality of just wanting to finish exams in order to graduate). So I promised myself that I would never program in my life. During my university years my first child was born, so as soon as I graduated I had to find a ‘serious’ job and stop waiting tables.

A BI company contacted me and I went to the interview having no idea what business intelligence was. I had read the definition on Wikipedia on my way up the stairs to the interview. I got chosen for the internship and did on-the-job training. On the first day, I walked into the meeting room and saw a bunch of ‘technicians’ commenting on an ABAP (SAP’s programming language) routine put up on a big screen, trying to fix a bug. I remember thinking they were speaking a language unknown to me, and I wanted to run away. Good thing I didn’t, and that I gave myself the chance to discover and fall in love with data (and after all, code can be a great friend too).
