Instagram is estimated to have 1.074 billion users worldwide in 2021 (eMarketer, 2020) and the latest Instagram statistics show that an average Instagram post contains as many as 10.7 hashtags. With Instagram becoming increasingly popular, an estimated 71% of US businesses claim that they use Instagram for business (Mention, 2018). The same study also reveals that 7 out of 10 hashtags on Instagram are branded.
More than 80% of businesses consider Instagram engagement the most important metric. Engagement may be one of the most important KPIs for many marketers, but measuring it is not straightforward.
For all these reasons, it’s crucial for your business to be able to analyze your own Instagram data and that of your competitors. To do so, we will set up our own Instagram scraper, enrich the data with an AI image recognition API that tags the Celebrities in the pictures, and build a Tableau Dashboard to mine the data.
We will analyze the @chanelofficial Instagram account as a case study for this blog article.
Step 1: Scraping with instagram-scraper
Disclaimer: Instagram’s terms of use forbid any kind of crawling, scraping, or caching of content from Instagram. If you scrape data from Instagram, you may get your account banned. Whether there are legal repercussions as well depends on your jurisdiction and on what you do with the data, so check before you scrape.
For educational purposes, I decided to scrape the data myself, but you are free to use the scraping API of your choice. A few very good ones are available at https://rapidapi.com/.
I am using instagram-scraper, a command-line application written in Python that scrapes and downloads an Instagram user’s photos and videos. The project is hosted on GitHub, where you will find all the relevant documentation if you need more info.
Open a new terminal window and run the following command to install instagram-scraper:
$ pip install instagram-scraper
Once the installation is complete, run the following command to scrape Chanel’s profile, with a limit of 1500 posts (images and videos):
$ instagram-scraper chanelofficial -u myprofilename -p mypassword -m 1500 --media-metadata --latest -t image video
Replace myprofilename with your Instagram profile name and mypassword with the password of your Instagram account.
After a few minutes, you should have a folder called chanelofficial that contains all the pictures and videos, plus a JSON file with the metadata of the content you just scraped.
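If you want to check what the scraper produced before moving on, you can open the metadata file in Python. This is only a minimal sketch: the path chanelofficial/chanelofficial.json and the layout of the file (a list of posts, possibly wrapped in a single top-level object) are my assumptions, so adjust them to what you actually find in the folder.

```python
import json
from pathlib import Path


def load_posts(path):
    """Return the list of post dictionaries stored in the metadata file.

    Assumption: the file is either a JSON list of posts, or a JSON object
    whose first list-valued entry holds the posts.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, list):
        return data
    for value in data.values():
        if isinstance(value, list):
            return value
    return []


if __name__ == "__main__":
    # Hypothetical path: check the real name of the JSON file in your folder.
    metadata = Path("chanelofficial") / "chanelofficial.json"
    if metadata.exists():
        posts = load_posts(metadata)
        print(f"{len(posts)} posts loaded")
```

Counting the posts this way is a quick sanity check that the -m limit was applied as expected.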
Step 2: Data Transformation in Parabola.io
Now that we have gathered the data, we need to transform and enrich it before we can analyze it. To do so, I will use a no-code tool called Parabola.io.
The first step is “Use a JSON file”.
The second step is “Expand JSON”.
Great, now we have the data in columns, ready to be used.
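For readers who prefer code to a no-code tool, expanding JSON into columns can be approximated with a small recursive function. This is a sketch of the general idea, not what Parabola does internally. The sample post and the likes.count column name are made up for illustration; only display_url and tags are field names mentioned in this article.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into a single level of columns."""
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Nested objects become columns such as "likes.count".
            flat.update(flatten(value, column, sep))
        elif isinstance(value, list):
            # Lists are stored as comma-separated text in one column.
            flat[column] = ",".join(str(item) for item in value)
        else:
            flat[column] = value
    return flat


# Hypothetical post with made-up values, for illustration only.
post = {
    "display_url": "https://example.com/photo.jpg",
    "likes": {"count": 1200},
    "tags": ["CHANEL", "ChanelFineJewelry"],
}
print(flatten(post))
```

Each key of the returned dictionary plays the role of a column name, and one flattened post is one row.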
To make the tags column usable, we will add a “Split column” step and split it on every comma.
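The same split is a one-liner in Python. A minimal sketch, assuming the tags arrive as a single comma-separated string (the sample values are made up):

```python
def split_tags(tags_cell):
    """Split a comma-separated tags cell into a list of clean tags."""
    # Strip surrounding spaces and drop empty pieces left by stray commas.
    return [tag.strip() for tag in tags_cell.split(",") if tag.strip()]


print(split_tags("CHANEL, ChanelFineJewelry,,SampleTag "))
# ['CHANEL', 'ChanelFineJewelry', 'SampleTag']
```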
We will also convert the timestamp to a date format that is easier to read.
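In Python, this conversion could look like the sketch below. I am assuming the timestamp is a Unix timestamp in seconds, and I chose UTC and the YYYY-MM-DD format for readability; both are choices, not requirements.

```python
from datetime import datetime, timezone


def to_date(timestamp):
    """Convert a Unix timestamp (seconds) to a YYYY-MM-DD string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m-%d")


print(to_date(1609459200))  # 2021-01-01
```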
Now let’s extract the mentions from the text column. To do so, we will add a “Use Regex” step:
We will use this regex to remove all the hashtags. Then we will split the column on the @ symbol to extract the mentions.
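Here is how the same idea could be sketched with Python’s re module. The two patterns are my own assumptions, not the exact regex used in Parabola, and the sketch collects the mentions with findall instead of splitting on the @ symbol. The caption and the account name some.photographer are made up.

```python
import re

# Assumed patterns: a hashtag is "#" followed by word characters; a mention
# is "@" followed by letters, digits, underscores, or periods.
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@([A-Za-z0-9_.]+)")


def extract_mentions(text):
    """Remove hashtags from a caption, then return the mentioned accounts."""
    without_hashtags = HASHTAG.sub("", text)
    # A period that ends the sentence is not part of the account name.
    names = (name.rstrip(".") for name in MENTION.findall(without_hashtags))
    return [name for name in names if name]


caption = "New campaign for #CHANEL, shot by @some.photographer. More on @chanelofficial"
print(extract_mentions(caption))
# ['some.photographer', 'chanelofficial']
```

Note that this simple pattern would also pick up the domain part of an email address, so it is only suitable for captions.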
Now let’s enrich our data with the Microsoft Azure Image Recognition API in order to get the Celebrities automatically recognized in the Instagram posts.
You will need to create an account on Microsoft Azure to get access to the Image Recognition API. You will receive a $200 free credit.
Here are the parameters to the request
Feel free to have a look at the documentation and explore the other data enrichment options available.
In the body of the request you will have to set up the variable “display_url” which is the link to the image of the post that will be analyzed.
Finally, don’t forget to set up your API key in the header to authorize the request.
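To make the shape of the request concrete, here is a hedged sketch in Python. The URL path (/vision/v3.2/analyze with details=Celebrities), the Ocp-Apim-Subscription-Key header, and the response layout reflect my understanding of the Azure Computer Vision REST API, not the exact settings used in Parabola, so check them against the documentation for your own resource. The endpoint, key, and image URL are placeholders.

```python
import json
import urllib.request


def build_celebrity_request(endpoint, api_key, image_url):
    """Build (but do not send) a POST request that analyzes one image."""
    # Assumed path and API version; verify both in the Azure documentation.
    url = f"{endpoint.rstrip('/')}/vision/v3.2/analyze?details=Celebrities"
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": api_key,  # the API key goes in the header
            "Content-Type": "application/json",
        },
    )


def celebrity_names(response):
    """Collect celebrity names from a parsed JSON response.

    Assumed layout: categories -> detail -> celebrities -> name.
    """
    names = []
    for category in response.get("categories", []):
        detail = category.get("detail") or {}
        for celebrity in detail.get("celebrities", []):
            names.append(celebrity["name"])
    return names


# Placeholder values: replace them with your own endpoint, key, and display_url.
request = build_celebrity_request(
    "https://your-resource.cognitiveservices.azure.com",
    "YOUR_API_KEY",
    "https://example.com/photo.jpg",
)
print(request.full_url)
```

To actually send it, pass the request to urllib.request.urlopen, parse the response body with json.load, and hand the result to celebrity_names.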
That’s it, your data is ready. The final step is to export it to your database. Parabola can send the data to many destinations, including Airtable, CSV, API, Dropbox, FTP, MySQL, and PostgreSQL.
For the sake of simplicity, I chose to export it to a Google Sheet that we will then connect to Tableau to analyze the data.
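If you handle the transformation in code rather than in Parabola, a CSV file is the simplest stand-in for this export step, since CSV can be imported into a spreadsheet. A minimal sketch with Python’s csv module follows; the column names and values are made up.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of row dictionaries to CSV text."""
    # Collect column names in first-seen order so no column is lost.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing values are written as empty cells
    return buffer.getvalue()


# Hypothetical rows, for illustration only.
rows = [
    {"date": "2021-01-01", "likes": 10},
    {"date": "2021-01-02", "likes": 5, "tags": "a,b"},
]
print(rows_to_csv(rows))
```

Cells that contain a comma, such as the tags column, are quoted automatically by the csv module.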
Step 3: Data Analysis in Tableau
I connected my Google Sheet to Tableau and created a few visualizations to get some insights about Chanel. Let’s have a look at them!
We can see that there is a decline in the number of posts over time, but that does not necessarily mean less engagement. Here I am calculating the number of likes per post to get a normalized view of the engagement of Chanel’s followers.
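The likes-per-post metric is simple enough to sketch outside Tableau. The example below groups posts by month and divides the total likes by the number of posts. The monthly grouping, the date and likes field names, and the sample values are assumptions for illustration, not the real Chanel data.

```python
from collections import defaultdict


def likes_per_post_by_month(posts):
    """Return the average number of likes per post for each month."""
    totals = defaultdict(lambda: [0, 0])  # month -> [total likes, post count]
    for post in posts:
        month = post["date"][:7]  # "YYYY-MM-DD" -> "YYYY-MM"
        totals[month][0] += post["likes"]
        totals[month][1] += 1
    return {month: likes / count for month, (likes, count) in sorted(totals.items())}


# Made-up sample data.
posts = [
    {"date": "2021-01-05", "likes": 100},
    {"date": "2021-01-20", "likes": 300},
    {"date": "2021-02-01", "likes": 500},
]
print(likes_per_post_by_month(posts))
# {'2021-01': 200.0, '2021-02': 500.0}
```

In this made-up sample, February has fewer posts than January but more likes per post, which is exactly the situation the normalized view is meant to reveal.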
Next, let’s take a look at the hashtags that are generating the most likes:
The most liked hashtag is ChanelFineJewelry.
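A ranking like this can be sketched in a few lines of Python. Here I assume that a post’s likes are credited in full to every hashtag it carries and that hashtags are ranked by total likes; ranking by average likes per post would be an equally reasonable choice. The posts and the SampleTag name are made up.

```python
from collections import Counter


def likes_by_tag(posts):
    """Rank hashtags by the total likes of the posts that use them."""
    totals = Counter()
    for post in posts:
        # set() avoids counting a post twice if a tag is repeated in it.
        for tag in set(post["tags"]):
            totals[tag] += post["likes"]
    return totals.most_common()


# Made-up sample data.
posts = [
    {"tags": ["ChanelFineJewelry", "CHANEL"], "likes": 10},
    {"tags": ["ChanelFineJewelry"], "likes": 5},
    {"tags": ["SampleTag"], "likes": 1},
]
print(likes_by_tag(posts))
# [('ChanelFineJewelry', 15), ('CHANEL', 10), ('SampleTag', 1)]
```

The same function works for any list-valued dimension, so swapping the tags field for a list of recognized names or mentions gives the equivalent ranking.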
Now let’s have a look at the Celebrities identified by our Image Recognition AI.
The Celebrity generating the most likes is Lily-Rose Melody Depp, followed by Kristen Stewart and Charlotte Casiraghi.
Finally, I’ve built a dashboard that lets you dive deep into the data by Celebrities, Tags, or Mentions. You can look at the data aggregated by the selected dimensions and drill down to the post itself. It’s a powerful way to identify the best-performing posts, celebrities, tags, and mentions and then replicate them.
That’s it for today, you’ve built your own Instagram analysis tool. You are ready to collect the data, analyze it and adapt your marketing strategy based on these insights.