Web Scraping and Automatic data refresh using Google Sheets and Tableau Public

Recently I have figured out how to do web scraping using google sheets and to automatically refresh a dashboard on tableau public. So here it is…just sharing what I have learned.

Tools I have used

  • Tableau Public Desktop.
  • Google Sheets.

At first, you have to find out the live data source and import that data to google sheets.
In my case, I have fetched the live air pollution data from https://data.gov.in/catalog/real-time-air-quality-index and next step is to import that data to google sheets. Input is an XML file and was available using API and I have got the access by registering and generating an API key.


Next step open your google sheets and import the data using IMPORTXML function available in google sheets.


 Format : IMPORTXML("URL","XPath")
URL : link to your data source
XPath : used to navigate through elements and attribute in XML (https://www.w3schools.com/xml/xpath_intro.asp).

Same way there are different functions available in google sheets to do web scraping.Check the link https://support.google.com/docs/answer/3093342 to know more about how to import data in google sheets from various sources. Once the data is imported in google sheet then next step is to enable the auto-refresh feature and to make sure that auto refresh works even in offline mode(data refresh work even without opening google sheets).

Go to File > Spread sheet settings and under Calculation tab change the Recalculation option to on change and every hour.
Next, go to Tools > Script editor and paste the below code and create a trigger to schedule the function once in every hour.

function myFunction() {

var ab = SpreadsheetApp.openById("1uU-Tru5eWh2AybPfxg0I268SAOscTjOWcDPhjGOQVVQ");
var sheet = ab.getSheetByName("Sheet1");
var cell = sheet.getRange('M1')

cell.setValue(1)
}

The above code first opens the google sheet by using ID, sheet id can be found out using the google sheet URL.
And then set the value of cells M1 to 1, we are doing this step to make sure that there is a calculation change and which will then trigger the IMPORTXML function to auto update even when the sheet is not opened.

Now it's time to open Tableau Public desktop and connect to the google sheets directly, create an awesome visualization 😃
and once it is ready to go to File >Save to Tableau Public and then tableau public login window opens and immediately you will get the below pop-up.

Don't forget to enable the Keep my data in sync option for Tableau to refresh the data from source once every day. You are all done now it's time for you to test it.👍

Comments

Contact Form

Name

Email *

Message *

Popular posts from this blog

All you need to know about the Google Data Studio

How to fetch data from SnowFlake using Excel VBA?

KPI's in Tableau with good or bad indicator.