I love my country. I want to promote my country as much and as honestly as I can. Given my past experience in the hospitality industry, I am a bit drawn towards reading and writing about it. I really want to do some analysis on the arrivals to Vietnam, but the government body responsible for recording data makes it annoyingly challenging for me to work with the data.
Problem 1 – No Excel files
First of all, there is no feature on the website to download data as an Excel file. You have to copy the data from the web page and put it into an Excel file yourself. By contrast, the Singapore Tourism Board makes it super easy to get and store data on an annual basis, as you can see below.
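For what it's worth, the copy-and-paste step can be scripted. Below is a minimal sketch, using only the Python standard library, that turns an HTML table into CSV text that Excel can open. The `SAMPLE_HTML` markup, the market names and the numbers are made-up placeholders, not the real page or real arrival figures, and the real page's markup may well need extra handling.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Placeholder markup and figures, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Market</th><th>Arrivals</th></tr>
  <tr><td>Country A</td><td>1,000</td></tr>
  <tr><td>Country B</td><td>2,000</td></tr>
</table>
"""

csv_text = html_table_to_csv(SAMPLE_HTML)
print(csv_text)
```

Saving `csv_text` to a `.csv` file gives something Excel opens directly, although you would still have to repeat this for every monthly report.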
Problem 2 – Inconsistent naming and order of entries
Copying data from an HTML table wouldn’t be so bad if the order of entries stayed the same across the tables. However, that isn’t the case. The order is all over the place, as you can see below. Countries are mixed up differently from one month to another.
Even so, VLOOKUP can still help overcome the challenge. However, VLOOKUP requires consistent variable names. In the screenshot above, Cambodia is spelled differently in the March and April 2019 reports.
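One way around both the shuffled order and the inconsistent spelling is to normalize the names before matching, which is roughly what you would have to do by hand before VLOOKUP works. The sketch below is only an illustration: the `ALIASES` table, the spelling "Campuchia" and all the numbers are my own placeholders, since the actual spellings live in the screenshots and the figures here are not real arrival data.

```python
import unicodedata

# Hypothetical alias table: maps a variant spelling to one canonical key.
# The real variants would have to be collected from the actual reports.
ALIASES = {"campuchia": "cambodia"}


def normalize(name):
    """Return a canonical lookup key for a country name."""
    text = unicodedata.normalize("NFKC", name)   # unify look-alike characters
    key = " ".join(text.split()).casefold()      # trim, collapse spaces, ignore case
    return ALIASES.get(key, key)


def join_months(*months):
    """Combine per-month {name: value} dicts into {key: [value per month]}.

    Row order inside each month does not matter; a missing entry stays None.
    """
    combined = {}
    for index, month in enumerate(months):
        for name, value in month.items():
            row = combined.setdefault(normalize(name), [None] * len(months))
            row[index] = value
    return combined


# Placeholder figures in deliberately different order and spelling.
march = {"Cambodia": 100, "Thailand": 200}
april = {"Thailand ": 210, "Campuchia": 110}

table = join_months(march, april)
print(table)
```

Because the lookup is by key rather than by position, the order of entries in each monthly table stops being a problem. The spelling problem only goes away for variants you have listed in the alias table.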
Problem 3 – Redundant variable names
Redundant variable names like those in the screenshot above violate the integrity of the data. If you use VLOOKUP, the results will be redundant and inaccurate.
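The reason is that an exact-match VLOOKUP returns only the first row it finds for a name, so any later row with the same name is silently ignored. A quick check like the sketch below can flag repeated names before any lookup is attempted. The rows are placeholders I made up for illustration, not figures from the report.

```python
from collections import Counter


def find_duplicate_names(rows):
    """Return the names that appear more than once, ignoring case and extra spaces."""
    counts = Counter(" ".join(name.split()).casefold() for name, _ in rows)
    return sorted(name for name, count in counts.items() if count > 1)


# Placeholder rows: "Country A" appears twice with different values.
rows = [("Country A", 100), ("Country B", 200), ("Country A", 150)]

duplicates = find_duplicate_names(rows)
print(duplicates)  # ['country a']
```

Which of the repeated rows is the correct one is something only the source can answer, so the script can flag the issue but not fix it.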
Given how they display the data online, I don’t have much faith that internally, things are different. My bet is that there is no data-centric approach and even if data is used, it must be a time-consuming, laborious and primitive endeavor.