I was given a specific task at work to analyze some credit card data. After spending a few hours on the code and trying to process the data, I realized that the numbers didn’t match with the universal truth accepted around the company. Hence, if I had presented what I have, all the credibility would have gone out of the window in the first couple of slides.
The reason is that I jumped into retrieving this specific dataset too early, eagerly and ignorantly. There are many nuances and things that I still need to learn about the data and logics. What I should have done is to get a foundation data with very few criteria, verify it to make sure it is correct and work my way from that foundation down to a smaller subset by adding one criterion at a time.
If you pull a report from an established source like WSJ or cite an academic article published in a journal, their credibility helps yours. However, when you pull the data yourself and present insights mined from such data, ensuring that the data is accurate is paramount to the success of the analysis. One mishap shreds your credibility and trust in your analysis. That’s the hard part, or at least one of the hard parts of working with data.
I should have done better today, but I learned a lesson, a lesson that I hope will serve me well and that we won’t have to meet again.