Stats on retail store size & revenue per square feet

I have been doing some industry research for work, specifically on the retail industry. One trend that CBInsights mentioned in their report was that retail stores were shrinking in size. CBInsights argued that retailers wanted to more conscious of how they made use of their retail space. The competition is so fierce that retailers cannot afford to do everything, be everything and sell everything. They tend to get more nimble in operations and conscious of what they have on display. Nonetheless, CBInsights’ latest year in their data was 2015. So I went through the financial reports by several retailers to find out if retail stores are actually shrinking in size. Before I go through the findings, below are a few important notes:

  • The list of retailers was from this article by WSJ. There are several retailers whose information was not retrieved. The omission was attributed to the way such retailers structured their data, making it time-consuming to retrieve data and complicated to explain. Hence, I decided to omit those retailers
  • Retail is a complex industry. The data is for reference only and may represent to some extent the players or trend in the industry. By no means do I believe that the data represents 100% the retail industry
  • Data from 2015 to 2017 was from the chosen retailers’ annual reports. Data in 2018 was from the latest quarterly reports. Only Walmart already filed their 2018 annual report
  • Apart from Walmart and Sam’s Club, no other companies had their revenue data retrieved. It doesn’t make sense to analyze revenue per square feet with only 3 quarters’ data recorded
  • Data is for the retailers’ US segment only
  • Revenue by Sam’s Club excludes fuel revenue

Number of stores

Among the surveyed companies, only Best Buy, JC Penny and Sam’s Club lowered their store count

Average Store Size

Best Buy, JC Penny and Sam’s Club increased their average store size. The rest decreased theirs, except Dick’s, which keeps their store size more or less the same for the past 4 years

Revenue per square feet

Only Target saw their revenue per square feet decrease in 2017, compared to 2016 and 2015. As the chart can show, 2018 looks to be a good year for Walmart. Both Walmart (the brand) and Sam’s Club increased revenue per square feet, especially the latter.

Summary

The majority of the surveyed companies reduced their total retail space, but managed to make the most of their selling space. This is in line with what CBInsights mentioned (I touched on it above as well).

The data I collected is available here on my Tableau profile

Data Visualization: Hotel Statistics from Singapore Tourism Board

The data from this Story is from 2014 to October 2018 and taken from Singapore Tourism Board website. You can find my Tableau dashboard here.

Accumulative figures for available room nights, booked room nights, revenue, average room rates and RevPar are on a monthly and yearly basis.

Drilling down further, hotel statistics are categorized by four hotel tiers: luxury, upscale, mid-tier and economy. Since segment data is available only in RevPar, average room rates and occupancy rate, it’s not possible to compute the accumulative figures for segments.

Basic terminologies:

Available room nights refer to the total possible number of room nights in a period of time.

Booked room nights refer to the total number of room nights booked by guests in a period of time.

Average room rates refer to the average revenue per booked room nights in a period of time. It’s calculated by revenue divided by the number of booked room nights

Occupancy rates refer to the percentage of booked room nights against the available room nights. It’s calculated by booked room nights divided by available room nights

Revenue per Available Room or RevPar refers to the revenue per a room night. It’s one of the most important metrics in the industry, showing how a property fills its room inventory and how much it can charge per room. It’s calculated by Occupancy Rate multiplied by Average Room Rates in the same period of time.

Insights

Yearly Statistics

From 2015 to October 2018, the Singaporean tourism industry averages more than 80% in occupancy rate; which is pretty positive

The industry seems also to be on expansion with increasingly bigger yearly available room nights (almost 3 million room nights from 2014 to 2017) & booked room nights


While the number of rooms and occupancy rate get increasingly bigger year after year, the average room rate and RevPar go in the opposite direction

The rise in revenue signals that the increase in available & booked room nights outweighs the drop in average room rates and RevPar


Data in 2018 doesn’t include figures in November and December. Given the past performances, it’s difficult and uncertain to predict how the average room rates and RevPar will fare. It seems; however, average room rate will go up in December, compared to the rate in November.

Also, December seems to be a low month for Singapore’s tourism as historical figures show below. On the other hand, July and August are the busiest months of the year for Singapore.

Regression Model

Regression models every year show that there seems to be a correlation between occupancy rate and RevPar. Every year’s model is accompanied with a P-value less than 0.05 and an R-Squared between 58% and 77%

Seasonality

Regarding seasonality of Singapore’s tourism industry, every September seems to take a sharp decline in terms of occupancy rate, compared to the same year’s July and August. It’s an interesting phenomenon given that the Singapore Formula 1 Grand Prix takes place every September. Surprisingly, luxury and upscale segments tend to have fared better than mid-tier and economy segments from 2014 to 2016 and in 2018. My theory is that September may be a traditionally low month for Singapore and as a result, sees a low occupancy rate. However, hotels either push prices alone higher or package room rates with Formula 1 weekend tickets. The average room rate; therefore, may be driven up higher despite a low occupancy rate.

In 2014

In 2015

In 2016

In 2017

In 2018

It’s very interesting since if the demand is not there, businesses don’t usually drive up the prices. As a matter of fact, prices are usually lowered to stimulate the demand. My theory for such a phenomenon is cited above with regards to the Formula 1 Grand Prix in Singapore.

Luxury Segment

Upscale Segment

Mid-Tier Segment

Economy Segment

Data Visualization: Artificial Intelligence Landscape in 2018

This is a little personal project, using Tableau to visualize the data from Artificial Intelligence Landscape in 2018 by Asgard. All copyrights related to the data are owned by Asgard. I just applied some simple analytics and visualization.

I tried to embed the code from Tableau, but I’d have to upgrade my WordPress plan to Business in order to do so. A bit of a bummer. Nonetheless, here are a few insights

Above is a diagram of the top 10 industries by the number of AI companies. Communications, Sales & Marketing and Healthcare make up the top 3.

On a global level, the US dwarfs every other country in terms of the number of AI companies. It has more AI companies than the next 6 countries combined.

Above are the top 10 cities by the number of AI companies. San Francisco, London and New York make up the top 3. The numbers by city on my dashboard differ from those presented by Asgard itself. The difference, I suspect, comes from the way cities are grouped. I didn’t do any grouping. All visualizations come straight from the data itself while I suspect that there must have been some grouping by Asgard. Nonetheless, the figures by country present no difference.

On my dashboard, users can click any country on the map to filter out the list of individual companies within that country. Furthermore, users can see the top 10 industries by country as well.

My Tableau dashboard can be found here.

Mapbox – Two ways to show a map

For the past few months, I have been charged with visualizing data onto maps using either Google Maps or Mapbox. I chose the latter. After days and nights of struggle, I am pretty close to the finishing line and have gained quite a bit of experience in Mapbox that I want to share. 

Long story short, to map data onto maps, you need to structure data into a specific structure called GEOJSON. It looks like this:

You can put anything in the “properties” key, but the rest essentially have to follow the above format. The coordinates will be used to locate markers or data on the map. 

Let’s say the data that I want to map has 6 different mission areas (see the screenshot above) and my job is to map them onto the map in 6 different colors. 

I have an array that contains 6 different mission areas like this and 6 different colors representing those areas

var missionarea = [area1, area2, area3, area4, area5, area6]
var colorcode = [color1, color2, color3, color4, color5, color6]

One layer approach

#create a map canvas

var map = new mapboxgl.Map({
    container: 'map',
    style: 'mapbox://styles/mapbox/light-v9',
    center: [-96.797885, 39.363438],
    // initial zoom
    zoom: 3.3
});

#load data and create the layer

map.on("load", function() {

    map.addSource('communityData', {   #name the data as communityData
        type: 'geojson',
        data: communityData, #this is the name of your GEOJSON
    });

    map.addLayer({      #add the layer
        "id": "commMap", #your single layer's name
        "type": "circle", #meaning that each item will be a dot
        "source": "communityData", #the name of your data assigned above
        'layout': {},
        'paint': {
            "circle-radius": 8,
            "circle-opacity": 1,
            "circle-color": {
                "property": "Mission Area",
                "type": 'categorical',
                "stops": [ #assign color to respective mission areas
                    [Missionarea[0], colorcode[0]],
                    [Missionarea[1], colorcode[1]],
                    [Missionarea[2], colorcode[2]],
                    [Missionarea[3], colorcode[3]],
                    [Missionarea[4], colorcode[4]],
                    [Missionarea[5], colorcode[5]],
                ]
            }
        }
    });
}

Multiple Layers

In this example, I’ll label each map layer as “show0”, “show1″…”show5”

var showlist = [] //the array of layerIDs, used to categorize layers
var base = "show"
for (var i = 0; i < Missionarea.length; i++) { //populate showlist array
    var text = base + i
    showlist.push(text)
}
var map = new mapboxgl.Map({
    container: 'map',
    style: 'mapbox://styles/mapbox/light-v9',
    center: [-95.957309, 41.276479],
    // initial zoom
    zoom: 6
});

map.on("load", function() {

    map.addSource('communityData', {
        type: 'geojson',
        data: communityData,
    });
    //*********************************** Load partners *****************************************************

    communityData.features.forEach(function(feature) {
        var primary = feature.properties["Mission Area"];
       // iterate through the mission area array
        for (var i = 0; i < missionarea.length; i++) {
            if (primary == missionarea[i]) {
      // assign a name to each layer
                layerID = showlist[i];
                if (!map.getLayer(layerID)) {
                    map.addLayer({
                        "id": layerID, #layer's name
                        "type": "circle",
                        "source": "communityData",
                        "paint": {
                            "circle-radius": 8,
                            "circle-opacity": 1,
                            "circle-color": colorcode[i], #color
                        },
                        "filter": ["all", ["==", "Mission Area", primary]]
                    })
                }
            }
        }
    });

Which approach is better?

If the intention is just to show the dots, there is no difference and it depends on personal preference. However, if your code gets more complicated and as in my case, I had to create at least 6 filters on the map, things will get messy and one approach will no longer allow you to do what you want. Unfortunately, I don’t have that much experience yet to tell you more and I personally believe it’s a case-by-case thing. 

Turn Excel into GEOJSON

My Capstone project requires me to turn Excel in GEOJSON for mapping purposes. Handling and preparing data is 50% of the whole process. I’d like to share what I did step by step, hoping that it will be useful to some who are learning the ropes like I am.

I am using Python as the programming language of choice and Pycharm as IDE. Create a folder on your computer and store the Excel file in question in it. Open the folder in Python and create a new Python file. Here is how it looks on my Pycharm

ExceltoGEOJSON_1

Before we move forward, it’s important to know what a GEOJSON is and how it looks. This website offers a great review on GEOJSON. In terms of structure, a GEOJSON file looks like this

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [0, 0]
      },
      "properties": {
        "name": "null island"
      }
    }
  ]
}

I am pretty sure we can have as many variables under “properties” as we want. The rest should be standard to be followed as possible.

This is how the Excel file looks. Notice that there are coordinates available already. In the future, I’ll work on geocoding an address into coordinates.

ExceltoGEOJSON_2

Let’s start working on the Python code.

import pandas as pd

df = pd.read_excel('CommunityPartner.xls')

Import the “pandas” package. Shorten the package’s name as pd because who would want to repeat a long name many times in the code?

The following line is to read the Excel file into a data frame called df. You can name it however you want. Since the Excel file and the Python code are in the same folder, there is no need to have a directory. Otherwise, it’s necessary to have a full directory.

collection = {'type': 'FeatureCollection', 'features': []}

The next step is to create a shell dictionary. Refer back to the sample structure of a GEOJSON file above to see why I structure the collection variable like that.

df['description'] = df[['Address', 'City', 'State', 'Zip']].apply(lambda x: ' '.join(x.astype(str)), axis=1)

Since we don’t have a full address, the above line is to combine four columns together to form a full address string. The next step is to populate the dictionary

def feature_from_row(CommunityPartner, latitude, longitude, fulladdress, Primary, Website, Phone):
    feature = {'type': 'Feature', 'properties': {'PartnerName': '', 'Address': '', 'marker-color': '',
                                                 'Website': '', 'PrimaryMission': '', 'Phone': ''},
               'geometry': {'type': 'Point', 'coordinates': []}
               }
    feature['geometry']['coordinates'] = [longitude, latitude]
    feature['properties']['PartnerName'] = CommunityPartner
    feature['properties']['Address'] = fulladdress
    feature['properties']['Website'] = Website
    feature['properties']['PrimaryMission'] = Primary
    feature['properties']['Phone'] = Phone
    if Primary == "Economic Sufficiency":
        feature['properties']['marker-color'] = "FF5733"
    elif Primary == "Social Justice":
        feature['properties']['marker-color'] = "FFF033"
    elif Primary == "Health and Wellness":
        feature['properties']['marker-color'] = "74FF33"
    elif Primary == "Environmental Stewardship":
        feature['properties']['marker-color'] = "338DFF"
    elif Primary == "Educational Support":
        feature['properties']['marker-color'] = "CE33FF"
    else:
        feature['properties']['marker-color'] = "FF3374"
    collection['features'].append(feature)
    return feature

Create a function that will undertake the data processing. Between the brackets are the input variables.

feature = {'type': 'Feature', 'properties': {'PartnerName': '', 'Address': '', 'marker-color': '',
                                             'Website': '', 'PrimaryMission': '', 'Phone': ''},
           'geometry': {'type': 'Point', 'coordinates': []}
           }

Create a “feature”variable as above. Try to mirror it in “type” and “geometry” agains the standard GEOJSON (see above) as much as possible. Leave the “coordinate” value as empty to fill in later. Under “properties”, list the keys you want.

feature['geometry']['coordinates'] = [longitude, latitude]
feature['properties']['PartnerName'] = CommunityPartner
feature['properties']['Address'] = fulladdress
feature['properties']['Website'] = Website
feature['properties']['PrimaryMission'] = Primary
feature['properties']['Phone'] = Phone

Time to populate the keys. Remember to key the names of the keys and input variables consistent with what was already posted so far.

You must wonder: what about “marker-color”. You can use the conditional argument to assign values to the variable as follows:

if Primary == "Economic Sufficiency":
    feature['properties']['marker-color'] = "FF5733"
elif Primary == "Social Justice":
    feature['properties']['marker-color'] = "FFF033"
elif Primary == "Health and Wellness":
    feature['properties']['marker-color'] = "74FF33"
elif Primary == "Environmental Stewardship":
    feature['properties']['marker-color'] = "338DFF"
elif Primary == "Educational Support":
    feature['properties']['marker-color'] = "CE33FF"
else:
    feature['properties']['marker-color'] = "FF3374"

If you wonder about the HTML color code, just Google “HTML Color Code” and you’ll see it.

collection['features'].append(feature)
return feature

The first line of the block above dictates that we add every single row of the Excel file to the “features” key of the collection variable. “Return” is a mandatory feature of every function.

geojson_series = df.apply(
    lambda x: feature_from_row(x['CommunityPartner'], x['Lat'], x['Longitude'], x['description'], x['Primary'],
                               x['Website'], x['Phone']),
    axis=1)

jsonstring = pd.io.json.dumps(collection)

The first line is to add every single row of the Excel file to the function so that we can create the string needed for the GEOJSON. The second line is to turn it into json file.

output_filename = 'CommunityPartner.geojson' 
with open(output_filename, 'w') as output_file:
    output_file.write(format(jsonstring))

Name the file however you want and use the second line to write it into GEOJSON. The file product will look like this:

import pandas as pd

df = pd.read_excel('CommunityPartner.xls') #Get the Excel file from static/Excel

collection = {'type': 'FeatureCollection', 'features': []}

df['description'] = df[['Address', 'City', 'State', 'Zip']].apply(lambda x: ' '.join(x.astype(str)), axis=1)


def feature_from_row(CommunityPartner, latitude, longitude, fulladdress, Primary, Website, Phone):
    feature = {'type': 'Feature', 'properties': {'PartnerName': '', 'Address': '', 'marker-color': '',
                                                 'Website': '', 'PrimaryMission': '', 'Phone': ''},
               'geometry': {'type': 'Point', 'coordinates': []}
               }
    feature['geometry']['coordinates'] = [longitude, latitude]
    feature['properties']['PartnerName'] = CommunityPartner
    feature['properties']['Address'] = fulladdress
    feature['properties']['Website'] = Website
    feature['properties']['PrimaryMission'] = Primary
    feature['properties']['Phone'] = Phone
    if Primary == "Economic Sufficiency":
        feature['properties']['marker-color'] = "FF5733"
    elif Primary == "Social Justice":
        feature['properties']['marker-color'] = "FFF033"
    elif Primary == "Health and Wellness":
        feature['properties']['marker-color'] = "74FF33"
    elif Primary == "Environmental Stewardship":
        feature['properties']['marker-color'] = "338DFF"
    elif Primary == "Educational Support":
        feature['properties']['marker-color'] = "CE33FF"
    else:
        feature['properties']['marker-color'] = "FF3374"
    collection['features'].append(feature)
    return feature


geojson_series = df.apply(
    lambda x: feature_from_row(x['CommunityPartner'], x['Lat'], x['Longitude'], x['description'], x['Primary'],
                               x['Website'], x['Phone']),
    axis=1)

jsonstring = pd.io.json.dumps(collection)

output_filename = 'CommunityPartner.geojson' #The file will be saved under static/GEOJSON
with open(output_filename, 'w') as output_file:
    output_file.write(format(jsonstring))

ExceltoGEOJSON_3

This is how the GEOJSON looks:

 
   “type”:“FeatureCollection”,
   “features”: 
       
         “type”:“Feature”,
         “properties”: 
            “PartnerName”:“75 North”,
            “Address”:“4383 Nicholas St Suite 24 Omaha NE 68131.0”,
            “marker-color”:“FF5733”,
            “Website”:null,
            “PrimaryMission”:“Economic Sufficiency”,
            “Phone”:“402-502-2770”
         },
         “geometry”: 
            “type”:“Point”,
            “coordinates”: 
               -95.957309,
               41.276479
            ]
         }
      },
       
         “type”:“Feature”,
         “properties”: 
            “PartnerName”:“A Time to Heal”,
            “Address”:“6001 Dodge St CEC 216 Suite 219C  Omaha NE 68182.0”,
            “marker-color”:“74FF33”,
            “Website”:null,
            “PrimaryMission”:“Health and Wellness”,
            “Phone”:“402-401-6083” …

One important note. If you are a fan of Jupyter Notebook, beware that there may be a problem when it comes to the last step of the process. Here is how the collection variable looks before being dumped into the GEOJSON file.

ExceltoGEOJSON_4

But I ran into errors in the last step. I spent quite some time on fixing it but I couldn’t.

ExceltoGEOJSON_5

Creating the Python code in Pycharm is much easier and produces the same result. It’s even more convenient if you are in the middle of an application development project.

Hope this post helps. Much thanks to appendto and geoffboeing for inspiration.

 

Create a hover effect on Mapbox

I am sharing my experience in trying to create a hover effect on Mapbox. The first thing to do is to read their example and understand what is going on. Let’s unpack a little bit:

<!DOCTYPE html>
<html>
<head>
    <meta charset='utf-8' />
    <title>Create a hover effect</title>
    <meta name='viewport' content='initial-scale=1,maximum-scale=1,user-scalable=no' />
    <script src='https://api.tiles.mapbox.com/mapbox-gl-js/v0.49.0/mapbox-gl.js'></script>
    <link href='https://api.tiles.mapbox.com/mapbox-gl-js/v0.49.0/mapbox-gl.css' rel='stylesheet' />
    <style>
        body { margin:0; padding:0; }
        #map { position:absolute; top:0; bottom:0; width:100%; }
    </style>
</head>

It’s the <head> of the HTML that has scripts from Mapbox. Just follow them and you’ll be fine. Change the text in <title> to have your own page title.

<body>


—- Your real code goes here —–

</body>

Your real work will go between and . The <div> is a container that refers to the map you are working on. The next part is Mapbox token

mapboxgl.accessToken = '<your access token here>';

To get a token, just create a free account on Mapbox. A free account is allowed up to 50,000 requests a month if I am not mistaken. It should be enough for a student or an enthusiast wishing to try it out. Once you have a token, just put it in between ” in the above line.

Let’s have a base map

var map = new mapboxgl.Map({
    container: 'map',
    style: 'mapbox://styles/mapbox/streets-v9',
    center: [-100.486052, 37.830348],
    zoom: 2
});

The “center” feature’s coordinates refer to where you want to focus on. Get your chosen destination’s coordinates and just put them there. Alternate the two figures in coordinates if you don’t get it right in the first try. “Zoom” is how close you look at the chosen destination. The greater the number, the closer the zoom.

var hoveredStateId =  null;

map.on('load', function () {
    map.addSource("states", {
        "type": "geojson",
        "data": "https://www.mapbox.com/mapbox-gl-js/assets/us_states.geojson"
    });

HoveredStateID is a placeholder variable that will be used later for hover effect. The following code block is to load the base map. Just follow the templates. Three things to note here:

  • “state” refers to the object’s name that contains the GEOJSON data. You can name whatever you want
  • “GEOJSON” refers to to the style of the file. Mapping requires GEOJSON files, whether you load it from an external source, like we do in this case, or from a hardcoded file
  • The link that goes with “data” is where the author stores the data.

One note here: if you use Github or any cloud platform to store and source your file, be careful. For instance, let’s look at a file I have on github.

Github_Link

Just copying the usual link when you access your file on Github like that won’t work. To get the link that works, click on “Raw” and here is how it shows on the screen

Github_Content_Link

Copy the link in the browser. It should work.

Back to the HTML. Add the two “map.addLayer” code sections to what you already have. It should look like the below

map.on('load', function () {
    map.addSource("states", {
        "type": "geojson",
        "data": "https://www.mapbox.com/mapbox-gl-js/assets/us_states.geojson"
    });

    map.addLayer({
        "id": "state-fills",
        "type": "fill",
        "source": "states",
        "layout": {},
        "paint": {
            "fill-color": "#627BC1",
            "fill-opacity": ["case",
                ["boolean", ["feature-state", "hover"], false],
                1,
                0.5
            ]
        }
    });

    map.addLayer({
        "id": "state-borders",
        "type": "line",
        "source": "states",
        "layout": {},
        "paint": {
            "line-color": "#627BC1",
            "line-width": 2
        }
    });

The first addLayer is for the polygon itself while the second one is for the lines between the states. “id” refers to the name of the layer for future reference. Remember to tie the “source” value back to the name of map.addSource. In this case, it’s “states”. The rest is a Mapbox standard template for hover effect. You can change the color whenever you feel like.

The next step is to work on “hover effect”. Add the following code to the end of the previous block

    map.on("mousemove", "state-fills", function(e) {
        if (e.features.length > 0) {
            if (hoveredStateId) {
                map.setFeatureState({source: 'states', id: hoveredStateId}, { hover: false});
            }
            hoveredStateId = e.features[0].id;
            map.setFeatureState({source: 'states', id: hoveredStateId}, { hover: true});
        }
    });

    // When the mouse leaves the state-fill layer, update the feature state of the
    // previously hovered feature.
    map.on("mouseleave", "state-fills", function() {
        if (hoveredStateId) {
            map.setFeatureState({source: 'states', id: hoveredStateId}, { hover: false});
        }
        hoveredStateId =  null;
    });

The first thing to notice is here:  map.on(“mousemove”, “state-fills”, function(e) {

“State-fills” is the “id” of the polygon layer mentioned previously. So whatever name is chosen for that addLayer, it should be used here.

source: ‘states’

In this case, ‘states’ refers to the source of the data in the map.addSource section above. Remember to use the same reference name for the source. The rest is just a standard template. If you have time, feel free to explore. I am under pressure to deliver features for my Capstone, so I just prefer not touching or changing any of it.

Here is an important note. If you don’t follow, the hover effect won’t work. I use the same code as Mapbox’s example, just changing the GEOJSON source. The hover effect doesn’t work as you can see below:

The key is the data source. Let’s look at the data that Mapbox uses. Here is the tree view of the first item in the polygon array, just to show its structure

GEOJSON_2

Here is the structure of the data I used that led to the unsuccessful “hover effect”

GEOJSON_3

Notice the difference? As far as I am concerned, the hover template in question needs the data to have a certain structure. Otherwise, the code won’t work. Now, there should be other ways to go around this, but if you don’t have time, I’d suggest modifying the data to mirror Mapbox’s example. Here is the structure of my modified data

GEOJSON_4

Does the code work? You bet!

GEOJSON_5

Hopefully this post will be useful to starters like I am.

 

 

Some basic SQL & Plotly in Python

In this post, I’ll document how to use SQL and plotly to create a basic bar chart in Python, specifically in Jupyter Notebook. I’ll try to be as detailed and visual as possible.

The dataset I am going to use is Nike Manufacturing Map:

The first step is to click download the dataset in Excel. Once the file is downloaded, open the file and remove the first row. The first row just has the name of the file, not the columns’ names we need. Since I haven’t figured out how to remove the first row, the fast way is to do it before starting the code to make our life easier. Click save and the data is ready to go

Open Jupyter Notebook. In my case, I use Anaconda. The first thing to do on a new Jupyter Notebook is to import necessary packages. The “as ABC” is to create an alias for packages with long names so that we won’t have a hard time calling those packages later in the code. Imagine having to type “plotly.graph_objs” 10 times instead of “go”.

import plotly.plotly as py #this package is for visualization
import plotly.graph_objs as go #this package is for visualization
import pandas as pd #this package is for handling the dataframe
from pandasql import sqldf #This is to allow users to use SQL queries to create a new dataframe
pysqldf = lambda q: sqldf(q, globals())

Packages

Then, load the file. To check if the file is loaded correctly, use df.head(). The command will show a snapshot of the data frame. If you want to see the whole data frame, just use df

df =pd.read_excel(‘/Users/camapcon/Documents/Github/Nike_Factory_Locations/export.xls’) #read the file in
df.head()

Read the file in

We need to change the names of the two columns highlighted in the screenshot. The way that they are named will make it tricky to process SQL queries. I tried it. It’s better to have more straightforward column names with no blank space in between. It’s pretty easy:

df = df.rename(columns={‘Nike, Inc. Brand(s)’: ‘Brands’, ‘% Female Workers’:’FemaleWorkers’, ‘Total Workers’:’TotalWorkers’})
df.head()

Column name change

The next step is to use SQL to create a summary table whose data will be used to draw bar charts. What we will analyze includes:

  • The total number of workers by brands
  • The number of factories by brands
  • Average number of workers at each factory by brands
  • Average percentage of female workers at each factory by brands

The code to use SQL to create a subset is below:

Primary = “””SELECT Brands, Sum(TotalWorkers) as TotalWorkers, Count(Brands) as Count, AVG(TotalWorkers) as AvgTotalWorkers, AVG(FemaleWorkers) as FemaleWorkers From df Group By Brands””” #create a data frame using a SQL
Primary_df = pysqldf(Primary)
Primary_df

Here is the result:

SQL 1

Notice that the figures aren’t pretty, right? Let’s round them up and make them look prettier

Second = “””SELECT Brands, TotalWorkers, Count, Round(AvgTotalWorkers, 0) as AvgTotalWorkers, Round(FemaleWorkers,0) as FemaleWorkers From Primary_df”””
Primary_df = pysqldf(Second)
Primary_df

Here is how it looks:

SQL 2

It’s time to create the bar charts:

trace1 = go.Bar( #create x and y axes
x=Primary_df[‘Brands’], #to use column Brands as the X axis
y=Primary_df[‘TotalWorkers’], #to use column TotalWorkers as the Y axis
name=’Total Global Workers by Nike Brands’
)
data1 = [trace1]
layout1 = go.Layout( #starting to plot a grouped bar chart
title=’Total Global Workers by Nike Brands’,
barmode=’group’
)

fig = go.Figure(data=data1, layout=layout1)
py.iplot(fig, filename=’grouped-bar’)

Here is how it looks

newplot

You can change the Y axis to another column as you wish. Here is an example:

trace2 = go.Bar( #Project Count by Mission Area
x=Primary_df[‘Brands’],
y=Primary_df[‘FemaleWorkers’],
name=’Average Percentage of Female Workers by Nike Brands’
)
data2 = [trace2]
layout2 = go.Layout( #starting to plot a grouped bar chart
title=’Average Percentage of Female Workers by Nike Brands’,
barmode=’group’
)

fig = go.Figure(data=data2, layout=layout2)
py.iplot(fig, filename=’grouped-bar’)

Try to label different charts differently to avoid confusion and errors. It would make debugging easier as well.

newplot-2

That’s all I have for this post. Just some basic SQL queries and Python code to handle and visualize data with Plotly and Pandas. I still need to practice and explore more. And I urge you to do the same, if you are as inexperienced and interested in data analysis like I am.