I recently worked on a report documenting instructor and workshop activities for The Carpentries. Much of it was built using Pandas, a tool I am relatively familiar with. I also included some geospatial visualizations, using GeoPandas for the first time.
The Python GeoPandas library works much like Pandas, but for geographical data. See installation instructions.
It comes with a few datasets to plot country maps (polygons), city maps (points), and New York City boroughs (polygons).
I used the
countries dataset merged with my own data to plot a choropleth map of my country data. The
countries dataset is a GeoDataFrame that includes columns for population, continent, country name, 3 character country code, GDP, and geometry (polygon).
I started by importing all the things.
import matplotlib.pyplot as plt import numpy as np import pandas as pd import geopandas as gpd
I read the
countries data set into a variable called world and see the columns it includes.
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres')) world.columns
Index(['pop_est', 'continent', 'name', 'iso_a3', 'gdp_md_est', 'geometry'], dtype='object')
I also have my own dataset, as a Pandas dataframe, called
id date country 4251 2016-02-03 US 6269 2016-02-03 NZ 7995 2016-02-03 US 1701 2016-02-04 GB 8221 2016-03-16 AU # ...
However, my data had all countries stored as two digit character codes, not three as the GeoPandas data uses. To fix this I used
pycountry. I wrote a function that takes the two digit code, gets the related country, and returns its three digit code. That function is then applied to the dataframe.
import pycountry def get_country_name(alpha_code): return pycountry.countries.get(alpha_2=alpha_code).alpha_3 country_data['country'] = country_data['country'].apply(get_country_name)
id date country 4251 2016-02-03 USA 6269 2016-02-03 NZL 7995 2016-02-03 USA 1701 2016-02-04 GBR 8221 2016-03-16 AUS # ...
I then plotted the GeoPandas map, using the
plot() function, much as you would with a typical Pandas plot.
I then did a Pandas merge on the two data sets as follows.
# Sample code: # pd.merge(frame_1, frame_2, how = 'left', left_on = 'county_ID', right_on = 'countyid') world_merged = pd.merge(world, country_data, how='left', left_on = 'name' , right_on = 'country' )
Since I only have data for a few countries, values for all other countries got set to
NaN in the merge. I transformed all the
NaN values to zero.
Now I can plot the merged data.
I plotted the original world data. I changed my data set to exclude countries where
count was zero.
world2 = world2[world2['count'] != 0]
Use Yellow-Orange-Red color scale. Color scales can be sequential, diverging, or discrete. See https://python-graph-gallery.com/197-available-color-palettes-with-matplotlib/ for examples.
fig, ax = plt.subplots() # I didn't want the countries with count=0 to show in the legend or color scale so I dropped them from the dataframe. world2 = world2[world2['count'] != 0] # Plotted the original `world` data set, with black outlines and gray fill on the axes already defined. Without this, only the countries with values greater than zero would plot. world.plot(ax=ax, color='#DCDCDC', edgecolor='black') # Over that plot my data, on the same axes. world2.plot(ax=ax, column='count', categorical=True, legend=True, cmap="YlOrRd", edgecolor = 'black') # Display the map # plt.show()
The initial plot will be small. To make it bigger and keep it to scale I got the figure size, looked at the height and width proportion, and used it to set a new figure size.
size = fig.get_size_inches()*fig.dpi
It was a ratio of 1.5:1 so I set the new figure size, back above where I set up the original
fig, ax. It has to be set here - it didn't work when I tried to set
figsize anywhere else.
fig, ax = plt.subplots(figsize=(18,12))
The plot includes x and y axis ticks by default, which are useful for most charts and graphs but not for a map like this. Here's a shorthand way of hiding them (before calling
plt.xticks(, ) plt.yticks(, )
Lastly, all the numeric values are floats, but I needed them to be ints since they represent counts. Turns out Pandas coerces new numeric values to floats by default.
Since my data represents counts, and Pandas defaults to all numeric data types as floats, I had to transform them to
world_merged['count'] = world_merged['count'].astype(int)
I also wanted to give my map a title. This isn't assigned like most Pandas plots. Box plots also do it this way -- I don't know why.
title = "MY MAP" ax.set_title(title) ax.get_figure().suptitle("")
So the final code to draw the map looks like this:
fig, ax = plt.subplots(figsize=(18,12)) title = "MY MAP" ax.set_title(title) ax.get_figure().suptitle("") # I didn't want the countries with count=0 to show in the legend or color scale so I dropped them from the dataframe. world_merged = world_merged[world_merged['date'] != 0] # Plotted the original `world` data set, with black outlines and gray fill on the axes already defined world.plot(ax=ax, color='#DCDCDC', edgecolor='black') # Over that plot my data, on the same axes. world_merged.plot(ax=ax, column='date', categorical=True, legend=True, cmap="YlOrRd", edgecolor = 'black') plt.xticks(, ) plt.yticks(, ) # Display the map plt.show()
The geopandas documentation makes you think you have to use the
scheme option to manipulate this more. For simple categorial maps the
scheme option is not necessary.