Simple Data Visualization Tools in Python and Pandas

Almost no programming knowledge is required to use Jupyter Notebooks for Data Visualization with Python and Pandas

[Figure: Data visualization - Pandas multi-plot]


Please go to the updated version of this article here




We are going to use Jupyter Notebooks and Python with Pandas to create some great graphs to visualize your data. And you do not need to be a programmer.

But you will need to be able to understand just a little bit of Python.

Before you start, you need to be reasonably comfortable using Jupyter Notebooks. If you aren’t, or don’t have Jupyter installed on your computer, don’t worry. I’ve written a tutorial that will show you how to install Jupyter and get you going. You can find it here: Setting Up Jupyter Notebooks for Data Visualization.

We are going to explore the data visualization capabilities of Pandas, a Data Analysis library for Python used widely in the field of Data Science. We’ll start by introducing the basics — line graphs, bar charts and pie charts — and then we’ll see how we can save those charts so we can utilise them in our own reports, documents and web pages.

We’ll use Jupyter Notebooks to create the charts from a data set in the form of a CSV file.
We use Jupyter Notebooks because they allow us to experiment with the charts that we produce before exporting them for use in a document. They also allow us to create complete documents including those charts and save them as web pages or PDFs.

In fact, this article was originally written in a Jupyter Notebook.

Getting Started

First, you need to start up a Jupyter notebook (refer to the introductory article above, if you are unsure).

In your first code cell type in the following:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

This imports all of the necessary Python libraries to do data visualisation with Pandas: numpy is a maths package, pandas gives us ways of manipulating data and matplotlib provides the basic plotting functionality that Pandas uses to produce charts and graphs.

Run this cell once so that all the subsequent code will work.

Getting the data

Before we start to visualise the data we need to load it from somewhere. I’ve provided a csv file of London weather data from 2018. It’s a subset of historical data available from the Met Office in the UK and records the maximum and minimum temperatures, the rainfall and the number of hours of sun for each month.

The snippet of code below loads the CSV file from its URL into a variable called weather.

The second line is simply the variable name. Because it is the last line in the cell, Jupyter displays its contents, the data, as a table.
In the table you can see that the columns are labelled Year, Month, Tmax (Maximum temperature), Tmin (Minimum temperature), Rain (in inches) and Sun (hours of sunlight).


weather = pd.read_csv('https://coded2.herokuapp.com/datavizpandas/london2018.csv')
weather

[Figure: Pandas dataframe]

When you are working with your own data, you can store the data file in the same directory as your notebook. The code to load it would then look something like this:

mydata = pd.read_csv('mydata.csv')
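read_csv also accepts a file-like object instead of a file name or URL, which is handy for quick experiments or when the remote file cannot be reached. The sketch below builds a tiny CSV in memory with io.StringIO; the numbers are invented (they are not Met Office figures) and only the column names follow the weather table.

```python
import io

import pandas as pd

# Invented values for illustration only; the column names match the weather table.
csv_text = """Year,Month,Tmax,Tmin,Rain,Sun
2018,1,9,3,60,45
2018,2,7,1,30,90
2018,3,10,3,80,70
"""

# read_csv accepts any file-like object as well as a path or URL.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (number of rows, number of columns)
print(list(sample.columns))  # column names to use in plot(x=..., y=...)
print(sample.head())         # first rows; useful when the table is large
```

The variable is called sample here so that running the sketch in the same notebook does not overwrite the real weather data.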

Plotting the temperature

The weather variable is a Pandas dataframe. This is essentially a table, as we saw above, but Pandas provides us with all sorts of functionality associated with the dataframe. One of these is the ability to plot a graph: we simply use the code weather.plot() to create a line graph. We need to say which columns to use for the x and y axes, though, and we do this by passing the column names from the dataframe, like this:

weather.plot(y='Tmax', x='Month')

[Figure: Pandas line chart]

You can see that, when you run this cell, it gives you a simple line plot of the maximum temperature over the 12-month period.
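plot() returns a matplotlib Axes object, so you can keep the result and adjust the chart afterwards, for example to add a title or axis labels. A sketch using invented temperatures as a stand-in for the weather data:

```python
import pandas as pd

# Made-up monthly maximum temperatures; months are numbered 1-12.
demo = pd.DataFrame({
    "Month": list(range(1, 13)),
    "Tmax": [8, 7, 10, 15, 19, 23, 27, 25, 21, 16, 11, 9],
})

# title= is a plot() argument; legend=False hides the single-entry legend.
ax = demo.plot(y="Tmax", x="Month", title="Maximum temperature by month", legend=False)

# The returned Axes can be customised further with matplotlib methods.
ax.set_xlabel("Month")
ax.set_ylabel("Maximum temperature")
```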

To plot both maximum and minimum temperatures, we give two column names enclosed within square brackets and separated by a comma, like this:


weather.plot(y=['Tmax', 'Tmin'], x='Month')

[Figure: Pandas line chart]
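An alternative to passing x='Month' on every call is to make Month the dataframe's index with set_index. plot() then uses the index for the x-axis and labels that axis with the index name. The sketch below uses a small made-up stand-in for the weather data (the temperatures are invented, not the Met Office values).

```python
import pandas as pd

# Made-up data; in the article these columns come from the weather CSV file.
demo = pd.DataFrame({
    "Month": list(range(1, 13)),
    "Tmax": [8, 7, 10, 15, 19, 23, 27, 25, 21, 16, 11, 9],
    "Tmin": [3, 1, 3, 6, 9, 13, 16, 15, 12, 8, 5, 3],
})

# With Month as the index, plot() uses it for the x-axis automatically,
# so x= no longer needs to be given.
monthly = demo.set_index("Month")
ax = monthly.plot(y=["Tmax", "Tmin"])
```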

A line chart is the default when you use the plot function. If we want to draw some other kind of chart, we have to specify which one.
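Pandas offers two equivalent ways to choose the chart type: a kind argument to plot(), or a named method such as plot.bar(). A minimal comparison, using a made-up three-row dataframe in place of the weather data:

```python
import pandas as pd

# Hypothetical data with the same column names as the weather table.
demo = pd.DataFrame({
    "Month": [1, 2, 3],
    "Tmax": [8, 7, 10],
})

# These two calls draw the same bar chart: plot.bar(...) is shorthand
# for plot(kind="bar", ...). The same pattern works for "barh", "scatter" and "pie".
ax1 = demo.plot(kind="bar", y="Tmax", x="Month")
ax2 = demo.plot.bar(y="Tmax", x="Month")
```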

Bar Charts

To draw a bar chart of the same data as above, we use plot.bar instead of plot, like this:


weather.plot.bar(y='Tmax', x='Month')

[Figure: Pandas vertical bar chart]

And if we prefer horizontal bars, we use plot.barh instead.


weather.plot.barh(y='Tmax', x='Month')

[Figure: Pandas horizontal bar chart]

Plotting two sets of data (Tmax and Tmin) in a bar chart is similar to what we saw above with the line chart:


weather.plot.bar(y=['Tmax', 'Tmin'], x='Month')

[Figure: Pandas bar chart]
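The colours and label orientation can be set as well: color takes one colour per column, and rot sets the rotation of the x-axis tick labels (plot.bar turns them sideways by default). A sketch with made-up temperatures:

```python
import pandas as pd

# Invented values for three months, standing in for the weather data.
demo = pd.DataFrame({
    "Month": [1, 2, 3],
    "Tmax": [8, 7, 10],
    "Tmin": [3, 1, 3],
})

# One colour per plotted column (in order), and horizontal month labels.
ax = demo.plot.bar(
    y=["Tmax", "Tmin"], x="Month",
    color=["tab:red", "tab:blue"], rot=0,
)
```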

Multiple charts

If we want to create a separate chart for each column of data, we can. We set the x and y values as usual but, in addition, we set the subplots parameter to True (the default is False) and, if we wish, a layout giving the number of rows and columns, as you see below.


weather.plot.bar(y=['Tmax', 'Tmin', 'Rain', 'Sun'], x='Month', subplots=True, layout=(2, 2))

[Figure: Pandas multiple bar chart]

We can do this with any type of chart. Here’s the same thing but as a set of line graphs.


weather.plot(y=['Tmax', 'Tmin', 'Rain', 'Sun'], x='Month', subplots=True, layout=(2, 2))

[Figure: Pandas multiple line chart]
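With subplots=True, plot() returns a NumPy array of Axes objects shaped like the layout, one per panel, which you can loop over to adjust each chart. The sketch below also uses figsize (width and height in inches) to make the grid larger; the numbers are made up.

```python
import pandas as pd

# Made-up values for four months; column names follow the weather table.
demo = pd.DataFrame({
    "Month": [1, 2, 3, 4],
    "Tmax": [8, 7, 10, 15],
    "Tmin": [3, 1, 3, 6],
    "Rain": [60, 30, 80, 50],
    "Sun": [45, 90, 70, 150],
})

columns = ["Tmax", "Tmin", "Rain", "Sun"]

# subplots=True plus layout=(2, 2) returns a 2x2 array of Axes;
# columns are assigned to panels row by row, in the order given.
axes = demo.plot.bar(
    y=columns, x="Month",
    subplots=True, layout=(2, 2), figsize=(10, 6), legend=False,
)

# Give each panel a title instead of a one-entry legend.
for ax, column in zip(axes.flat, columns):
    ax.set_title(column)
```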

Scatter Plot

A scatter plot plots a series of points that correspond to two variables and allows us to determine if there is a relationship between them. The one below plots Sun and Rain. There are 12 points, one for each row in the table, and the points plot the value of Rain on the vertical axis against Sun on the horizontal one.

It’s not particularly clear, but you can see a vague linear relationship. A straight line that was the best fit through the twelve points would start somewhere high up on the left and end up low on the right. That tells us that when Rain has a high level, Sun tends to have a low one, and vice versa. Common sense tells us this is probably right: generally speaking, there is an inverse relationship between the amount of sun and the amount of rain.


weather.plot.scatter(x='Sun', y='Rain')

[Figure: Pandas scatter chart]
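The best-fit line described above can be computed rather than eyeballed. The sketch below uses the Pandas corr() method for the Pearson correlation coefficient (negative for an inverse relationship) and NumPy's polyfit for a straight-line fit, then draws that line over the scatter plot. The sun and rain values are invented to illustrate the technique; run the same code on the real weather dataframe to get actual numbers.

```python
import numpy as np
import pandas as pd

# Made-up monthly sun (hours) and rain figures, chosen only to illustrate the idea.
demo = pd.DataFrame({
    "Sun": [50, 70, 120, 150, 200, 210, 240, 200, 150, 110, 60, 45],
    "Rain": [80, 60, 55, 40, 30, 25, 15, 35, 50, 65, 85, 90],
})

# Pearson correlation: near -1 means a strong inverse linear relationship,
# near 0 means little linear relationship.
r = demo["Sun"].corr(demo["Rain"])

# Least-squares straight line (a degree-1 polynomial) through the points.
slope, intercept = np.polyfit(demo["Sun"], demo["Rain"], 1)

# Draw the scatter plot, then overlay the fitted line on the same Axes.
ax = demo.plot.scatter(x="Sun", y="Rain")
xs = np.linspace(demo["Sun"].min(), demo["Sun"].max(), 50)
ax.plot(xs, slope * xs + intercept, color="red")

print(f"correlation: {r:.2f}, slope: {slope:.3f}")
```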

Pie Charts

Pie charts are the last type we are going to deal with here. But since the weather data doesn’t really lend itself to representation in a pie chart, we’ll load some more data that represents people’s choices of meals (yes, you’re right, I made it up!).


meals = pd.read_csv("https://coded2.herokuapp.com/datavizpandas/meals2.csv", index_col=0)
meals

One thing to notice here is the index_col=0 argument. It tells Pandas to use column 0, the one that contains the names of the different meals, as the index, and the pie chart uses the index as the labels for its slices.

[Figure: Pandas dataframe]

meals.plot.pie(subplots=True)

[Figure: Pandas pie chart]
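To see what index_col=0 does without the remote file, the sketch below reads a small made-up meals table from memory. Plotting a single column with y= gives one pie, and autopct (passed through to matplotlib's pie function) prints each slice's percentage. The Meal and Votes column names are assumptions; the real meals2.csv may be laid out differently.

```python
import io

import pandas as pd

# Hypothetical meal-choice data (column names and numbers are invented).
csv_text = """Meal,Votes
Pizza,12
Curry,9
Salad,4
"""

# index_col=0 turns the first column (the meal names) into the index,
# which plot.pie uses as the slice labels.
demo_meals = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Plot one column as a single pie; autopct formats each slice's percentage.
ax = demo_meals.plot.pie(y="Votes", autopct="%1.0f%%", legend=False)
ax.set_ylabel("")  # hide the column name that Pandas puts on the y-axis
```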

Saving the Charts

This is all very well but maybe you want to be able to use the charts that you produce. If you want to use them in a presentation or document, then it would be useful to be able to export them as image files that you can include in another file.

The simple way of saving the images is like this:


meals.plot.pie(subplots=True)
plt.savefig("pie.png")

plt is the matplotlib.pyplot module that we imported in the first cell, and its savefig function saves the current figure, that is, the chart you have just drawn, so put it in the same cell as the plotting code. The name of the file is specified inside the brackets; in this case the chart is saved to a file called “pie.png” in the same directory as the notebook, and the file extension determines the image format.
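A slightly more explicit approach saves the figure that belongs to a particular chart, rather than whichever figure matplotlib considers current, and sets a higher resolution (dpi) and trimmed margins (bbox_inches='tight'). In this sketch the data is made up, and the temporary folder is only there so the example is self-contained; in a notebook you would normally give a plain file name.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import pandas as pd

# Made-up data, just so there is a chart to save.
demo_meals = pd.DataFrame({"Votes": [12, 9, 4]}, index=["Pizza", "Curry", "Salad"])
ax = demo_meals.plot.pie(y="Votes", legend=False)

# Temporary folder for this sketch; replace with your own path.
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "pie.png")
svg_path = os.path.join(out_dir, "pie.svg")

# Save via the chart's own figure; the extension selects the format.
fig = ax.get_figure()
fig.savefig(png_path, dpi=150, bbox_inches="tight")  # sharper PNG, trimmed margins
fig.savefig(svg_path)                                 # scalable vector version
plt.close(fig)  # release the figure once it has been saved
```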

That’s about it for now

This was an introduction to data visualization with Pandas and Jupyter Notebooks. We’ve seen how we can produce a range of charts from a data file and save them for use in our documents.

There’s plenty more to find out, and I’ll be looking at more advanced topics in a subsequent article (see below).

Thanks for reading.

Update: the next article is here, Data Visualization: Simple Statistical Views in Pandas
