Setting Up Jupyter Notebooks for Data Visualization with Anaconda

Jupyter Notebooks are a great way to explore data and create visualizations, and they are an essential tool for data science. Anaconda is the easiest way to install them.

Figure: An example of a Pandas data visualization plot in a Jupyter Notebook

You can even create complete documents, including the data visualizations, and export them as HTML.


We are going to see how to set up the Jupyter Notebook environment with Anaconda, how to create a notebook, and then how to plot our first graph.

You can see a simple example plot from Pandas in a Jupyter notebook, above.


Although the language used to create the plots is Python, you don't really have to know much about programming, as only a very limited subset of the language is needed. You will need to learn the particular syntax of the plotting commands in Pandas and/or Seaborn, but not much else. In the example above most of the code is boilerplate; it's the final line that does the work.
 
You can use Pandas or Seaborn by writing a Python program in a programmer's editor and running that program. But I think that the easiest way to produce nice charts, in an interactive environment, is with a Jupyter Notebook.

So, first of all, we are going to set up the Jupyter Notebook environment.

This guide is not aimed at seasoned programmers, so I'm going to assume that you do not have Python installed on your computer and will, instead, urge you to install Anaconda on your machine.

Anaconda

Anaconda is a single package that contains a whole set of data science tools, based on the languages Python and R.

Download and installation are straightforward and, once installed, you will have a vast range of data science and AI tools available to you from a single graphical interface. These tools include Python, Pandas and Jupyter, as well as all the libraries that you need to support them.

Figure: Anaconda Navigator

Anaconda is quite big - several gigabytes - but since most computers come with hundreds of gigabytes of storage, that is probably not a big deal for most people. For those who are short of space (as I was when I installed it on a Windows tablet) there is a smaller alternative, Miniconda, but that requires a little more work. I'll cover that separately; for now we'll consider the full version.

Anaconda is available for Windows, MacOS and Linux, and comes in 32 or 64 bit versions. It also comes in versions that include Python 2.7 or Python 3.x - I recommend that you choose the latest Python 3 version that is right for your machine.

First, go to the Anaconda downloads page and, about halfway down, you will see links for each operating system: Windows, MacOS or Linux. Click on the right one for you to download the installer.

Anaconda download


When you have the installer downloaded you can find detailed instructions on how to install it on the appropriate installation page. I've listed them here, for convenience.


However, installation is straightforward. For Windows and Mac users, just download the appropriate version from the web site and install as you would any other program; admin rights are not needed.

For Linux users you do not use apt-get to install, as you might expect. Download the installer, open a terminal window, navigate to where your download is and run the command:
bash ./whatever-your-file-is-called.sh

During the install, it's probably best simply to accept default options when you are given a choice. Once it has got going, the install can be left to its own devices - it may take a while depending on the speed of your internet connection and your PC.

Towards the end of the installation, you will be asked if you want to install Microsoft's Visual Studio Code. This is quite a good programmer's editor but if you decide against installing it at this stage, you can always install it at a later date.

Once installed on a Windows machine, or a Mac, you should find various new items in your start menu. One of them will be the Anaconda Navigator (as shown above). You can also find entries for Jupyter Notebooks and the Anaconda prompt.

On Linux, you may not have these entries added to your menu. If this is the case, then simply open a terminal window and type the command:

anaconda-navigator

This way you will get the Anaconda GUI from which you can start a Jupyter Notebook.

And that's about it. With Anaconda installed you are ready to start using Jupyter to produce great data visualisations.
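If you would like to confirm that the main packages really are in place, one option is a quick check from Python (for example, by typing python at the Anaconda prompt, or later from a notebook cell). What follows is only a sketch: the helper name installed_versions is made up for this illustration and is not part of Anaconda, and it assumes Python 3.8 or later, which is what provides importlib.metadata.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a dict mapping each package name to its version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this Python environment.
            found[name] = None
    return found


# installed_versions is a hypothetical helper written for this example.
report = installed_versions(["numpy", "pandas", "matplotlib", "notebook"])
for name, ver in report.items():
    print(name, ver if ver is not None else "not installed")
```

With a full Anaconda install you would expect a version number next to each name; "not installed" suggests that the Python you are running is not the one Anaconda provided.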

The next article will be about actually using a notebook and producing a first visualisation using Pandas. When it is ready I'll put a link, here.

Anaconda Navigator

I'm going to start with the Anaconda Navigator GUI as launching a notebook from here is the same whether you are using Windows, MacOS or Linux.

Here's the Navigator screen:


As you can see, the panel on the top left of the main window is for Jupyter Notebooks and to get going simply click on the Launch button. (Your layout may not be the same as mine but the Jupyter Notebook panel will be there somewhere, if it is installed.)

But just before you do, you should know that Jupyter runs in your browser. Launching Jupyter will create a new tab in your default browser with a page that looks something like this:


Of course, since this is the home folder on my Windows tablet, your screen won't look exactly the same. But it will probably contain a number of folders that you don't want littered with notebooks. So, the first thing that you will probably want to do is create a new directory for your work.

So, click on the New drop down and select Folder.


You will now have a new folder called Untitled Folder. Scroll down to find it and click the check box next to it so that it is ticked and then select Rename from the top of the screen and give it a new name.

Now here's a trick. You are probably going to want to use this directory every time you open Jupyter. And if, like me, you have a whole load of folders to scroll through, you might want to choose a name that positions it near the top. I called my folder "_Notebooks". The underscore brings it up to the second item on the list, so it is easily seen and clicked on.

Now double click on your new folder and you will see something like this:

Now we are ready to create our first notebook. Click New again but this time select Python 3. This will open a new tab in your browser containing your new notebook - it will look like this:


The important bit to notice is the field with the coloured bar on the left. This is a cell. In a cell you can write text or code. The default is code, and we are going to write the following code in our cell (I suggest that you copy and paste it from the text below):

import numpy as np, pandas as pd, matplotlib.pyplot as plt

This line of code imports the libraries that will allow us to do basic visualisation of data. We are importing three libraries: numpy, which provides numerical support for large multi-dimensional arrays; pandas, which provides more data structures and data analysis tools; and matplotlib, which provides a 2D plotting library that is used by pandas to produce charts in a variety of formats.

Now, click on the + icon to create another cell and write the following into it (again, I suggest that you copy and paste this text):

data = pd.Series([0,1,2,3,4,5,6,7,8,9])
data.plot()


You should end up with something like this:


Now, if you are a programmer, you will know what a variable is, and will recognise data as being one. If you are not a programmer, then you only need to understand that a variable is basically something that can hold a value. So, for example, we could create a variable and give it the value 5; we can then use its name in various operations and it would represent the value 5. You can think of it as a short cut to using the value.
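To make the idea of a variable concrete, here is a tiny sketch that you could run in a cell of its own. The names x, doubled and total are just illustrative choices for this example; any valid names would do.

```python
# "x" is an illustrative name chosen for this example.
x = 5

# Wherever x appears, Python uses the value that it holds.
doubled = x * 2
total = x + 10

print(doubled)  # prints 10
print(total)    # prints 15
```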

In the case of the variable I've called data, its value is rather more complex than a single number; it's actually a series of numbers (and we use the pandas library to create a data structure that contains that series). Using pandas we can create all sorts of data structures which are, essentially, various forms of data tables. In this case, our series of numbers is like a single row of numbers from a table.

So, the first line of our second cell gives, or assigns, the value 0,1,2,3,4,5,6,7,8,9 to the variable data.
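If you are curious about what the series actually holds, a short sketch like the following lets you look inside it. It repeats the pandas import so that it can be run on its own, and it relies only on standard pandas features (len, tolist and the index); none of this is needed for the plot itself.

```python
import pandas as pd

data = pd.Series([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# How many values the series holds.
print(len(data))

# The values as a plain Python list.
print(data.tolist())

# The row labels that pandas assigned automatically (0 to 9 here).
print(data.index.tolist())
```

Because we did not supply any labels, pandas numbers the values from 0 upwards, and those labels are what end up along the horizontal axis of the plot.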

Now we can invoke the magic of the pandas library.

The second line of this cell produces a plot, a graph, based on the series of numbers in data.

Let's try it.

First select the first cell and click on the icon that looks like a 'play' button. This will execute the code in that cell. Nothing much will appear to happen, although you should notice that the empty brackets in the In [ ] label next to the cell now contain a number (an asterisk is shown there while a cell is still running). However, executing this cell has imported the libraries that we need.

Now select the second cell and click the play icon. This time something much more exciting happens.

You will see this image displayed in the cell.
It is, of course, a simple line plot of the data which, as we would expect, results in a straight line graph.
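As a small optional extension, the plot can be given a title and axis labels. The sketch below assumes a fresh cell and relies on the fact that plot() hands back the matplotlib Axes object that the line was drawn on; the label text is made up purely for illustration.

```python
import pandas as pd

data = pd.Series([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# plot() returns the matplotlib Axes that the line was drawn on.
ax = data.plot()

# The wording of these labels is only an example.
ax.set_title("A simple line plot")
ax.set_xlabel("Position in the series")
ax.set_ylabel("Value")
```

The chart should look the same as before, but with the title above it and a label on each axis.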

Now You Are Set Up


We've seen how to install Anaconda and Jupyter Notebooks, and how to create our first data visualisation. There is, of course, much more to discover: how to import a data set and how to create different types of visualisation, not just line graphs, but scatter diagrams, bar charts and pie charts, for example.
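As a small taste of those other chart types, here is a minimal sketch that draws the same series as a bar chart. It assumes only the kind parameter of the pandas plot method, and it is best run in a new cell of its own so that it does not draw over an earlier chart.

```python
import pandas as pd

data = pd.Series([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# kind="bar" asks pandas for a bar chart instead of the default line plot.
ax = data.plot(kind="bar")
```

Each value in the series becomes one bar, so the first bar (for the value 0) has no height.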
