Data Visualization with Julia and JuliaBox


With Julia and JuliaBox you can make impressive data visualizations with almost no programming knowledge and no need to install anything.

UPDATE: Sad to say that JuliaBox is no longer available. A new version of this tutorial using VSCode is here:


Julia is a relatively new language for data analysis. It has a high-level syntax and is designed to be easy to use and understand. Some have called it the new Python.

Unlike Python, though, it is a compiled language: just before your code runs, it is converted into low-level machine code that the computer can execute directly. This means that while it is as easy to write as Python, it can run much faster.

This is great if you have to deal with large data sets that require a lot of processing.

Julia is also much less fussy about how a program is laid out than Python. (Python is one of the few languages that forces the programmer to lay out code in a particular way, with certain parts properly indented by a consistent number of spaces or tabs. This makes for easy-to-read code but can be a bit fiddly unless you have a good editor.)

Julia has all the features that you would expect of a modern programming language but, here, we are going to take a look at Julia’s data visualization capabilities. 

They are both impressive and easy to use.

JuliaBox

You don't need to install Julia to follow this article or to create your own Julia data visualizations.  These examples were written in a Jupyter notebook using a free account on JuliaBox**.

JuliaBox is an online environment for creating Jupyter Notebooks. It is simple to use and free (there are restrictions, and you can buy more computing power if you need it but, for our purposes, the free account is perfect).

With a free** JuliaBox account, you can create Jupyter Notebooks that run Julia code, execute them, download them and export them as HTML. 

You can save your visualizations as standard png files and download them to include in your documents.

Also, JuliaBox already has the libraries that we want to use so, again, no installation is required.

Jupyter Notebooks are easy to use but, if you want to become more familiar with Jupyter or with Julia, there will be some useful guides in the tutorials folder that you will find when you start with JuliaBox.

If you want to work through the examples in this article, you can download the data files and the Jupyter notebook to use with your own JuliaBox account. I'll put the links at the end of the article.

Plots

Julia, as with most other languages, relies on libraries of code for particular specialist purposes. The one that we are initially interested in is called Plots. This provides us with the capability to create visualizations of data.

So the first piece of code that we need to execute is this:

using Plots

When you type this into a code cell in your notebook and press Ctrl+Enter to execute it, it tells Julia to load the library that we will use to create our visualizations.

When you execute a cell in a notebook for the first time, it may take a little while to execute. This is because the Julia code is compiled on the fly, the first time you execute. Subsequent code runs are much quicker. (This is another advantage of JuliaBox - their computing power is probably rather better than yours or mine, so the compilation time is faster.)
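If you'd like to see this compile-on-first-run effect for yourself, here is a small sketch. The function name sum_of_squares is just something I've made up for the illustration, and the actual timings will depend on the machine you run it on.

```julia
# A made-up function to time (the name is purely illustrative)
sum_of_squares(n) = sum(i^2 for i in 1:n)

# @elapsed returns the number of seconds an expression took to run
first_run  = @elapsed sum_of_squares(1_000)   # includes the time to compile
second_run = @elapsed sum_of_squares(1_000)   # the compiled code is reused

println("first run:  ", first_run, " seconds")
println("second run: ", second_run, " seconds")
```

You would normally expect the second figure to be noticeably smaller than the first.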

Your first graph

I usually put different bits of code into new cells in the notebook. This means that I only have to run the code that I need and not the whole notebook. I suggest that you do the same, so in a new cell I typed in the following code:

x = 1:10; y = rand(10); # These are the plotting data
plot(x,y, label="my label")

Running it produced the following graph:




Impressive. Let me explain what's going on.

x = 1:10; y = rand(10); # These are the plotting data

This bit of code creates two bits of data, one is called x and the other y. x is given the value of a range of numbers from 1 to 10, while y is given a list (a vector) of 10 pseudo-random numbers (each will have a value between 0 and 1). So, we have the basis of a graph here: an x-axis that ranges from 1 to 10 and y values for each of the points on the x axis.
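If you are curious about what these two values actually contain, a short sketch like this one lets you inspect them in a cell of their own. It uses only built-in Julia functions; your random numbers will, of course, be different each time.

```julia
x = 1:10       # a range: it only stores the start and end points
y = rand(10)   # a vector of 10 pseudo-random numbers

println(typeof(x))            # the type of the range
println(collect(x))           # the whole numbers that the range stands for
println(length(y))            # how many random numbers we have
println(all(0 .<= y .< 1))    # checks that every number is at least 0 and below 1
```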

The next bit is easy.

plot(x,y, label="my label")

This code calls a function to plot the graph and all we do is give it the x and y values - and, as an extra, we've given it a label, too.
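The label is not the only extra you can pass in. As a minimal sketch, and not something from the original notebook, here is the same graph with a title and axis names added using the title, xlabel and ylabel options of Plots. The wording of the title and axis names is my own.

```julia
using Plots

x = 1:10; y = rand(10)   # the same kind of plotting data as before

# title, xlabel and ylabel are optional extras, just like label
p = plot(x, y,
         label="my label",
         title="Ten random numbers",
         xlabel="x",
         ylabel="y")
```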

Real data

That was easy but, of course, we really want to visualize some real data.

I have a couple of tables that I have used for other articles. It's a set of data about the weather in London, UK, over the last few decades. I derived it from public tables provided by the UK Met Office.

The data records the maximum temperature, minimum temperature, rainfall and hours of sunshine recorded in each month. I have two files, one is the complete data set and the other is for 2018, only. They are in CSV format, such as you might import into a spreadsheet.

To deal with these files we need another library that allows us to read CSV files.

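We can see the library referenced in the next chunk of code, i.e. "using CSV". Recent versions of the CSV library also need to be told what kind of table to put the data into, so we load the DataFrames library as well. The following line then actually reads in the data to a variable d.

using CSV, DataFrames
d = CSV.read("london2018.csv", DataFrame)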

The result of running the code is that we now have a table of data that looks like this:



Charts from meaningful data

The data that we have downloaded is formed of a table with 6 columns: Year, Month, Tmax (maximum temperature), Tmin (minimum temperature), Rain (rainfall in millimeters) and Sun (the number of hours of sunshine).

This is a subset of the data (for 2018, only) so the Year column has the same value in all the rows.
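To check the column names and the size of the table for yourself, something like the following should do. It is a sketch that assumes london2018.csv is in the same folder as your notebook and that the data is read into a DataFrame; names, size and first are standard DataFrames functions.

```julia
using CSV, DataFrames

# Assumes the data file sits alongside the notebook
d = CSV.read("london2018.csv", DataFrame)

println(names(d))   # the column names
println(size(d))    # (number of rows, number of columns)
first(d, 3)         # the first three rows of the table
```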

Bar chart

So, what we have is the data for each month of 2018. If we wanted to plot the maximum temperature in each month in a bar chart we would do this:

bar(d.Month,d.Tmax)

bar is a function that draws a bar chart (what else?) and we provide the columns for the x and y axes. We do this by using the name of the data table, followed by the name of the column. The two names are separated with a dot.

Here we have the column Month as the x axis and Tmax as the y axis - so we are plotting the maximum recorded temperature for each of the 12 months in the table.

Put this in a new code cell and run it and you will be pleasantly surprised (I hope) to see this chart:




Line charts

If you wanted to produce a line chart, you do much the same thing but use the function plot:

plot(d.Month, d.Tmax)





And, if you wanted to plot both maximum and minimum temperatures on the same graph, you could do this:

plot(d.Month, [d.Tmax, d.Tmin], label=["Tmax" "Tmin"])

Note that the two values, d.Tmax and d.Tmin, are grouped together in square brackets and separated by a comma. This is the notation for a vector, or single-dimensional array. Additionally, we have added labels for the lines. These are grouped in square brackets, too, but separated by a space rather than a comma: that makes a single row, which is how Plots expects to be given one label per line. We get a graph like this:
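Another way to get two lines on one graph is to draw the first and then add the second with plot!, which adds to an existing chart rather than starting a new one. Here is a minimal sketch; so that it runs without the data file, I've used random numbers as stand-ins for the two temperature columns.

```julia
using Plots

months = 1:12
tmax = rand(12)   # random stand-in for d.Tmax
tmin = rand(12)   # random stand-in for d.Tmin

p = plot(months, tmax, label="Tmax")    # draw the first line
plot!(p, months, tmin, label="Tmin")    # the ! version adds a line to chart p
```

With the real data you would simply swap months, tmax and tmin for d.Month, d.Tmax and d.Tmin.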



Scatter chart

Or how about a scatter chart? A scatter chart is often used to see if a pattern can be detected in the data. Here we plot the maximum temperature against the hours of sunshine. As you might expect there is a pattern: a visible correlation - the more hours of sunshine there are, the higher the temperature.

scatter(d.Tmax, d.Sun)
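If you want to put a number on that pattern, Julia's standard Statistics library has a cor function that calculates the correlation coefficient. This is a sketch that assumes the data file is available and that neither column has any gaps in it; I haven't quoted a result because it depends on the data.

```julia
using CSV, DataFrames, Statistics

# Assumes london2018.csv is in the same folder as the notebook
d = CSV.read("london2018.csv", DataFrame)

# cor gives a value between -1 and 1; the closer to 1,
# the more closely the two columns rise together
r = cor(d.Tmax, d.Sun)
println(r)
```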


Pie chart

The data we have doesn't really lend itself to being depicted as a pie chart, so we are going to generate some random data, again - 5 random numbers.

x = 1:5; y = rand(5); # These are the plotting data
pie(x,y)
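The first value given to pie is used to name the slices, so it doesn't have to be a set of numbers. As a small sketch, here is the same idea with some made-up text labels (the names A to E mean nothing, they are just for illustration).

```julia
using Plots

slice_names = ["A", "B", "C", "D", "E"]   # made-up names, one per slice
slice_sizes = rand(5)                     # 5 random numbers, as before

p = pie(slice_names, slice_sizes)
```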


Histogram

Now we are going to load in a bit more data:

using DataFrames # recent versions of CSV need this to build the table
d2 = CSV.read("londonweather.csv", DataFrame)

This is similar to the data table that we have been using but rather bigger as it covers several decades of data rather than just one year. This gives us plenty of rainfall data so that we can see the distribution of the levels of rain that occur in London over a longish period.

histogram(d2.Rain, label="Rainfall")

Here's the result.
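You can also suggest how many bars the histogram should have with the bins option. The sketch below uses random numbers as a stand-in so that it runs without the data file; with the weather data you would use d2.Rain instead. Note that Plots treats the number as a guide and may not give you exactly that many bars.

```julia
using Plots

samples = randn(1000)   # random stand-in; with the weather data use d2.Rain

# bins asks for roughly 20 bars
p = histogram(samples, bins=20, label="Stand-in data")
```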


Saving a chart

It's all very well seeing these charts in the JuliaBox environment but to be useful, we need to be able to download them so as to use them in our documents.

The first step is to save the chart like this:

histogram(d2.Rain, label="Rainfall")
savefig("myhistogram.png")

When this code is run the chart will not be displayed but it will be saved with the file name given. You can find it in the JuliaBox file list. Select it by clicking in the tick box next to the file name and a list of options will appear above the file list. One of the options will be "download"; click on this and it will be downloaded to your computer.
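If you would like to see the chart as well as save it, one approach is to keep it in a variable, pass that variable to savefig, and then put the variable on the last line of the cell. This is a sketch with random stand-in data (use d2.Rain for the real thing), and the variable name p is just my choice.

```julia
using Plots

p = histogram(randn(1000), label="Stand-in data")   # keep the chart in a variable
savefig(p, "myhistogram.png")                       # save that chart to a file
p   # the last line of a cell is displayed, so the chart appears too
```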

Conclusion

I hope that was useful - we've looked at the basic charts available in Julia Plots and I've suggested that using JuliaBox is an easy way forward for creating and downloading the charts. 

There is a great deal more to Julia than we have seen in this short article and a lot more that you can do with Plots, too.

I hope you have found that this introduction has whetted your appetite to find out more.

Thanks for reading.


** I'm sad to report that the free account seems to be no longer available. You need to fork out a few dollars per month for an account, now. However, you can download Julia for free and use its console to produce plots like these. Alternatively, you can use an editor like Visual Studio Code or Jupyter Notebooks to edit and run Julia programs.


Downloads

Right click on the three links below to download the files and then upload them to JuliaBox. Then in your file list open plotweatherjulia.ipynb.
The notebook plotweatherjulia.ipynb, and the data files, london2018.csv and londonweather.csv





