Showing posts from September, 2019

Data Visualization Tools: Simple Statistical Views in Pandas

Statistical Views with Histograms and Box Plots in Python Pandas - statistical tools that will help you understand your data

Please go to the updated version of this article here

We are going to use Jupyter Notebooks and Python with Pandas to give us a statistical view of our data.
You really don’t need to know much about programming, but you will need to understand a little bit of Python.

Before you start you need to be reasonably comfortable using Jupyter Notebooks. If you aren’t, or you don’t have Jupyter installed on your computer, don’t worry: I’ve written a tutorial that will show you how to install Jupyter and get you going. You can find it here: Setting Up Jupyter Notebooks for Data Visualization.
Also, you might want to take a look at a previous article about data visualization with Pandas that explores the use of line graphs, bar graphs, scatter diagrams and pie charts. It’s here: Simple Data Visualization with Pandas.
In this article we are going to explore the more…

Code Snippet: Replacing Spaces in Pandas Column Names

Annoyingly, many datasets give columns names that include spaces. Naturally, names that consist of more than one word have spaces in between. That makes sense for readability, but it gets in the way when you create a Pandas dataframe from the dataset and want to refer to the columns in Python code.

Unfortunately, programming languages like Python don’t like them — you can’t have a space in an identifier.

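So when a data table has a column called “Monthly Rainfall” or “Stock Price” or “Average Income” or some such thing, you can’t use that name directly as an identifier. Bracket access such as df['Stock Price'] still works, but attribute-style access such as df.Stock Price is a syntax error.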

Luckily, it's pretty trivial to fix. Simply replace the spaces in all the column names with an underscore (or whatever character you prefer). Like this:
df.columns = df.columns.str.replace(' ', '_')

Here we replace each space in the dataframe's column names with an underscore using str.replace, and assign the result back to the dataframe's columns.
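As a minimal sketch of that one-liner in action, the snippet below builds a tiny dataframe whose column names contain spaces and then renames them. The column names come from the examples above, but the values are made up purely for illustration.

```python
import pandas as pd

# Made-up values, for illustration only; the column names contain spaces
df = pd.DataFrame({
    "Monthly Rainfall": [55.2, 40.1],
    "Stock Price": [101.5, 99.8],
    "Average Income": [32000, 33500],
})

# Replace every space in the column names with an underscore
df.columns = df.columns.str.replace(' ', '_')

print(list(df.columns))
# ['Monthly_Rainfall', 'Stock_Price', 'Average_Income']

# The renamed columns are valid identifiers, so attribute access now works
print(df.Stock_Price.mean())
```

Once the names are valid identifiers, each column can be reached either as df['Stock_Price'] or as df.Stock_Price.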
An Example

Here is an example of the problem a…

3 Excellent Python IDEs for beginners - Thonny, Geany or Idle

Professional Python programmers looking for an editor or IDE are spoiled for choice. But if you are a beginner, or a teacher, what is the best bet?
Python is definitely the language to learn these days. If you are interested in Data Science, Data Analysis or Artificial Intelligence, then Python is the programming language that you are likely to use.

But Python is a great general-purpose language, too. You can build Desktop Applications and Web Apps, for example.

But to learn the language you need to be able to practice and for that you will need an editor, or IDE (Integrated Development Environment). So, which IDE is good for beginners?

A good start might be to look at the Raspberry Pi. It is designed for beginners, so it's worth noting that it comes with three options for programming in Python: IDLE, Geany and, a relatively new kid on the block, Thonny.

And these are not a bad place to start for any beginner. They are lightweight applications that will run happily on almost any…

Data Visualization with Julia and JuliaBox

With Julia and JuliaBox you can make impressive data visualizations with almost no programming knowledge and no need to install anything.

UPDATE: Sad to say that JuliaBox is no longer available. A new version of this tutorial using VSCode is here:
Data Visualization with Julia and VSCode
Julia is a relatively new language for data analysis. It has a high-level syntax and is designed to be easy to use and understand. Some have called it the new Python.

Unlike Python, though, it is a compiled language (Julia compiles code just in time, as the program runs). This means that while it is as easy to write as Python, it typically runs much faster, because it is converted to low-level machine code that the computer can execute directly.

This is great if you have to deal with large data sets that require a lot of processing.

Julia is also much less fussy about how a program is laid out than Python. (Python is one of the few languages that forces the programmer to lay out code in a particular way, with certain parts properly indented by a fixed number of spaces or…