Code Snippet: Replacing Spaces in Pandas Column Names

Annoyingly, many datasets have column names that include spaces. That makes sense for readability, but it gets in the way in Python code once you create a Pandas dataframe from the dataset.

Naturally, names that consist of more than one word have spaces in between.

Unfortunately, programming languages like Python don’t like them: you can’t have a space in an identifier.

So when a data table has a column called “Monthly Rainfall” or “Stock Price” or “Average Income” or some such thing, you can’t use it directly as an identifier, which rules out attribute-style access such as df.Stock Price. Bracket notation like df["Stock Price"] still works, but it is clumsier to type.

Luckily, however, it's pretty trivial to fix. Simply replace the spaces in all the column names with underscores (or whatever character you prefer). Like this:

df.columns = df.columns.str.replace(' ', '_')

Here we replace every space character in the dataframe's column names with an underscore using str.replace, then assign the result back to the dataframe's columns.
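
To see the one-liner in action, here is a minimal sketch. The dataframe and its values are made up purely for illustration, and the regex=False argument is my own addition so that the space is treated as a literal character whichever Pandas version you have.

```python
import pandas as pd

# Hypothetical data, invented only to demonstrate the renaming
df = pd.DataFrame({
    "Monthly Rainfall": [50.2, 61.7],
    "Stock Price": [101.5, 99.8],
})

# Replace every space in every column name with an underscore
df.columns = df.columns.str.replace(" ", "_", regex=False)

print(list(df.columns))
```

Note that the data itself is untouched; only the column labels change.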

An Example

Here is an example of the problem and how to fix it.

I was recently looking at the UK Parliament petitions website. You can download a CSV file with all the petitions that they have received, and see what they are about and what level of support they have. I downloaded the CSV file and created a Pandas dataframe from it. That dataframe has a column called "Signatures Count".
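
Loading the data might look roughly like the sketch below. The rows, and every column name other than "Signatures Count", are invented stand-ins rather than the real petitions file, and I read from an in-memory string so the example runs without a download.

```python
import io
import pandas as pd

# Hypothetical stand-in for the downloaded file; these rows are made up
csv_text = """Petition,State,Signatures Count
Example petition A,open,120
Example petition B,open,15000
Example petition C,closed,98000
"""

# With a real download you would pass the file path instead of a StringIO
df = pd.read_csv(io.StringIO(csv_text))

print(df.columns.tolist())
```

The column names come straight from the header row of the file, spaces and all.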

I wanted to list all the petitions with more than 10,000 signatures (10,000 is the threshold at which a petition gets a response from the government).

But I can't write an attribute-style filter with that column name, because the space makes the expression syntactically invalid. If you run code like that, you get a syntax error.
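
Here is a small sketch of that failure. The filter expression is my guess at the kind of line that fails, and I keep it in a string and hand it to Python's built-in compile so that the rest of the script still runs and simply reports whether the line is valid.

```python
# The attribute-style filter, held as a string so this script can still run
bad_filter = "df[df.Signatures Count > 10000]"

try:
    compile(bad_filter, "<example>", "eval")
    is_valid = True
except SyntaxError:
    # The space splits the column name into two separate tokens
    is_valid = False

print(is_valid)
```

Python rejects the line before any dataframe is even looked at, which is why this is a syntax error rather than a Pandas error.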


So I changed the column names as above, and the column is now called Signatures_Count.


So now I can filter the dataframe as I wanted, keeping only the rows where Signatures_Count is greater than 10,000.
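
Putting the rename and the filter together, a minimal sketch could look like this. The petition titles and signature counts are made up for illustration; only the "Signatures Count" column name comes from the real data.

```python
import pandas as pd

# Hypothetical stand-in for the petitions dataframe; values are invented
df = pd.DataFrame({
    "Petition": ["Example petition A", "Example petition B", "Example petition C"],
    "Signatures Count": [120, 15000, 98000],
})

# Swap spaces for underscores so the column name is a valid identifier
df.columns = df.columns.str.replace(" ", "_", regex=False)

# Attribute-style access now works in the filter
popular = df[df.Signatures_Count > 10000]

print(popular)
```

Only the rows with more than 10,000 signatures remain in popular.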


Done!

I hope that was useful and thanks for reading.
