Visualizing Simple Twitter Statistics with Python and Pandas

Twitter statistics are great for seeing how well you are engaging with your audience, but you can also use Python to see how others are doing.

If you use Twitter cards or adverts, you can get a very good idea how people are engaging with your tweets from the official Twitter statistics. But what about your friends, your colleagues… your competitors? Just a little bit of Python code (that you can download) might do the trick.


Imagine that you are a global news giant and are wondering just how well you are regarded by your audience compared with, say, CNN, or the BBC. One thing you might do is measure the level of engagement of their tweets and compare it with your own.

While you may not be able to see the same engagement statistics as for your own account, there are some simple stats that you can see. It just takes a little programming and a Twitter Developer account.

I’m going to show you that it is really quite straightforward to monitor the number of retweets and likes that other users’ tweets receive, and to compare them with your own (or anyone else’s), using the Twitter developer’s API. We’ll also produce some simple statistics and graphs using Pandas.

Don't worry if you aren't much of a coder, you can cut and paste the code below, or download a complete program (I'll link to it at the end of this article).

First things first - get a Twitter Developer account

This is simply a matter of heading over to the Twitter Developer web site and applying. You have to give a bunch of details about yourself and what you intend to do and, in return, they will give you a set of codes that will let you access their API.

I won’t go into the details, here, because it’s a straightforward procedure and, anyway, sometimes these things change, so it's better just to follow the instructions from Twitter.

A basic Python application

I tend to use Jupyter Notebooks for this sort of thing because it allows me to quickly prototype applications and it’s easy to use. If you aren’t familiar with Jupyter, you might want to take a look at an introductory article first.

However, the code that I’m going to show you should work equally well in a Jupyter Notebook or as a standalone program in a Python editor (Thonny is a good choice if you are a beginner).

The easiest way to access the API is to use a Python library. There are a number of them around and, for a simple app like this, any of them should be OK. I’m going to use the Python Twitter Tools package, which I installed a while ago. If you don’t have it, you can simply install it with pip or conda, e.g.:

pip install twitter

Here is the code that you will need to start using the Twitter API:

import twitter

CONSUMER_KEY = 'xxxxxxxxxx'
CONSUMER_SECRET = 'xxxxxxxxxx'
OAUTH_TOKEN = 'xxxxxxxxxx'
OAUTH_TOKEN_SECRET = 'xxxxxxxxxx'

auth = twitter.oauth.OAuth(OAUTH_TOKEN, OAUTH_TOKEN_SECRET,
                           CONSUMER_KEY, CONSUMER_SECRET)
twitter_api = twitter.Twitter(auth=auth)

We start by importing the library and then setting the four constants that you need to access the API. Clearly, I’ve hidden my own credentials here, so you need to get the codes from your Twitter account and assign them to the appropriate variables.

I’m not going into the mechanisms of how to access the API (if you are interested, you can read all about it on the Twitter developer’s web site), just use this as boilerplate code and we’ll get on with the real business of this article.

We just need to know that we have created an object, twitter_api, which is where we will find the methods that we want to use.
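As an aside, hard-coding credentials like this is risky if you ever share or publish your code. One common alternative is to read them from environment variables instead. Here is a minimal sketch; the variable names are just an example, not anything Twitter mandates:

```python
import os

# Read the four credentials from environment variables (the names here
# are hypothetical - use whatever you set in your own environment)
CONSUMER_KEY = os.environ.get('TWITTER_CONSUMER_KEY', '')
CONSUMER_SECRET = os.environ.get('TWITTER_CONSUMER_SECRET', '')
OAUTH_TOKEN = os.environ.get('TWITTER_OAUTH_TOKEN', '')
OAUTH_TOKEN_SECRET = os.environ.get('TWITTER_OAUTH_TOKEN_SECRET', '')
```

That way the script itself contains no secrets, and the same code works on any machine where the variables are set.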

Just one more thing before getting down to business, we will be using Pandas and Matplotlib, so we need to import them, too.

import pandas as pd
import matplotlib.pyplot as plt

Searching for tweets

There are three different tiers of the Twitter API that each have different functionality and restrictions. We are going to be using the most basic - and free - one which allows you to do simple searches but restricts the responses to tweets from the last 7 days. It also returns no more than 100 tweets from a single API call.

Here is a simple search:

tweets ="CNN", count=10)

The parameters determine what is being searched for (q) and the number of results - tweets - to be returned (count). In this case we will get 10 recent tweets that contained the word ‘CNN’.

But for the purposes of monitoring our rivals in breaking world news, we don’t want tweets that mention CNN, we want the tweets from CNN. Luckily, the query, q, can contain a number of different modifiers and what we are particularly interested in here is from. Where the previous search returned tweets from anyone, using from restricts those tweets to a particular account. Take a look at this search:

tweets ="from:CNN", count=10)

Again we are expecting 10 results but rather than specifying a word to look for, we are telling Twitter to return 10 tweets from the user CNN. We can specify any valid Twitter name - you can try it with your own twitter name or anyone else's.

So let’s try this out. In the code above, we first retrieve 10 tweets from the CNN account. The result that we get back is a dictionary with two elements, statuses and search_metadata. The search_metadata gives us information about the search and the statuses part contains an array of the actual tweets that have been returned.

We are going to ignore the metadata and concentrate on the tweets that have been returned in statuses. And to help us with getting statistics and plots, we’ll assign these to a Pandas dataframe, like this:

# Get the data into a Pandas dataframe
tweetData = pd.DataFrame(tweets['statuses'])

To see the sort of data that is returned, we can print the column names of the dataframe:

# Print the columns of the dataframe
print(tweetData.columns)
And we’ll get a result like this:

Index(['created_at', 'id', 'id_str', 'text', 'truncated', 'entities',
'metadata', 'source', 'in_reply_to_status_id',
'in_reply_to_status_id_str', 'in_reply_to_user_id',
'in_reply_to_user_id_str', 'in_reply_to_screen_name', 'user', 'geo',
'coordinates', 'place', 'contributors', 'retweeted_status',
'is_quote_status', 'retweet_count', 'favorite_count', 'favorited',
'retweeted', 'lang', 'possibly_sensitive'],
      dtype='object')

You can see that there are a whole lot of items of data that come with each tweet. We won’t worry too much about most of them, but a few interesting ones are:

'created_at': the date and time of the tweet
'id': the unique id of the tweet
'text': the actual text of the tweet (possibly truncated)
'retweet_count' : the number of times this tweet has been retweeted
'favorite_count': the number of likes that the tweet has received
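To get a feel for these fields without making an API call, here is a small sketch using made-up statuses with the same field names as real tweets; the text and the counts are invented purely for illustration:

```python
import pandas as pd

# Made-up statuses shaped like the real ones (values are invented)
statuses = [
    {'created_at': 'Mon Sep 02 10:00:00 +0000 2019',
     'text': 'Breaking: example story one',
     'retweet_count': 12, 'favorite_count': 30},
    {'created_at': 'Mon Sep 02 11:30:00 +0000 2019',
     'text': 'Example story two',
     'retweet_count': 5, 'favorite_count': 8},
]

tweetData = pd.DataFrame(statuses)

# Keep just the columns we are interested in
summary = tweetData[['text', 'retweet_count', 'favorite_count']]
print(summary)
```

Selecting a list of column names like this gives you a smaller dataframe that is much easier to read than the full 26-column one.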

We can use these fields to retrieve data about each tweet. For example, the following code prints out the number of times each tweet has been liked.

print(tweetData['favorite_count'])
And gives a result like this:

0       5
1     219
2     414
3     934
4     271
5     830
6     167
7     247
8    1128
9     281
Name: favorite_count, dtype: int64

That’s a list of the number of likes for each of the 10 tweets that we got from our search (note that Python numbers things starting at 0, so the range is 0 to 9, not 1 to 10).
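Once the counts are in a dataframe, Pandas makes simple questions easy to answer, e.g. which tweet was liked the most. Here is a sketch using the same numbers as the sample output above:

```python
import pandas as pd

# The favorite counts from the sample output above
tweetData = pd.DataFrame(
    {'favorite_count': [5, 219, 414, 934, 271, 830, 167, 247, 1128, 281]})

# Position and value of the most-liked tweet
print(tweetData['favorite_count'].idxmax())   # 8
print(tweetData['favorite_count'].max())      # 1128
```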

We can be a bit more adventurous than a simple list. Here’s one way to create a pie chart of the number of retweets for each of the 10 tweets from the same search:

tweetData['retweet_count'].plot.pie(figsize=(10,6))

Some real statistics and visualization
Here’s something a bit more serious that will help us judge the popularity of CNN’s tweets.

# Get the data
tweet_count = len(tweetData)
favorite_count = tweetData['favorite_count'].sum()
retweet_count = tweetData['retweet_count'].sum()

# Print it out
print('Number of tweets: ' + str(tweet_count))
print('Total number of likes: ' + str(favorite_count))
print('Total number of retweets: ' + str(retweet_count))

# Draw a nice plot of the likes and retweets, figsize=(10,6), y=['favorite_count','retweet_count'])

This code records three bits of data: the length of the dataframe (i.e. the number of tweets), the sum of the favorite_count column (i.e. the total number of likes for all of the tweets) and the sum of the retweet_count column (i.e. the total number of retweets for all of the tweets).

Next we print out that data.

And then we draw a bar chart of the likes and retweets for each individual tweet. Here’s the result:

Number of tweets: 10
Total number of likes: 4496
Total number of retweets: 1412

So that’s a summary of the data for the 10 CNN tweets that we searched for. Now if we do the same for our own global news Twitter account, we can see who is the most popular!

And here is one way of going about doing this. The code below takes a list of Twitter names, calculates some stats and prints out the bar charts for each name. It’s essentially the same code as above, but with each search inside a loop and a few cosmetic changes to label the charts a bit better.

names = ['CNN','BBCWorld']
for name in names:
    tweets = pd.DataFrame("from:"+name,
                                              count=10)['statuses'])
    tweet_count = len(tweets)
    favorite_count = tweets['favorite_count'].sum()
    retweet_count = tweets['retweet_count'].sum()

    print("Data for " + name)
    print('Number of tweets: ' + str(tweet_count))
    print('Number of likes: ' + str(favorite_count))
    print('Number of likes per tweet: ' + str(favorite_count/tweet_count))
    print('Number of retweets: ' + str(retweet_count))
    print('Number of retweets per tweet: ' + str(retweet_count/tweet_count))

    # Draw a bar chart for this account
    tweets.plot.bar(title=name, figsize=(10,6), y=['favorite_count','retweet_count'])

Data for CNN
Number of tweets: 10
Number of likes: 5471
Number of likes per tweet: 547.1
Number of retweets: 1806
Number of retweets per tweet: 180.6

Data for BBCWorld
Number of tweets: 10
Number of likes: 2103
Number of likes per tweet: 210.3
Number of retweets: 2527
Number of retweets per tweet: 252.7
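If you’d rather see the comparison side by side than in separate printouts, you can collect the totals into a single dataframe. Here is a sketch using the numbers from the sample run above (10 tweets per account):

```python
import pandas as pd

# Totals taken from the sample run above
stats = pd.DataFrame({'likes': [5471, 2103],
                      'retweets': [1806, 2527]},
                     index=['CNN', 'BBCWorld'])

# Per-tweet averages over the 10 tweets retrieved for each account
stats['likes_per_tweet'] = stats['likes'] / 10
stats['retweets_per_tweet'] = stats['retweets'] / 10
print(stats)
```

A table like this makes it immediately obvious that, in this sample at least, CNN gets more likes per tweet while BBCWorld gets more retweets.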

That’s about it. Simple stuff, but fun and useful. And it’s not miles away from your own personalised dashboard. You could use this as the basis of a web app and always have your rivals’ Twitter data to hand. In fact, here is a link to a very simple web-based dashboard that is based on this code.

It only does one Twitter user at a time but you can select which one you are interested in using the form at the bottom of the page.

Have fun with this and feel free to contact me on Twitter @MrAlanJones.

You can download some code from this article here.

And you can find more articles on my website.

