Hourly mentions of a word on Twitter
23 May 2015
Some time ago (OK, a month ago; time ✈️s), I saw this tweet:
Need a simple tool to track mentions of a keyword on Twitter by hour. Don’t need a bunch of bells and whistles. Thoughts?
— Kaegan Donnelly (@kaequan) April 23, 2015
I thought, “should be easy, lmgt.” However, results for the query “hourly mentions of a word on twitter” didn’t offer clear solutions. Days later I came across two relatively simple approaches to tackling the problem. The first is Tweepy. The other is Logstash.
Tweepy is an open source Python library for accessing the Twitter API, including the Twitter Streaming API.
Logstash is an open source tool for collecting, processing, and forwarding events. Logstash can read events from the Twitter Streaming API using its `twitter` plugin.
Having tried both, I recommend Logstash over Tweepy for two main reasons:
- it deals with the Twitter API rate limits by default
- it offers Elasticsearch and Kibana integration—simplifying the aggregation and visualization steps, respectively, that naturally follow the data (tweet) collection step
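To make the aggregation step concrete, here is a minimal sketch of hourly bucketing in plain Python. It is only an illustration of the idea, not what Elasticsearch does internally; the helper name `hourly_counts` and the sample timestamps are made up for this example.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps):
    """Count items per hour, given timestamps like '2015-05-23T10:05:12'."""
    counts = Counter()
    for ts in timestamps:
        # Truncate each timestamp to the start of its hour.
        hour = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S").replace(minute=0, second=0)
        counts[hour] += 1
    # Return the buckets in chronological order.
    return dict(sorted(counts.items()))


# Hypothetical sample data standing in for tweet creation times.
sample = [
    "2015-05-23T10:05:12",
    "2015-05-23T10:47:30",
    "2015-05-23T11:02:01",
]

for hour, n in hourly_counts(sample).items():
    print(hour.isoformat(), n)
```

Each bucket is keyed by the start of the hour, which is the same shape of result a date histogram gives you.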
For both Tweepy and Logstash you need access to Twitter’s streaming API. Follow steps 2 and 3 here to create a Twitter app and obtain your Consumer Key, Consumer Key Secret, Access Token, and Access Token Secret.
The ELK solution
Download and install Elasticsearch, Logstash, and Kibana. If you are on a Mac, you can
Make sure you have Elasticsearch and Kibana running. Before running Logstash, you need to prepare a configuration file. Below is a sample configuration file to collect tweets containing the word `ireland` (call it `ireland.conf`).
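As a rough stand-in, here is a sketch of what such a file could contain, rendered from Python so the four credentials can be filled in programmatically. It is not the author's exact file. The option names (`consumer_key`, `consumer_secret`, `oauth_token`, `oauth_token_secret`, `keywords`) are those of the Logstash `twitter` input plugin as I understand them; the `render_config` helper and the placeholder credential values are hypothetical.

```python
# Hypothetical placeholders: replace with the four values from your Twitter app.
CREDENTIALS = {
    "consumer_key": "YOUR_CONSUMER_KEY",
    "consumer_secret": "YOUR_CONSUMER_KEY_SECRET",
    "oauth_token": "YOUR_ACCESS_TOKEN",
    "oauth_token_secret": "YOUR_ACCESS_TOKEN_SECRET",
}


def render_config(keyword, credentials):
    """Return Logstash config text that streams tweets matching keyword to stdout."""
    lines = ["input {", "  twitter {"]
    for name, value in credentials.items():
        lines.append(f'    {name} => "{value}"')
    # The keywords option takes an array of terms to track.
    lines.append(f'    keywords => ["{keyword}"]')
    lines += ["  }", "}", "", "output {", "  stdout { codec => rubydebug }", "}"]
    return "\n".join(lines) + "\n"


config_text = render_config("ireland", CREDENTIALS)
print(config_text)
# To save it: write config_text to a file named ireland.conf.
```

The `rubydebug` codec pretty-prints each event, which makes it easy to confirm that tweets are arriving before you wire up any storage.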
To start streaming tweets, run
At this point, tweets are written to `stdout`. In order to visualize tweet counts using Kibana, you need to save the tweets to Elasticsearch.
Add the `elasticsearch` plugin to the `output` section of the configuration.
Run Logstash again and have a look at:
Below is a sample of the output format. You can see, for example, that `65235` documents (tweets) have been stored in the `irelandtweets` index.
To start using Kibana, visit
On the Discover tab, there is a configuration form:
- Check the box: Index contains time-based events
- Fill the Index name or pattern field with `irelandtweets`
- Fill the Time-field name field with `@timestamp`
On the Visualize tab, choose visualization type Line chart.
- Choose option `From a saved search` to use the same query you specified on the Discover tab
- On the left hand side, you can specify metric and bucket aggregations:
  - For metric aggregation (same as Y-Axis aggregation), choose `Count`
  - For bucket aggregation (same as X-Axis aggregation):
    - Fill the Aggregation field with `Date Histogram`
    - Fill the Field field with `@timestamp`
    - Fill the Interval field with `Minute`
- Click on the Refresh Interval tab at the top. Choose `5 seconds` and see your line chart come alive 📈
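For the curious, the Count metric over a Date Histogram bucket corresponds roughly to an Elasticsearch aggregation request like the one built below. This is a sketch, not the exact request Kibana sends. The helper name `tweet_count_histogram` and the aggregation label `tweets_over_time` are hypothetical, and the `interval` parameter is the name used by Elasticsearch releases of that era; newer releases use `calendar_interval` or `fixed_interval` instead.

```python
import json


def tweet_count_histogram(field="@timestamp", interval="minute"):
    """Build a request body that counts documents per time bucket."""
    return {
        # Skip the matching documents themselves; only the buckets matter.
        "size": 0,
        "aggs": {
            "tweets_over_time": {
                # Each bucket's doc_count plays the role of the Count metric.
                "date_histogram": {"field": field, "interval": interval}
            }
        },
    }


body = tweet_count_histogram()
print(json.dumps(body, indent=2))
```

Passing `interval="hour"` instead gives one bucket per hour, which matches the hourly counts the original tweet asked for. The body would be sent to the `_search` endpoint of the `irelandtweets` index.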
Done. Thank you for starting the conversation, Kaegan!
More resources
For details about Logstash plugins see this guide.
Anna Roes has written an excellent overview of Kibana in this tutorial.