Twitter Demo

This demo is a prototype showcasing how the Pingar API can be used to monitor changes in public opinion over time. We took four topics discussed on Twitter by thousands of people and visualized the trends in positive and negative tweets. The trends are displayed in an interactive graph: hover over or click on the bars for further information.
Number of Tweets Over Time

What does this demo do?
This demo illustrates how the opinions expressed by Twitter users can change over time in relation to specific topics. Only four topics are included: “Google”, “iPhone”, “Obama”, and “Iran”. Use the dropdown menu to switch between topics. For each topic, a time graph shows how many tweets were recorded each day, split into tweets with positive (green) and negative (red) sentiment. Within each set of positive or negative tweets, we used the Pingar API to identify the most common keywords. These keywords are displayed when you hover over a bar. You can also click on a bar to view the first 200 tweets it represents. In each tweet, we highlight keywords, URLs, user names, and hashtags.
Where has the Twitter data come from?
The Twitter data was collected by Stanford University from June 1, 2009 to December 31, 2009. It was initially released as the SNAP: Network dataset and was free to download; however, the download link was later disabled due to a conflict with Twitter's privacy policy. This demo uses only a fraction of the dataset: tweets that mention one of the relevant topics (“Google”, “iPhone”, “Obama”, and “Iran”) and that were posted during a 20-day period, June 11 to June 30, 2009.
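The selection described above can be sketched as follows. This is a minimal illustration, not the code used for the demo: the `(timestamp, text)` tuple layout, the timestamp format, and the case-insensitive substring match are all assumptions, since the page does not say how tweets were matched to topics.

```python
from datetime import date, datetime

TOPICS = ("Google", "iPhone", "Obama", "Iran")
START, END = date(2009, 6, 11), date(2009, 6, 30)  # the 20-day window, inclusive


def matching_topics(text):
    """Return the topics mentioned in the text (assumed: case-insensitive substring match)."""
    lowered = text.lower()
    return [topic for topic in TOPICS if topic.lower() in lowered]


def in_window(timestamp):
    """True if a 'YYYY-MM-DD HH:MM:SS' timestamp (assumed format) falls inside the window."""
    day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date()
    return START <= day <= END


def select(tweets):
    """Keep (timestamp, text, topics) for tweets inside the window that mention a topic."""
    selected = []
    for timestamp, text in tweets:
        topics = matching_topics(text)
        if topics and in_window(timestamp):
            selected.append((timestamp, text, topics))
    return selected


# Hypothetical sample data, for illustration only.
sample = [
    ("2009-06-12 08:15:00", "Loving the new iPhone update"),
    ("2009-06-05 10:00:00", "Google maps is great"),   # posted before the window
    ("2009-06-20 21:30:00", "Nothing relevant here"),  # mentions no topic
]
selected = select(sample)
```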
How is the sentiment of a tweet determined?
We used the sentiment analysis algorithm implemented in NLTK, the Python Natural Language Toolkit, which takes a simple Naïve Bayes approach. We retrained it using the Sanders Analytics Twitter Sentiment Corpus. The drawback of this corpus is that it covers only the technical domain. Because the approach is simple and the corpus is limited, the sentiment assigned to a tweet is occasionally wrong.
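To make the Naïve Bayes idea concrete, here is a small standard-library sketch of a bag-of-words classifier with add-one smoothing. It is not NLTK's implementation and not the demo's code; the class name, the whitespace tokenizer, and the four training sentences are made up for illustration, whereas the demo trained on the Sanders corpus.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()            # label -> number of training texts
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def classify(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set, for illustration only.
training = [
    ("love this phone", "positive"),
    ("great search results", "positive"),
    ("hate this phone", "negative"),
    ("terrible battery life", "negative"),
]
classifier = NaiveBayes()
classifier.train(training)
```

With so little training data the classifier only knows a handful of words, which mirrors the limitation noted above: a model trained on one domain has little to go on when a tweet uses vocabulary from another.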
How are the trending keywords computed?
First, we cleaned the tweets by removing all duplicates, as thousands of re-tweets and spam tweets can negatively affect the results. From each tweet we removed URLs, hashtags, user names, and stopwords such as 'RT', 'via', 'lol', and 'lmao'. (We did, however, keep the original version of each tweet to display later.) Then we grouped the tweets by date and sentiment, and applied the Pingar API Entity Extraction method to each set of positive and negative tweets to determine its keywords. For each date, the API returned two lists of keywords, one positive and one negative, along with a score for each keyword. When the same keyword appeared in both lists, we kept it only in the list where it scored higher.
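The cleaning and conflict-resolution steps can be sketched as below. This is an illustrative approximation under stated assumptions: the regular expressions, the exact-match duplicate test, the four-word stopword set (the full list is not given), and the tie-breaking rule are all guesses. The Pingar Entity Extraction call itself is omitted, and the keyword scores in the example are invented.

```python
import re

STOPWORDS = {"rt", "via", "lol", "lmao"}  # only the examples named in the text
URL_RE = re.compile(r"https?://\S+")
TAG_OR_USER_RE = re.compile(r"[#@]\w+")   # hashtags and user names
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def clean(text):
    """Strip URLs, hashtags, user names, and stopwords from one tweet."""
    text = URL_RE.sub(" ", text)
    text = TAG_OR_USER_RE.sub(" ", text)
    words = WORD_RE.findall(text)
    return " ".join(w for w in words if w.lower() not in STOPWORDS)


def deduplicate(tweets):
    """Drop exact duplicates, keeping the first occurrence of each tweet in order."""
    seen = set()
    unique = []
    for tweet in tweets:
        if tweet not in seen:
            seen.add(tweet)
            unique.append(tweet)
    return unique


def prepare(tweets):
    """Return (original, cleaned) pairs so the original text stays available for display."""
    return [(tweet, clean(tweet)) for tweet in deduplicate(tweets)]


def resolve_conflicts(positive, negative):
    """Keep a keyword found in both score dicts only where it scores higher.

    Ties go to the positive list; that choice is an assumption.
    """
    pos, neg = dict(positive), dict(negative)
    for keyword in set(pos) & set(neg):
        if pos[keyword] >= neg[keyword]:
            del neg[keyword]
        else:
            del pos[keyword]
    return pos, neg


# Hypothetical keyword scores, standing in for what an extraction service might return.
pos, neg = resolve_conflicts(
    {"election": 0.9, "speech": 0.4},
    {"election": 0.5, "protest": 0.7},
)
```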
How are the graphs generated?
The graphs were generated using the Flot plotting library for jQuery.
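Flot plots each series as a list of [x, y] points, and in its time mode the x values are JavaScript timestamps (milliseconds since the Unix epoch). The sketch below shows one way the daily counts could be shaped into such series on the server side. It is an assumption about the data preparation, not the demo's actual code; the function names and sample data are hypothetical.

```python
import calendar
import json
from collections import Counter
from datetime import date


def to_js_timestamp(day):
    """Milliseconds since the Unix epoch at midnight UTC on the given date."""
    return calendar.timegm(day.timetuple()) * 1000


def build_series(labelled_days):
    """Turn (day, sentiment) pairs into two series of [timestamp, count] points."""
    counts = Counter(labelled_days)
    days = sorted({day for day, _ in counts})
    series = []
    for sentiment, color in (("positive", "green"), ("negative", "red")):
        data = [[to_js_timestamp(day), counts[(day, sentiment)]] for day in days]
        series.append({"label": sentiment, "color": color, "data": data})
    return series


# Hypothetical input: one (day, sentiment) pair per classified tweet.
labelled = [
    (date(2009, 6, 11), "positive"),
    (date(2009, 6, 11), "positive"),
    (date(2009, 6, 11), "negative"),
    (date(2009, 6, 12), "negative"),
]
series = build_series(labelled)
payload = json.dumps(series)  # JSON that page-side JavaScript could pass to the plot call
```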