What does this demo do?
This demo illustrates how the opinions expressed by Twitter users can change over time in relation to specific topics. Only four topics are included: “Google”, “iPhone”, “Obama”, and “Iran”. Use the dropdown menu to view trends for different topics. For each topic, a time graph shows how many tweets were recorded per day and whether these tweets had a positive (green) or negative (red) sentiment. Within each set of positive or negative tweets, we used the Pingar API to identify the most common keywords. The keywords are displayed on mouse-over. Users can also click on each bar to view the first 200 tweets. In each tweet, we highlight keywords, URLs, user names, and hashtags.
Where has the Twitter data come from?
The Twitter data was collected by Stanford University between June 1, 2009 and December 31, 2009. It was initially released as and was free to download; however, the download link was later disabled due to a conflict with Twitter's privacy policy. This demo uses only a fraction of the dataset: tweets that cover the four topics (“Google”, “iPhone”, “Obama”, and “Iran”) and that were posted during a 20-day period, June 11 to June 30, 2009.
How is the sentiment of a tweet determined?
We used the sentiment analysis algorithm implemented in the , which uses a simple Naïve Bayes approach. We retrained it using the . The drawback of this corpus is that it only focuses on the technical domain. The simplicity of the approach and the limitation of the corpus mean that there are occasional inaccuracies in the sentiment of the tweets.
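To give a feel for what a simple Naïve Bayes approach involves, here is a minimal sketch in Python. It is not the implementation used in this demo: the function names, the whitespace tokenizer, the add-one smoothing, and the toy training data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Build word and label counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of training texts
    vocab = set()
    for text, label in examples:
        words = text.lower().split()    # assumption: naive whitespace tokenizer
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return {"word_counts": word_counts, "label_counts": label_counts, "vocab": vocab}


def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["label_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(doc_count / total_docs)  # log prior
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for this example only.
TOY_CORPUS = [
    ("love my new phone", "positive"),
    ("great search results", "positive"),
    ("battery is terrible", "negative"),
    ("hate this slow update", "negative"),
]

model = train(TOY_CORPUS)
print(classify(model, "love the great phone"))  # positive
```

With a corpus this small, or one drawn from a single domain, words outside the training vocabulary carry no signal, which is one way the inaccuracies mentioned above can arise.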
How are the trending keywords computed?
First, we cleaned the tweets by removing all duplicates, because thousands of retweets and spam tweets can skew the results. From each tweet we removed URLs, hashtags, user names, and stopwords such as 'RT', 'via', 'lol', and 'lmao'. (We did, however, keep the original version of each tweet for display.) Then we grouped the tweets by date and sentiment and applied the method to determine the keywords for each set of positive and negative tweets. The API returned two lists of keywords along with the keyword scores. When the same keyword appeared in both the positive and the negative list, we kept it only in the list where it scored higher.
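The steps described here can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the demo's actual code: the helper names, the regular expression, exact-match duplicate detection, and the tie-breaking rule are all hypothetical, and the keyword extraction call itself is left out because it depends on the external API.

```python
import re
from collections import defaultdict

# Only these four stopwords are named in the text; the full list is not known.
STOPWORDS = {"rt", "via", "lol", "lmao"}

# Matches URLs, #hashtags, and @user names.
NOISE = re.compile(r"https?://\S+|[#@]\w+")


def clean(text):
    """Strip URLs, hashtags, user names, and stopwords from one tweet."""
    text = NOISE.sub(" ", text)
    words = [w for w in text.split() if w.lower() not in STOPWORDS]
    return " ".join(words)


def group_tweets(tweets):
    """Drop duplicates and group tweets by (date, sentiment).

    tweets: iterable of (date, sentiment, text) tuples.
    Each group holds (cleaned_text, original_text) pairs, so the original
    is still available for display.
    """
    seen = set()
    groups = defaultdict(list)
    for date, sentiment, text in tweets:
        if text in seen:  # assumption: duplicates are exact text matches
            continue
        seen.add(text)
        groups[(date, sentiment)].append((clean(text), text))
    return groups


def resolve_overlap(positive, negative):
    """Keep a shared keyword only in the list where it scores higher.

    positive, negative: dicts mapping keyword -> score.
    Assumption: on a tie, the keyword stays in the positive list.
    """
    pos, neg = dict(positive), dict(negative)
    for keyword in positive.keys() & negative.keys():
        if positive[keyword] < negative[keyword]:
            del pos[keyword]
        else:
            del neg[keyword]
    return pos, neg
```

For example, `clean("RT @bob check http://t.co/abc #iphone rocks lol")` returns `"check rocks"`, and a keyword scored 0.9 in the positive list and 0.5 in the negative list would remain only in the positive list.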
How are the graphs generated?
The graphs were generated using the for jQuery.