Sadness in 140 Characters



[Image: Michael Jackson fail whale]

On June 25, 2009, news reports announced the death of Michael Jackson, leading to a flood of reactions on Twitter. Between 9pm and 10pm EDT alone, there were over 279,000 tweets about Michael Jackson, or roughly 78 tweets per second (see graph above). What can be said about this massive body of tweets? What sorts of emotions did people express about Michael Jackson’s death?
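The per-second figure follows directly from the hourly count. A minimal sketch of the arithmetic, using only the numbers quoted above (the variable names are ours, for illustration):

```python
# Average tweet rate over the 9pm to 10pm EDT hour, from the figure quoted above.
tweets_in_hour = 279_000        # "over 279,000", so this is a lower bound
seconds_in_hour = 60 * 60

tweets_per_second = tweets_in_hour / seconds_in_hour
print(f"{tweets_per_second:.1f} tweets per second")  # prints "77.5 tweets per second"
```

Since the count is "over 279,000," the true average sits slightly above 77.5, which rounds to the 78 tweets per second cited.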

Michael Jackson’s death provided occasion for a large wave of digital mourning: the expression of grief online, usually coordinated via a common method or localized to a particular webpage. The latter type of mourning has become a popular practice on social networking sites such as MySpace and Facebook, where the profile of a person who has died is transformed into a digital memorial on which friends and family leave last goodbyes and testaments.

After Michael Jackson’s death, common digital mourning practices emerged on a variety of platforms. Testimonials and goodbyes poured into Michael Jackson’s MySpace page, and Facebook saw a similar influx of grievers on Jackson’s main fan page and in newly created groups. The outpouring of tweets about Michael Jackson contains many similar expressions of grief, but as yet there has been no research on digital mourning on Twitter.

The body of tweets about Michael Jackson’s death also offers an opportunity to explore strategies for sentiment analysis, the process of determining the attitude of a speaker or speakers towards a particular topic in a large corpus of text. Because of its 140-character limit on messages and the social mores of the platform, Twitter poses challenges for the natural language processing and statistics-based techniques typically used to analyze sentiment.

This report represents a step towards understanding digital mourning and analyzing sentiment on Twitter. After describing our data, this report presents the results of an analysis of sentiment words in that data and findings from hand-coding tweets about Michael Jackson. This closer look at tweets about Jackson’s death provides insights into digital mourning practices on Twitter, assesses the validity of our first attempt at sentiment analysis by zeroing in on a word important to that analysis, and gauges the feasibility of doing larger scale sentiment studies in the future.

Key findings

  • At its peak, the conversation about Michael Jackson’s death on Twitter proceeded at a rate of 78 tweets per second.
  • Users tweeting about Jackson’s death tend to use far more words associated with negative emotions than are found in ‘everyday’ tweets.
  • Roughly three-quarters of tweets about Jackson’s death that use the word “sad” actually express sadness, suggesting that sentiment analysis based on word usage is fairly accurate.
  • That said, there is extensive disagreement between human coders about the emotional content of tweets, even for emotions that we might expect would be clear (like sadness).
  • Tweets expressing personal, emotional sadness about Jackson’s death showed strong agreement among coders, while commentary on the auxiliary social effects of Jackson’s death showed strong disagreement.
  • We argue that this pattern in the “understandability” of certain types of communication across Twitter is due to the way the platform structures the expression of its users.
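To make "agreement among coders" concrete, here is one simple way such agreement could be quantified. This is a sketch only: the report excerpt does not say which agreement measure was used, so the pairwise percent agreement below, the function name `pairwise_agreement`, and the example labels are all our assumptions.

```python
from itertools import combinations

def pairwise_agreement(labels_by_coder):
    """Fraction of (coder pair, tweet) comparisons where both coders gave the same label."""
    agree = 0
    total = 0
    for coder_a, coder_b in combinations(labels_by_coder, 2):
        for label_a, label_b in zip(coder_a, coder_b):
            agree += (label_a == label_b)
            total += 1
    return agree / total if total else 0.0

# Hypothetical labels from three coders for four tweets
# (True = "this tweet expresses sadness"). Not data from the report.
coders = [
    [True, True, False, True],
    [True, False, False, True],
    [True, True, False, False],
]
agreement = pairwise_agreement(coders)
print(f"{agreement:.2f}")
```

Raw percent agreement does not correct for agreement that would occur by chance; a chance-corrected statistic may be preferable for a real study.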

For this project, we made use of a dataset of 2,331,066 tweets about celebrity deaths (rumored or actual) collected for reasons that go beyond the scope of this report. These tweets were posted to Twitter between June 24 at 12:37am EDT (the day before Jackson’s death) and July 6 at 6:48pm EDT and were collected from Twitter’s search API using the following search terms:

  • MJ
  • Michael Jackson
  • Jackson
  • Farrah
  • Fawcett
  • Jill Munroe
  • Micheal (a very common misspelling)
  • Goldblum
  • Billy Mays

For this report, we worked with the 1,860,427 tweets in this dataset that contain “mj,” “michael,” or “jackson.” Because we do not yet have a reliable mechanism for filtering tweets by language, this set contains a small portion of non-English tweets; these tweets are excluded from the analysis that follows.
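A keyword filter of this kind might look like the sketch below. The matching rule is an assumption on our part: the report does not say whether terms were matched as whole words or as substrings, or how case was handled. Here we match whole words, case-insensitively. The name `mentions_jackson` and the example tweets are hypothetical.

```python
import re

# Assumption: a tweet "contains" a term if it appears as a whole word, ignoring case.
JACKSON_TERMS = re.compile(r"\b(mj|michael|jackson)\b", re.IGNORECASE)

def mentions_jackson(text: str) -> bool:
    """Return True if the tweet text contains "mj", "michael", or "jackson"."""
    return JACKSON_TERMS.search(text) is not None

# Hypothetical example tweets, not drawn from the dataset.
example_tweets = [
    "RIP Michael Jackson",
    "mj was the king of pop",
    "Farrah Fawcett will be missed",
]
jackson_subset = [t for t in example_tweets if mentions_jackson(t)]
print(jackson_subset)
```

Note that under this rule the misspelling "Micheal" from the search terms would be matched only if the tweet also contains "mj" or "jackson."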

We also isolated the 44,383 tweets in this set that contained the word “sad.” In addition to analyzing this set of tweets using the ANEW dataset, described below, we randomly selected 346 tweets for human coding. [Web Ecology Project]
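The two steps described here, isolating tweets containing "sad" and drawing a random sample for coding, can be sketched as follows. The whole-word, case-insensitive matching rule, the fixed seed, and the name `sample_for_coding` are our assumptions; the report does not describe how the selection was implemented.

```python
import random
import re

# Assumption: "the word sad" means a whole-word match, ignoring case,
# so "sadly" and "sadness" do not count.
SAD = re.compile(r"\bsad\b", re.IGNORECASE)

def sample_for_coding(tweets, k=346, seed=0):
    """Keep tweets containing the word "sad", then draw up to k of them without replacement."""
    sad_tweets = [t for t in tweets if SAD.search(t)]
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(sad_tweets, min(k, len(sad_tweets)))

# Hypothetical placeholder tweets, not drawn from the dataset.
tweets = [f"so sad about MJ #{i}" for i in range(1000)]
tweets += ["Sadly gone too soon", "celebrating his music tonight"]

picked = sample_for_coding(tweets)
print(len(picked))  # prints 346
```

Sampling without replacement ensures that no tweet is handed to the coders twice.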


Post information:
This entry was posted on Wednesday, August 19th, 2009 at 1:19 pm and is filed under Internet Trends