Much like on a first date, on a work blog you should never talk about politics or religion, or eat an entire rack of BBQ ribs (unless it’s a delicious meat blog). So this week, I’m not going to bury the lead. When doing a large statistical analysis of any dataset, it’s important to check your work. When examining language, tone and intent to make statistical projections about belief, it’s doubly important to make sure the work is done properly and to dig deeper than just the numbers. This is paramount when trying to appear as a trusted news source. On Tuesday, my Twitter feed was blowing up over a screenshot from MSNBC and then later an article called “Social media analysis: At the keyboard, Americans slightly prefer Romney.” Apparently, a Twitter parody account with over 32,000 followers is one of the biggest positive influencers for Mitt Romney.
NBCPolitics.com examined over 2 million tweets and Facebook posts in an effort to analyze a wide range of people who take to their keyboards to share their political beliefs with the world between bags of Cheetos and reruns of Law and Order. As they explained, “Social media analysis is interested in capturing and reporting that structural divide, while controlled national polls have a different mission: capturing a representative sample that proportionally reflects all opinions.” As festive as that sounds, the intent is to look at what people are saying outside the structured polling environment to get another angle for understanding the presidential campaign. The results were interesting: where Obama may be leading in the national polls, Romney was found to be leading the positive sentiment charge on the Internet. From there, NBCPolitics.com provided a topic and keyword chart to help break the massive amount of data into bite-sized chunks resembling a couple of Smurf everything bagels, and this is where things started to get really intriguing.
You don’t even need to look closely at either the Obama or Romney charts to see some odd results which begin with an “RT” (for the Twitter-uninitiated, RT means retweet, or sharing of a tweet) and end with a Twitter username. This means there is a high number of retweets of a small number of accounts that might be skewing the overall results, especially if one of those accounts is providing comedic value. In this case, one of the largest Romney topics is the TeaPartyCat political parody account from Twitter.
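To make the retweet skew concrete, here is a minimal Python sketch of the kind of check I have in mind. It is not how NBCPolitics.com processed its data (the article doesn’t say). The function name `retweet_concentration` and the sample tweets are made up for illustration, and the sketch assumes retweets use the classic `RT @username` prefix.

```python
import re
from collections import Counter

# Assumption: a retweet starts with "RT @username" (the classic manual style).
RT_PATTERN = re.compile(r"^RT @(\w+)", re.IGNORECASE)


def retweet_concentration(tweets):
    """Return (share of tweets that are retweets, Counter of retweeted accounts)."""
    sources = Counter()
    for text in tweets:
        match = RT_PATTERN.match(text.strip())
        if match:
            # Lowercase the handle so "TeaPartyCat" and "teapartycat" count together.
            sources[match.group(1).lower()] += 1
    total = len(tweets)
    share = sum(sources.values()) / total if total else 0.0
    return share, sources


# Hypothetical sample data, invented purely to exercise the function.
sample_tweets = [
    "RT @TeaPartyCat: a joke at Romney's expense",
    "RT @TeaPartyCat: another joke",
    "I think Romney is electable",
    "RT @someone_else: Romney2012",
]

share, sources = retweet_concentration(sample_tweets)
print(f"retweet share: {share:.0%}")
print("most retweeted:", sources.most_common(1))
```

If one or two handles account for most of the retweets behind a “topic,” that topic probably says more about those accounts than about the electorate.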
NBCPolitics.com doesn’t really get into an explanation of some of the topics displayed in the chart other than the very cursory “this candidate is smart,” or “I think they are electable,” or “I’m voting for cheese,” so they don’t give us a rationale for why these oddball topics were uncovered. I suspect the rush to get the article out allowed for some freedom to ignore the topics they didn’t understand, much to my excitement because it gives me something to blog about. The mere fact that they describe their work by saying, “NBCPolitics.com’s analysis, by contrast, explores the actual content of what is being said, providing a glimpse at what issues are specifically driving people’s opinions,” is most ironic. Their social media analysis is a glimpse into the political discussion, but potentially a false one, since it fails to explain the anomalies in the data. As explorers of the social media frontier, no one is going to confuse them with Magellan. The amazing Nate Silver at FiveThirtyEight does a much more thorough analysis of political statistics with a lot less total data, but more importantly he tries to explain the outliers when looking at polls and results. While Silver may be dealing with more traditional political polls, the point holds. Oftentimes the outliers are just as telling as the information you expected to see.
Here is some speculation as to what is going on:
1. So many people are tweeting the words TeaPartyCat unrelated to the parody account that the results are accurate.
2. There are enough positive Romney statements in reply to TeaPartyCat that the replies outweigh the initial tweets.
3. People who feel positively about Romney are retweeting TeaPartyCat without realizing it’s a parody account.
4. NBCPolitics.com didn’t spot-check the results.
While there are many people who associate with the Tea Party and find LOL Cats adorable, I don’t suspect the first two options are viable. We are all familiar with The Onion effect, where individuals share a story from the comedy newspaper as true, but I believe that impact is also minimal. I’d hazard a guess that NBC simply failed to review their work. If you go to the article and look at the positive discussion of Romney’s electability, there are some expected keyword results like “moderate,” “smart” and “Romney2012.” All three of these terms make absolute sense when doing an analysis of this type. However, when you produce a chart of this size and one of the major results is “RT TeaPartyCat” and that term has no contextual relevance, one would expect them to dig a little deeper or at minimum explain why the topic appears as a major result. Especially when you look at the result from a political perspective and understand that Romney is unlikely to be the preferred candidate for the Tea Party, although he might be polling well for cats.
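For what it’s worth, the spot-check I’m asking for doesn’t need to be fancy. Here is a rough Python sketch, under my own assumptions: pull a handful of random posts that mention a suspicious term and have a human read them. The `spot_check` function and the example posts are hypothetical and are not anything NBCPolitics.com published.

```python
import random


def spot_check(posts, term, k=5, seed=0):
    """Return up to k randomly chosen posts containing `term` (case-insensitive)."""
    matches = [post for post in posts if term.lower() in post.lower()]
    # A fixed seed keeps the sample reproducible so a colleague can review the same posts.
    rng = random.Random(seed)
    return rng.sample(matches, min(k, len(matches)))


# Hypothetical posts, invented for illustration only.
example_posts = [
    "RT @TeaPartyCat: a joke at Romney's expense",
    "Romney seems moderate and smart",
    "rt @teapartycat: another joke",
    "Romney2012",
]

for post in spot_check(example_posts, "TeaPartyCat", k=2):
    print(post)
```

Reading even a few of the sampled posts should be enough to tell whether a term scored as “positive” is sincere praise or a punchline.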
In my mind, these results don’t make sense. However, with a little spot-checking on their part, they could have provided some really useful information. As we say around our office, “we need to make it easy to get right, and hard to get wrong.” However, when trying to formulate statistical arguments around language usage, this mantra can be a challenge. Or as Bill Livingston, a favorite sports writer of mine, said, “Empiricism, my friends, is a drag.”
Now if you’ll excuse me, I need to see if my wife is going to finish that rack of ribs.