Archive for August, 2012

A Funny Thing Happened on the Way to the Polls

Thursday, August 30th, 2012

Much like on a first date, on a work blog you should never talk about politics or religion, or eat an entire rack of BBQ ribs (unless it’s a delicious meat blog). So this week, I’m not going to bury the lede. When doing a large statistical analysis of any dataset, it’s important to check your work. When examining language, tone and intent to make statistical projections about belief, it’s doubly important to make sure the work is done properly and to dig deeper than just the numbers. This is paramount when trying to appear as a trusted news source.

On Tuesday, my Twitter feed was blowing up over a screenshot from MSNBC and, later, an article called Social media analysis: At the keyboard, Americans slightly prefer Romney. Apparently, a Twitter parody account with over 32,000 followers is one of the biggest positive influencers for Mitt Romney. The analysis examined over 2 million tweets and Facebook posts in an effort to capture the wide range of people who take to their keyboards, between bags of Cheetos and reruns of Law and Order, to share their political beliefs with the world. As they explained, “Social media analysis is interested in capturing and reporting that structural divide, while controlled national polls have a different mission: capturing a representative sample that proportionally reflects all opinions.” As festive as that sounds, the intent is to look at what people are saying outside the structured polling environment to get another angle on the presidential campaign. The results were interesting: while Obama may be leading in the national polls, Romney was found to be leading the positive sentiment charge on the Internet. From there, the article provided a topic and keyword chart to break the massive amount of data into bite-sized chunks resembling a couple of Smurf everything bagels, and this is where things started to get really intriguing.

You don’t even need to look closely at the Obama or Romney charts to spot some odd results that begin with “RT” (for the Twitter-uninitiated, RT means retweet, or sharing someone else’s tweet) and end with a Twitter username. That means a small number of accounts are being retweeted heavily and might be skewing the overall results, especially if one of those accounts exists for comedic value. In this case, one of the largest Romney topics is TeaPartyCat, a political parody account on Twitter. The article doesn’t really explain the topics displayed in the chart beyond the very cursory “this candidate is smart,” “I think they are electable,” or “I’m voting for cheese,” so it gives us no rationale for why these oddball topics were uncovered. I suspect the rush to get the article out allowed some freedom to ignore the topics they didn’t understand, much to my excitement, because it gives me something to blog about. The fact that they describe their own work by saying their “analysis, by contrast, explores the actual content of what is being said, providing a glimpse at what issues are specifically driving people’s opinions,” is ironic. Their social media analysis is a glimpse into the political discussion, but potentially a false one, since it fails to explain the anomalies in the data. As explorers of the social media frontier, no one is going to confuse them with Magellan. The amazing Nate Silver at FiveThirtyEight does a much more thorough analysis of political statistics with far less data, but more importantly, he tries to explain the outliers when looking at polls and results. Silver may be dealing with more traditional political polls, but the point holds: often the outliers are just as telling as the information you expected to see.

Here is some speculation as to what is going on:

1. So many people are tweeting the word TeaPartyCat, unrelated to the parody account, that the results are accurate.
2. There are enough positive Romney statements in reply to TeaPartyCat that the replies outweigh the initial tweets.
3. People who feel positively about Romney are retweeting TeaPartyCat without realizing it’s a parody account.
4. The analysts didn’t spot-check the results.

While there are plenty of people who identify with the Tea Party and find LOLcats adorable, I doubt the first two options are viable. We are all familiar with The Onion effect, where people share a story from the satirical newspaper as if it were true, but I believe that impact is also minimal here. I’d hazard a guess that NBC simply failed to review its work. If you go to the article and look at the positive discussion of Romney’s electability, there are some expected keyword results like “moderate,” “smart” and “Romney2012.” All three of these terms make absolute sense in an analysis of this type. However, when you produce a chart of this size and one of the major results is “RT TeaPartyCat,” a term with no contextual relevance, one would expect the analysts to dig a little deeper or, at minimum, explain why the topic appears as a major result. That’s especially true from a political perspective, since Romney is unlikely to be the Tea Party’s preferred candidate, although he might be polling well with cats.
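For what it’s worth, the spot-check I’m talking about isn’t rocket science. Here’s a minimal Python sketch, assuming the topic chart could be exported as a simple mapping of topic labels to mention counts (I don’t know how the analysts actually stored their data). The function name, the 5% threshold and the sample counts are all hypothetical, chosen only to show how you might flag topics that are really just retweets of a single account.

```python
import re

# A topic label that is nothing but a retweet marker plus one Twitter handle,
# e.g. "RT TeaPartyCat" or "RT @TeaPartyCat". Handles are 1-15 word characters.
RETWEET_TOPIC = re.compile(r"^RT\s+@?(\w{1,15})$", re.IGNORECASE)


def flag_retweet_topics(topic_counts, threshold=0.05):
    """Return (handle, share) pairs for retweet-only topics whose share of all
    mentions is at least `threshold`, largest first.

    Hypothetical helper: assumes topic_counts maps topic label -> mention count.
    """
    total = sum(topic_counts.values())
    if total == 0:
        return []
    flagged = []
    for topic, count in topic_counts.items():
        match = RETWEET_TOPIC.match(topic.strip())
        share = count / total
        if match and share >= threshold:
            flagged.append((match.group(1), round(share, 3)))
    # Biggest offenders first, so a reviewer reads those tweets before publishing.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data for illustration only (not the article's numbers).
romney_topics = {
    "moderate": 400,
    "smart": 350,
    "Romney2012": 300,
    "RT TeaPartyCat": 250,
    "RT @someuser": 20,
}
print(flag_retweet_topics(romney_topics))  # → [('TeaPartyCat', 0.189)]
```

With these made-up counts, “RT TeaPartyCat” accounts for about 19% of mentions and gets flagged, while the small retweet topic stays under the threshold. A flag isn’t a verdict; it’s just a prompt for a human to read the underlying tweets before the chart goes out.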

In my mind, these results don’t make sense. With a little spot-checking on their part, though, they could have provided some really useful information. As we say around our office, “We need to make it easy to get right and hard to get wrong.” When trying to formulate statistical arguments around language usage, however, that mantra can be a challenge. Or, as Bill Livingston, a favorite sportswriter of mine, said, “Empiricism, my friends, is a drag.”

Now if you’ll excuse me, I need to see if my wife is going to finish that rack of ribs.


Your Email Address Isn’t as Private as You Think

Monday, August 13th, 2012

It’s almost football season again: crisp fall air, bright blue skies, half-grilled/half-frozen brats, cold beer and the slow march of despair from week one to week seventeen that every Cleveland Browns fan experiences. At least this year, with the training camp injuries and suspensions, we’ll get a head start on that journey. It also means a plethora of unsolicited emails from companies that scraped my email address off the Cleveland Browns’ website to offer me a myriad of NFL-branded products, ranging from the useful (inflatable tailgate chairs with TWO cup holders) to the obnoxious (officially licensed vuvuzela/cowbell combo instruments for the football fan you already hate). We understand that email marketing is an effective way to reach targeted potential customers. But we also know that a Wild West mentality toward email addresses doesn’t benefit anyone who wants to sell a product, a service or even a candidate when there’s stiff competition for audiences’ eyeballs.

With that thought in mind, I was extremely disappointed to come across a Minnesota Public Radio article about our state’s Data Practices Act and the lack of privacy for email addresses. The Minnesota Data Practices Act (DPA) deals specifically with access to government data and the presumption that government data is accessible to the people, much like a state-level Freedom of Information Act. While I highly recommend everyone read and think about the article, the short summary is that an individual recently requested the email addresses of people in a number of cities who had signed up to receive alerts about local government happenings. The request revealed that, under the DPA, the information is considered public, and cities are legally required to disclose the email addresses to the requester. There’s only speculation as to why these email addresses were requested. However, since the person asking for them is married to someone running for political office, campaigning is probably a safe assumption. But what if he wants to sell them? Or operate a very focused local phishing scam? Or what if, by requesting all of those email addresses, he could find the one he wants for some other nefarious purpose? In this instance, I highly doubt that is the case. However, as Mat Honan discussed in Wired, it doesn’t take a whole lot of data for a pretty vicious hack to occur. If a previously undisclosed email address can be coupled with just a few other pieces of an individual’s data, a whole Pandora’s box of private information can be opened up.

Now that I’ve gotten my scare tactics out of the way, this is really a question of state policy and its relationship to openness. One would hope our legislators would err on the side of caution when it comes to divulging people’s electronic information. That said, the reason we have the DPA is to prevent the government from hiding its doings from the public. Legislators have therefore designated specific types of information as protected and presumed that anything not explicitly protected is open for disclosure. That’s the rationale in this scenario: since personal email addresses aren’t on the protected list, they aren’t shielded from DPA requests, and cities have no choice but to comply. So while I might wish the state would be more judicious with access to personal data, there’s a very real reason the DPA favors disclosing more information rather than less. On one hand, it’s a question of privacy; on the other, it gets to the modern technology question of time, money, effort and accountability in how government-collected data gets used.

Let’s frame the problem this way. If I want to reach out to an entire community of people (say 5,000), there’s a cost associated with each attempt to contact every person. Whether it’s making phone calls or paying for the printing and postage of a mail piece, time, effort and money are baked into each contact attempt. Email is a little different. If you are doing the deployment yourself, you spend time and money on the software, designing the email and setting up the list, but after that, the cost of each additional deployment drops significantly. It’s a lot cheaper to send an email a day to a list for ninety days than it is to send a postcard daily over the same period. That cheapness is exactly what makes a public list of email addresses so problematic. Just because I want snow emergency notifications by email so my car isn’t towed doesn’t mean I want my email address exposed to a myriad of people with unknown intentions.
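To put rough numbers on that comparison, here’s a back-of-the-envelope Python sketch. The cost model (a one-time setup cost plus a per-contact cost for every message sent) is my own simplification, and every dollar figure in it is a hypothetical placeholder rather than a real quote from a printer, the post office or an email vendor.

```python
from decimal import Decimal  # exact decimal math, so cents don't drift


def campaign_cost(recipients, sends, setup_cost, cost_per_contact):
    """Simplified model: one-time setup plus a cost for every message sent."""
    return setup_cost + cost_per_contact * recipients * sends


RECIPIENTS = 5_000  # the community size from the example above
DAYS = 90           # one contact per day for ninety days

# Hypothetical placeholder costs, for illustration only.
email = campaign_cost(RECIPIENTS, DAYS,
                      setup_cost=Decimal("500"),         # software, design, list setup
                      cost_per_contact=Decimal("0.001"))
postcards = campaign_cost(RECIPIENTS, DAYS,
                          setup_cost=Decimal("0"),
                          cost_per_contact=Decimal("0.50"))  # printing plus postage

print(f"Email: ${email:,.2f}  Postcards: ${postcards:,.2f}")
```

Under those made-up numbers, ninety days of email comes to $950 against $225,000 in postcards. The exact figures don’t matter much; the point is that once the per-contact cost is close to zero, the setup cost dominates, which is why a free list of addresses is so valuable to whoever requests it.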

While I certainly come down on the side of minimal disclosure when it comes to personal email addresses, there is room for debate about when disclosure might be acceptable. I just can’t predict what that need might be, which is the crux of the problem when laws and policy lag behind technology.

If the courts haven’t decided whether a Facebook “like” constitutes protected First Amendment speech, it’s easy to understand how complex it is to decide whether signing up for snow emergency notifications or city council meeting agendas makes your email address public information. The solution isn’t simple, and debate on the issue is essential to getting it right. In the meantime, however, we shouldn’t just hand email addresses out willy-nilly to anyone who asks. Unexpected benefits would be fantastic, but they hardly outweigh the unintended consequences that could come from disclosure.

Now if you’ll excuse me, I have to email a guy about the officially licensed Cleveland Browns mood rings (they are just a solid brown color designating sad resignation).