Social media analytics – you want the truth? It’s messy, it’s awkward, and the results should be reviewed on an ongoing basis by people with good training and very suspicious minds. But it’s still useful.

You could remove the words “social media analytics” from that last paragraph and substitute “medicine”, “the legal system” or [insert your profession here] and it would still work, wouldn’t it? Because in real life, messy solutions are often the best solutions we have. Ours is not a textbook world.

Lines like, “You can’t handle the truth!” only work in movies. Come to think of it, that didn’t work in the movie, either. People want answers. But now that the truth is right in front of you, how will you handle it?

Why is social media analytics messy and awkward? For one thing, it’s the model of Big Data, and Big Data is certain to be… big. Massive quantity poses data management challenges. And then there’s quality. To put it simply, all measurements on the web are approximate. Things that make the web work – like caching, for instance – sometimes make it difficult to track and measure activity. Of course, that’s not the only kind of data quality problem in social media.

The demographics summary for any web content with a minimum age requirement shows that everyone downloading that content meets the minimum age requirement. These sites have registration or entry processes which, in short, say, “Hey, are you old enough?” The user answers, “Of course I’m old enough!” and may even provide a birthdate of the proper vintage. The report reflects what users tell us. Now, if you are not too sensitive for such things, please go read the comments on some adult video content and see if you believe all that stuff was written by people over 18. Much of the data in social media is self-reported, and self-reported data is open to quality problems.

Some data can be validated. On social media sites where real names are used, identity is validated by connection to others. But not all connections represent validation – some people connect based on what they see in the profile, not real-life familiarity. And not all the data is open to validation. Many people do not display their age, for example, in profiles. Who’s to know if such data is valid or not? Even if the data is displayed, who would report a friend for trimming off a couple of years – or adding them?

Is there value in this mass of dirty data? Yes, there is. Do you first have to get the dataset into squeaky clean shape to extract value from it? Not necessarily. Let’s make this clear – it’s worthwhile to prevent data quality problems and correct problems when you can. But if all you see is what’s dirty in the data, you may be focusing on the wrong stuff.

Online, actions speak louder than… anything.

If you’re still hung up on demographics, consider what Todd Curry, CDO of Geomomentum, reported at the Math Men panel discussion in Chicago last summer: an audit of audience data revealed that 40% of soccer moms were male and 50% of seniors were under 50.

Your logs can’t tell you if I am really a woman, or whether I was born when I say I was born. But they can tell you what I do on your social media site. And here is one of the great advantages that social media enjoys over other types of web activity: because users must be logged in to use social media sites, the data recording user actions is some of the cleanest and most complete data in the online world.

Seriously, what do you care about gender, age or income? Those things are just proxies for what you really want to know – what people do, or what they are likely to do. You can invest a lifetime trying to clean up the demographics in social media data, or you can let your competitors waste their time on that while you concentrate on actions. Go straight to the best data you have, see what people are doing, and use analytics with this, your best data, to discover what predictive value you can find for actions that matter to you.
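
For readers who want to see what "predictive value" might look like in practice, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `events` log, the action names ("share", "view", "purchase") and the `lift` helper are hypothetical, not taken from any real site or tool. It asks one simple question of logged actions: are users who did one thing more likely than average to do another? It ignores the order and timing of actions, which a real analysis would need to respect.

```python
from collections import defaultdict

# Hypothetical action log of (user_id, action) pairs; all values are made up.
events = [
    ("u1", "share"), ("u1", "purchase"),
    ("u2", "share"),
    ("u3", "view"),
    ("u4", "share"), ("u4", "purchase"),
    ("u5", "view"), ("u5", "purchase"),
    ("u6", "view"),
]

def lift(events, signal, target):
    """Target rate among users who did `signal`, divided by the overall target rate.

    Returns None when the ratio is undefined (no users, no users with the
    signal action, or nobody performed the target action).
    """
    # Collect the set of distinct actions each user performed.
    actions = defaultdict(set)
    for user, action in events:
        actions[user].add(action)

    users = list(actions.values())
    with_signal = [a for a in users if signal in a]
    if not users or not with_signal:
        return None

    base_rate = sum(target in a for a in users) / len(users)
    signal_rate = sum(target in a for a in with_signal) / len(with_signal)
    return signal_rate / base_rate if base_rate else None

share_lift = lift(events, "share", "purchase")
print(round(share_lift, 2))  # 1.33 on this toy log
```

A value above 1 suggests the signal action is associated with the target action more often than average; a value below 1 suggests the opposite. On a toy log like this one the number means nothing, but the same comparison run on real action data is a reasonable first look before building any model.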

©2011 Meta S. Brown