Three articles in recent weeks have touched on an important issue related to Big Data and predictive analytics: sometimes, the results can be downright creepy. It's kind of like the "Uncanny Valley" in computer animation: the human characters in Pixar animations are cartoon-like rather than human-like because trying to make animated humans photorealistic generally provokes an uncomfortable reaction in the viewer. The animations might look realistic, but something in our animal brain knows something isn't quite right, and it's just ... creepy.

The same thing can happen where the rubber meets the road of Big Data and predictive analytics: when offers or suggestions are made to individuals. You've probably had an experience similar to mine: after searching the web for a hotel deal in Vegas, suddenly every ad that appeared next to the blogs and websites I regularly read was for a Vegas-related deal. Creeeepy. (And also not particularly useful: after that trip I had no particular plans to return to Vegas anytime soon, yet the ads kept coming.)

A similar tale was related in the New York Times over the weekend. In the story "How Companies Learn Your Secrets" (reg. req.), statistician Andrew Pole (working for the retailer Target) described how he'd created a predictive model to identify from shopping habits when a shopper was likely to be pregnant. When the father of a young Target shopper saw the baby-related coupons sent to his daughter, he was outraged:

“My daughter got this in the mail!” [said the father]. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?”

It turns out the daughter actually was pregnant at the time, unbeknownst to her father. The creepy aspect here: why should a corporation be able to know (or rather, infer) such a personal fact, when close family members do not? Intriguingly, Target solved this problem by mixing future offers to identified pregnant shoppers with unrelated coupons, say, for lawnmowers or wineglasses. By deliberately making the predictions worse, Target got a better response: "as long as we don’t spook her, it works", said Pole. Personally, I wish web advertisers would do the same thing. Not only do I no longer care about Vegas hotels, but the absence of other ads also precludes the serendipity of discovering products I might actually like but which my activity history would not suggest. In a follow-up interview, article author Charles Duhigg suggested other areas where this technique might help alleviate the "creep factor".
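To make the coupon-mixing idea concrete, here is a minimal Python sketch. It is not Target's actual method, which the article does not describe in detail; the function name `mix_offers`, the `targeted_share` parameter, and the offer lists are all hypothetical, chosen only to illustrate blending model-driven offers with unrelated ones.

```python
import random


def mix_offers(targeted, unrelated, n_slots=6, targeted_share=0.5, seed=None):
    """Fill a mailer with a blend of model-driven and unrelated coupons.

    Hypothetical illustration: `targeted_share` is the fraction of slots
    given to offers the predictive model selected; the remaining slots are
    filled with offers that have nothing to do with the prediction.
    """
    rng = random.Random(seed)
    n_targeted = min(len(targeted), round(n_slots * targeted_share))
    n_unrelated = min(len(unrelated), n_slots - n_targeted)
    mailer = rng.sample(targeted, n_targeted) + rng.sample(unrelated, n_unrelated)
    rng.shuffle(mailer)  # interleave so the targeted offers do not cluster
    return mailer


# Made-up offer lists, loosely echoing the examples in the story.
baby_offers = ["crib", "baby clothes", "diapers", "stroller"]
filler_offers = ["lawnmower", "wineglasses", "garden hose", "toaster"]

mailer = mix_offers(baby_offers, filler_offers, n_slots=6, targeted_share=0.5, seed=1)
print(mailer)
```

Lowering `targeted_share` trades relevance for subtlety: the mailer says less about what the model has inferred, which is the point of the exercise.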

In a similar vein, this week's Esquire profiles Tibco's CEO Vivek Ranadivé. Among several examples of how collecting multiple streams of data can improve the predictions from analytics is this anecdote about a basketball fan visiting Oakland's Oracle Arena:

At the end of the third quarter, when the computer system showed that the concession stand near his seats had too many hot dogs, it could send him a buy-one-get-one-free offer — because it also knows that he sometimes buys hot dogs at games.

The right information to the right people at the right time in the right context. (Fans creeped out by this could opt out.)

This may be another example where moving the predictions outside the uncanny valley might prevent fans being creeped out.

Finally, another New York Times article from earlier this month, "The Age of Big Data" (reg. req.), looks into the lives and impact of some of the "rock stars" of Big Data applications. While lauding many of the benefits of analytics on Big Data, it also strikes a cautionary note at the end of the article:

Big Data has its perils, to be sure. With huge data sets and fine-grained measurement, statisticians and computer scientists note, there is increased risk of “false discoveries.” The trouble with seeking a meaningful needle in massive haystacks of data, says Trevor Hastie, a statistics professor at Stanford, is that “many bits of straw look like needles.”

This is a great point: treating analytics as a "black box process" (data in, predictions out) can lead to inappropriate predictions (more to the "zombie" side than the "angel" side of the Uncanny Valley). It takes the statistical expertise of a data scientist to ensure that such predictive analytics are creating sensible predictions ... and to help companies avoid the Uncanny Valley of Big Data.
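The "false discoveries" risk is easy to demonstrate with a small simulation. The Python sketch below is an illustration only, not taken from either article: it generates an outcome and 1,000 candidate predictors that are all pure noise, then counts how many predictors nonetheless correlate with the outcome strongly enough to look meaningful. The sample size, number of features, and threshold are assumptions; 0.279 is roughly the correlation that a conventional 5% two-sided test would call significant with 50 observations.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def count_false_discoveries(n_obs=50, n_features=1000, threshold=0.279, seed=0):
    """Count noise features whose |correlation| with a noise outcome exceeds threshold."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_features):
        # Each feature is generated independently of the outcome,
        # so any "signal" found here is spurious by construction.
        feature = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(pearson(feature, outcome)) > threshold:
            hits += 1
    return hits


hits = count_false_discoveries()
print(f"{hits} of 1000 pure-noise features look 'significant'")
```

On average about 5% of the noise features should clear the bar, so a black-box search over a thousand candidate variables can be expected to turn up dozens of "needles" that are really straw.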
