A lot has been written recently criticizing Google Flu Trends, a flu-tracking service that uses aggregated Google search data for specific search terms to estimate current flu activity around the world in near real time. For more, read “How does this work?”
Science magazine recently published an article titled “The Parable of Google Flu: Traps in Big Data Analysis,” and Steve Lohr published a great piece on the New York Times Bits blog titled “Google Flu Trends: The Limits of Big Data.”
It is important to note that the over-estimation of flu activity in Google Flu Trends is NOT a limitation of Big Data or of the analytics used to estimate flu activity, as some writers have suggested. Rather, it highlights the importance of the fourth “V” of Big Data: Veracity.
It is often said that Big Data has three defining attributes, the three Vs as they are called: Data Volume, Data Variety and Data Velocity (for more, check out the TDWI Best Practices Report titled Big Data Analytics). But this definition misses a very important dimension of Big Data, namely Data Veracity.
I think Google Flu Trends estimates would be much more realistic if we incorporated Data Veracity, the fourth dimension of Big Data, into the estimation models and adjusted the estimates based on a “Veracity Score”.
In other words, the inaccurate estimates of flu activity reported by Google Flu Trends are NOT a limitation of Big Data or Analytics; rather, we need to incorporate the Data Veracity element into the estimation model.
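To make the idea a little more concrete, here is a minimal sketch in Python of one way a veracity score could be folded into an estimate. Everything in it is an assumption for illustration: the `TermSignal` type, the 0-to-1 scoring scale, the example terms and the numbers are all hypothetical, and this is not how Google Flu Trends actually works. It simply weights each search term's flu signal by how much we trust it, so that noisy terms (for example, searches driven by news coverage rather than illness) pull the estimate less.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TermSignal:
    """Flu signal derived from one search term (hypothetical structure)."""
    term: str
    estimate: float  # flu-activity level implied by this term alone (made-up units)
    veracity: float  # assumed trust score: 0.0 = untrustworthy, 1.0 = fully trusted


def unweighted_estimate(signals: List[TermSignal]) -> float:
    """Plain average that treats every term as equally trustworthy."""
    return sum(s.estimate for s in signals) / len(signals)


def veracity_weighted_estimate(signals: List[TermSignal]) -> float:
    """Average of per-term estimates, weighted by each term's veracity score."""
    for s in signals:
        if not 0.0 <= s.veracity <= 1.0:
            raise ValueError(f"veracity for {s.term!r} must be between 0 and 1")
    total_weight = sum(s.veracity for s in signals)
    if total_weight == 0:
        raise ValueError("no trustworthy signals to estimate from")
    return sum(s.estimate * s.veracity for s in signals) / total_weight


# Illustrative, made-up data: the news-driven term is given zero trust.
signals = [
    TermSignal("flu symptoms", estimate=4.0, veracity=1.0),
    TermSignal("fever remedy", estimate=6.0, veracity=0.5),
    TermSignal("flu news", estimate=10.0, veracity=0.0),
]

print("unweighted:", unweighted_estimate(signals))
print("veracity-weighted:", veracity_weighted_estimate(signals))
```

With these made-up numbers the weighted estimate comes out lower than the plain average, because the inflated signal from the least trusted term is discounted. The hard part in practice would be deciding how to assign the veracity scores, which this sketch takes as given.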
What do you think? Do you agree that the inaccurate estimates of flu activity reported by Google Flu Trends are NOT a limitation of Big Data or Analytics?