There’s no doubt in my mind that big data describes a real and novel phenomenon; unfortunately, there are also many existing and well-understood phenomena in the world of business intelligence and data warehousing that are getting sucked into marketing stories and, indeed, even into respectable articles about big data.
The recent McKinsey Quarterly article “Are you ready for the era of ‘big data’?” (registration required) opens with the following example: “The top marketing executive at a sizable US retailer recently [discovered that a] major competitor was steadily gaining market share across a range of profitable segments… [This] competitor had made massive investments in its ability to collect, integrate, and analyze data from each store and every sales unit and had used this ability to run myriad real-world experiments. At the same time, it had linked this information to suppliers’ databases, making it possible to adjust prices in real time, to reorder hot-selling items automatically, and to shift items from store to store easily. By constantly testing, bundling, synthesizing, and making information instantly available across the organization… the rival company had become a different, far nimbler type of business. What this executive team had witnessed first hand was the game-changing effects of big data” [my emphasis].
With all due respect to the authors, I believe that anybody who has been involved in business intelligence over the past ten years will be underwhelmed by this story. Almost all of it describes a familiar, indeed common, scenario: a pervasive data warehousing implementation combined with operational BI excellence. I suspect that this example was tagged as big data because of the reference to running myriad real-world experiments. Such experimentation is often associated with big data; on its own, however, it is generally not a sufficient characteristic.
The remainder of the article provides many interesting examples and possible consequences, both beneficial and cautionary, of using big data. For the business executive, it clearly whets the appetite. But, from an IT perspective, it misses a key aspect: a viable definition of what big data really is. This is hardly surprising; big data has reached the point on the hype curve where definitions are considered unnecessary. We all seem to have an assumed definition that neatly meets our needs, be it selling a product or initiating a project. Hear me clearly, though. Despite the hype, there is something real going on here. And it’s fundamentally about the underlying characteristics of the information involved, characteristics that differ significantly from those of the data we in IT have stored and used over the years.
I contend that there are four types of information that together make up big data:
1. Machine-generated data, such as RFID data, physical measurements and geolocation data, from monitoring devices
2. Computer log data, such as clickstreams
3. Textual social media information from sources such as Twitter and Facebook
4. Multimedia social and other information from the likes of Flickr and YouTube
They are as different from traditional transactional data (the mainstay of BI) as they are from one another. They have little in common, beyond their volume. How business extracts value from them and how IT processes them vary widely.
While closely related to traditional BI and data warehousing, big data projects require additional and often very different skills in business and IT. Their value is first to drive innovative change in business processes; only afterwards can their use become ongoing and operational. These are topics I’ll return to in the coming months. But, in the meantime, join me for my webinar “Big Data Drives Tomorrow’s Business Intelligence” on 25th October for further insights in this rapidly evolving area.