I’ve commented on a number of occasions that the software technology on which big data is based is rather primitive. After all, Hadoop and its associated zoo are little more than a framework and a set of software utilities that simplify writing and managing parallel-processing batch applications. Compare this to the long-standing prevalence of real-time transaction processing in the database world, relational or otherwise. NoSQL databases perhaps offer more novelty of thinking, especially where there has been innovation around the concept of key-value stores. At some fundamental level, big data has been less about “volume, velocity and variety” (marketing terms in many ways) and more about simple economics: cheap, commodity storage and processors combined with the open sourcing of software development.
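To make the contrast concrete, here is a minimal, purely illustrative Python sketch of the two models: a batch map/reduce pass that must scan everything before answering anything, versus a key-value store’s simple put/get interface. The function names and data are invented for illustration, not taken from any particular product.

from collections import defaultdict

# Batch style (the Hadoop model): scan the whole dataset, then aggregate.
def word_count(documents):
    # "Map" phase: emit (word, 1) pairs from every document.
    pairs = [(word, 1) for doc in documents for word in doc.split()]
    # "Reduce" phase: sum the counts per word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Key-value style (the NoSQL model): reads and writes addressed by key,
# no scan required.
store = {}
store["user:42"] = {"name": "Ada", "plan": "pro"}   # put
profile = store.get("user:42")                      # get

print(word_count(["big data big", "data rolling"]))  # {'big': 2, 'data': 2, 'rolling': 1}
print(profile)

The point of the sketch: neither model is sophisticated in itself; what made them matter was that commodity hardware let you run them cheaply at enormous scale.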
But the big data bandwagon has been rolling, and many of us, myself included, have perhaps been too focused on the size and speed of the wagon and paid too little attention to the oxen pulling it. Oxen? Actually, I’m referring to the major web denizens, such as Google, Facebook and their ilk. What alerted me was a recent Wired magazine article, “Google Spans Entire Planet With GPS-Powered Database”, and a trail of links therein, particularly “Google’s Dremel Makes Big Data Look Small”. Both articles, published in the past two months, make fascinating reading, but the bottom line is that Google and, to a lesser extent, Facebook are upgrading their big data environments to be faster and more responsive. Unsurprisingly, Google is moving from a batch-oriented paradigm to, wait for it, a database system that preserves update consistency. Google has been on this journey for three years now and has been publishing research papers on it as far back as 2010. Get ready for a new set of buzzwords: Dremel, Caffeine, Pregel and Spanner from Google, and Prism from Facebook.
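Part of what makes a Dremel-style engine feel interactive rather than batch is columnar storage: an aggregate query reads only the columns it references instead of scanning whole records. A rough Python sketch of that idea follows; the table, column names and query are my own invention for illustration, not Google’s API.

from collections import defaultdict

# Columnar layout: each field is its own array, so a query can ignore
# the fields it never touches.
requests = {
    "country": ["DE", "US", "DE", "UK", "US"],
    "latency_ms": [120, 85, 140, 95, 110],
    "url": ["/a", "/b", "/a", "/c", "/b"],  # never read by the query below
}

# Roughly "SELECT country, AVG(latency_ms) ... GROUP BY country",
# touching only the two columns the query names.
totals, counts = defaultdict(int), defaultdict(int)
for country, latency in zip(requests["country"], requests["latency_ms"]):
    totals[country] += latency
    counts[country] += 1

print({c: totals[c] / counts[c] for c in totals})
# {'DE': 130.0, 'US': 97.5, 'UK': 95.0}

At web scale, skipping the unread columns (and fanning the scan out across thousands of machines) is what turns a minutes-long batch job into a seconds-long query.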
So what does this mean for the rest of us? In the widespread adoption of the current generation of big data technology, the driver has not been big data itself so much as the commoditization of storage and processing power that has emerged. Database vendors have reacted by embracing Hadoop as a complementary data source or store alongside their engines. The open sourcing of Dremel, if it happens, would signal, I believe, a much more significant change in the database market. Readers familiar with “The Innovator’s Dilemma” by Clayton Christensen, first published in 1997, will probably recognize what would ensue as disruptive innovation, described as “innovation that creates a new market by applying a different set of values, which ultimately (and unexpectedly) overtakes an existing market”. To possibly overstretch the bandwagon analogy, it seems that the bandleader has switched horses; the parade is changing its route.
These developments add a whole new set of considerations for vendors and implementers of big data solutions, and I’ll be exploring them further at speaking engagements in Europe in November: the IRM DW&BI Conference in London (5-7 Nov) and Big Data Deutschland in Frankfurt (20-21 Nov). I hope to meet at least a few of you there!
“Big wheel keep on turning / Proud Mary keep on burning / And we’re rolling, rolling / Rolling on the river” (“Proud Mary”, Creedence Clearwater Revival, 1969)