Big Data is here to stay, and it is having a profound effect on businesses and societies. That said, many organizations still have no clear idea of what Big Data is. Big Data means different things to different people, organizations and industries. While it is true that Big Data offers different advantages and possibilities to different organizations and industries, the definition of Big Data can and should be the same for everyone. A shared definition would help the acceptance, and therefore the application, of Big Data, resulting in more innovation and economic growth.
Therefore, let’s dive a bit deeper into the meaning of Big Data and its different components. As I have mentioned before, there are 7 V’s that describe and affect Big Data: apart from Volume, Variety and Velocity, these are Variability, Veracity, Visualization and, of course, Value. These V’s provide a guideline to the different components of Big Data and the different aspects of a Big Data strategy, which is rather important when you want to start developing a Big Data strategy for your organization.
A shared understanding of what Big Data is and what it can do for you, regardless of the type of organization or industry that you operate in, is vital for the success of a Big Data strategy. The fact that there are many different definitions present on the web does not make things easier. A short overview of the different definitions:
Wikipedia says: “Big Data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using on-hand data management tools or traditional data processing applications.”
Microsoft says: “Big Data is the term increasingly used to describe the process of applying serious computing power – the latest in machine learning and artificial intelligence – to seriously massive and often highly complex sets of information.”
Mayer-Schönberger & Cukier say: “Big Data refers to our burgeoning ability to crunch vast collections of information, analyze it instantly, and draw sometimes profoundly surprising conclusions from it.”
IBM says: “Big Data is being generated by everything around us at all times. Every digital process and social media exchange produce it. Systems, sensors and mobile devices transmit it. Big data is arriving from multiple sources at an alarming velocity, volume and variety.”
And there are countless more definitions, as this overview shows you. The question then is, of course, why another definition? Because most of the definitions that I have seen are misleading and do not help to ensure that more organizations start developing a Big Data strategy.
Almost all definitions focus on the volume part of Big Data, and while we are indeed living in an era in which more data is being created every day, there are very few organizations that deal with petabytes, let alone exabytes, of data. The result is that many organizations ask themselves: why should I develop a Big Data strategy if I do not have that much data?
Therefore, the term Big Data should focus a lot more on the variety aspect, and not on the volume. I like to call this Mixed Data, as the combination of different data sources, internal or external, is what provides the best insights, whether real-time or not, and you do not require massive amounts of data to achieve that. There are ample examples of organizations achieving fascinating insights by combining data sources, and this can also be achieved by small and medium enterprises.
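To make the idea of Mixed Data a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and only for illustration: the sales figures, the weather feed, and the helper names `combine_by_date` and `average_units_by_condition` are all made up. It simply shows how two tiny datasets, one internal and one external, might be combined to surface a pattern that neither shows on its own.

```python
# Hypothetical internal data: daily units sold by a small shop.
internal_sales = [
    {"date": "2024-06-01", "units_sold": 120},
    {"date": "2024-06-02", "units_sold": 80},
    {"date": "2024-06-03", "units_sold": 130},
    {"date": "2024-06-04", "units_sold": 70},
]

# Hypothetical external data: a public weather feed for the same days.
external_weather = [
    {"date": "2024-06-01", "condition": "sunny"},
    {"date": "2024-06-02", "condition": "rainy"},
    {"date": "2024-06-03", "condition": "sunny"},
    {"date": "2024-06-04", "condition": "rainy"},
]


def combine_by_date(sales, weather):
    """Join the two sources on their shared 'date' field."""
    weather_by_date = {row["date"]: row["condition"] for row in weather}
    combined = []
    for row in sales:
        condition = weather_by_date.get(row["date"])
        if condition is not None:  # keep only days present in both sources
            combined.append({**row, "condition": condition})
    return combined


def average_units_by_condition(rows):
    """Average units sold per weather condition."""
    totals = {}
    for row in rows:
        total, count = totals.get(row["condition"], (0, 0))
        totals[row["condition"]] = (total + row["units_sold"], count + 1)
    return {cond: total / count for cond, (total, count) in totals.items()}


mixed = combine_by_date(internal_sales, external_weather)
averages = average_units_by_condition(mixed)
print(averages)
```

With only four rows per source, the combined view already suggests that sales differ between sunny and rainy days, which is the kind of insight that depends on variety and not on volume.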
It is not about the volume of data, but about the insights derived from combining several smaller datasets, which makes Big Data achievable for organizations of any size and in any industry. Therefore, the simplest explanation of Big Data is: “Mixed Data”.