Big Data is one of THE biggest buzzwords around at the moment, and I believe big data will change the world. Some say it will be even bigger than the Internet. What’s certain is that big data will impact everyone’s life. Having said that, I also think that the term ‘big data’ is not very well defined and is, in fact, not well chosen. I would like to use this article to put a stake in the ground and define in simple terms what ‘big data’ is, and hope that with everyone’s help we can create a complete definition. So, here we go:
Big data refers to our ability to collect and analyze the vast amounts of data we are now generating in the world. The ability to harness the ever-expanding amounts of data is completely transforming our ability to understand the world and everything within it. The advances in analyzing big data allow us to, for example, decode human DNA in minutes, find cures for cancer, accurately predict human behavior, foil terrorist attacks, pinpoint marketing efforts and prevent diseases.
Take this business example: Wal-Mart is able to take data from your past buying patterns, its internal stock information, your mobile phone location data, social media and external weather information, and analyze all of this in seconds so it can send a voucher for a BBQ cleaner to your phone – but only if you own a grill, the weather is nice and you are currently within a 3-mile radius of a Wal-Mart store that has the BBQ cleaner in stock. That’s scary stuff, but one step at a time: let’s first look at why we have so much more data than ever before.
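The voucher decision in this example boils down to a handful of conditions that must all hold at the same moment. Here is a minimal sketch of that rule in Python. It is only an illustration and not how Wal-Mart actually does it: the `Shopper` and `Store` records, the `owns_grill` flag and the `should_send_voucher` function are all made-up names, and the distance is estimated with the standard haversine formula.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


@dataclass
class Shopper:
    owns_grill: bool  # hypothetical flag, e.g. inferred from past purchases
    lat: float        # current phone location
    lon: float


@dataclass
class Store:
    lat: float
    lon: float
    cleaner_in_stock: bool  # hypothetical flag from internal stock data


def should_send_voucher(shopper, store, weather_is_nice, max_miles=3.0):
    """Send the voucher only if every condition from the example holds."""
    nearby = distance_miles(shopper.lat, shopper.lon,
                            store.lat, store.lon) <= max_miles
    return (shopper.owns_grill and weather_is_nice
            and store.cleaner_in_stock and nearby)
```

The rule itself is trivial. The hard part, and the part that big data technology makes possible, is having all of these inputs available and up to date for millions of shoppers at once.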
In my talks and training sessions on big data I talk about the ‘datafication of the world’. This datafication is caused by a number of things including the adoption of social media, the digitalization of books, music and videos, the increasing use of the Internet as well as cheaper and better sensors that allow us to measure and track everything. Just think about it for a minute:
- When you were reading a book in the past, no external data was generated. If you now use a Kindle or Nook device, they track what you are reading, when you are reading it, how often you read it, how quickly you read it, and so on.
- When you were listening to CDs in the past, no data was generated. Now we listen to music on our iPhone or digital music player, and these devices are recording data on what we are listening to, when and how often, in what order, etc.
- Today, most of us carry smartphones, and they are constantly collecting and generating data by logging our location, tracking our speed, and monitoring which apps we are using as well as who we are ringing or texting.
- Sensors are increasingly used to monitor and capture everything from temperature to power consumption, from ocean movements to traffic flows, from dust bin collections to your heart rate. Your car is full of sensors and so are smart TVs, smart watches, smart fridges, etc. Take my new scale (which I – as a gadget freak – love!): it measures (and keeps a record of) my weight, my % body fat, my heart rate and even the air quality in our bedroom. When I step on the scale it automatically recognizes me, takes all the measurements and then sends them via Bluetooth to my iPhone, which gives me stats on how my Body Mass Index is changing and so forth. This information is then also synced with the data collected by my Up band, which tracks how many calories I have consumed and burnt in a day and how well I have slept at night.
- Finally, combine all this now with the billions of internet searches performed daily, the billions of status updates, wall posts, comments and likes generated on Facebook each day, the 400+ million tweets sent on Twitter per day and the 72 hours of video uploaded to YouTube every minute.
I am sure you are getting the point. The volume of data is growing at a frightening rate. Google’s executive chairman Eric Schmidt sums it up: “From the dawn of civilization until 2003, humankind generated five exabytes of data. Now we produce five exabytes every two days…and the pace is accelerating.”
Not only do we have a lot of data, we also have a lot of different and new types of data: text, video, web search logs, sensor data, financial transactions and credit card payments etc. In the world of ‘Big Data’ we talk about the 4 Vs that characterize big data:
- Volume – the vast amounts of data generated every second
- Velocity – the speed at which new data is generated and moves around (credit card fraud detection is a good example where millions of transactions are checked for unusual patterns in almost real time)
- Variety – the increasingly different types of data (from financial data to social media feeds, from photos to sensor data, from video capture to voice recordings)
- Veracity – the messiness of the data (just think of Twitter posts with hashtags, abbreviations, typos and colloquial speech)
So, we have a lot of data, in different formats, that is often fast moving and of varying quality – why would that change the world? The reason the world will change is that we now have the technology to bring all of this data together and analyze it.
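To give a feel for how this kind of technology copes with volume: a common approach is to split one big job into chunks, let many workers process the chunks independently, and then merge the partial results. This split-and-combine pattern is often called MapReduce, and frameworks such as Hadoop popularized it. The sketch below is only a toy illustration of the idea, not Hadoop itself: it counts words, the function names are made up, and threads on a single machine stand in for what would be separate computers in a real cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """'Map' step: count the words in one chunk of lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def split_into_chunks(lines, n_chunks):
    """Split the input into roughly equal pieces, one per worker."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def distributed_word_count(lines, n_workers=4):
    """Count words by processing chunks in parallel, then merging."""
    chunks = split_into_chunks(lines, n_workers)
    # Each worker handles its own chunk without talking to the others.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    # 'Reduce' step: combine the partial results into one answer.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total
```

Because no chunk depends on any other, adding more workers lets the same approach handle more data, which is what makes it attractive for very large datasets.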
In the past, traditional database and analytics tools couldn’t deal with extremely large, messy, unstructured and fast-moving data. Without going into too much detail, we now have software such as Hadoop that enables us to analyze large, messy and fast-moving volumes of structured and unstructured data. It does so by splitting the task across many different computers (a bit like how Google breaks up the computation of its search function). As a consequence, companies can now bring together these different and previously inaccessible data sources to generate impressive results. Let’s look at some real examples of how big data is used today to make a difference:
- The FBI is combining data from social media, CCTV cameras, phone calls and texts to track down criminals and predict the next terrorist attack.
- Facebook is using face recognition tools to compare the photos you have uploaded with those of others to find potential friends of yours (see my post on how Facebook is exploiting your private information using big data tools).
- Politicians are using social media analytics to determine where they have to campaign the hardest to win the next election.
- Video analytics and sensor data of baseball and football games are used to improve performance of players and teams. For example, you can now buy a baseball with over 200 sensors in it that will give you detailed feedback on how to improve your game.
- Artists like Lady Gaga are using data on our listening preferences and sequences to determine the most popular playlist for their live gigs.
- Google’s self-driving car is analyzing a gigantic amount of data from sensors and cameras in real time to stay on the road safely.
- The GPS information on where our phone is and how fast it is moving is now used to provide live traffic updates.
- Companies are using sentiment analysis of Facebook and Twitter posts to determine and predict sales volume and brand equity.
- Supermarkets are combining their loyalty card data with social media information to detect and leverage changing buying patterns. For example, it is easy for retailers to predict that a woman is pregnant simply based on the changing buying patterns. This allows them to target pregnant women with promotions for baby related goods.
- A hospital unit that looks after premature and sick babies is generating a live stream of every heartbeat. It then analyzes the data to identify patterns. Based on the analysis, the system can now detect infections 24 hours before the baby would show any visible symptoms, which allows early intervention and treatment.
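To make the last example a little more concrete, here is a minimal sketch of one simple way to spot unusual values in a stream of readings: compare each new reading with the average of the most recent ones and flag it if it is far outside the usual spread. This is purely illustrative. It is not the hospital’s actual method and certainly not a clinical tool, and the function name, window size and threshold are assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, stdev


def flag_unusual_readings(heart_rates, window=30, threshold=3.0):
    """Return the positions of readings that deviate strongly from the
    recent baseline (more than `threshold` standard deviations)."""
    recent = deque(maxlen=window)  # sliding window of the latest readings
    flagged = []
    for i, bpm in enumerate(heart_rates):
        # Only judge a reading once a full window of history exists.
        if len(recent) == window:
            baseline, spread = mean(recent), stdev(recent)
            if spread > 0 and abs(bpm - baseline) > threshold * spread:
                flagged.append(i)
        recent.append(bpm)
    return flagged
```

A real system would look for much subtler patterns across many signals, but the principle is the same: continuous data makes it possible to notice a change long before a person would.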
And these examples are just the beginning. Companies are barely starting to get to grips with the new world of big data. In conclusion then, big data will change the world. In terms of language I prefer to talk about the ‘datafication of the world’ in relation to the ever-growing amounts of data and ‘large-scale analytics’ (or simply ‘analytics’ because what is large now will be normal tomorrow) in relation to our ability to analyze and harness big data.
At the moment I am spending a lot of my time helping companies understand the massive potential as well as the big threats of big data. I work with executive teams of companies spanning all sectors and sizes to help them develop strategies to harness big data, and I find each of these discussions and projects fascinating because they all open up new opportunities. Now I would love you to share your views… What’s missing from this definition? What would you add? Do you agree? Disagree? Please let me know…
And as always, feel free to connect via Twitter, LinkedIn or The Advanced Performance Institute