I often think about the challenges that organizations face with “Big Data”. While Big Data is a generic and overused term, what I am really referring to is an organization’s ability to disseminate, understand and ultimately benefit from ever-increasing volumes of data. It is almost beyond question that in the future customers will be won/lost, competitive advantage will be gained/forfeited and businesses will succeed/fail based on their ability to leverage their data assets.
It may be surprising what I think the near-term challenges are. Largely, I don’t think they are purely technical. There are enough wheels in motion now to almost guarantee that data accessibility will continue to improve at a pace in line with the growth in data volume. Sure, there will continue to be plenty of interesting technical innovation, but when organizations like Google are doing 10PB sorts on 8,000 machines in just over 6 hours, we know the technical capability for Big Data exists. It will eventually flow down to the masses, and that kind of scale will likely be achievable by most organizations within the next decade.
Instead, I think the core problem that needs to be addressed relates to people and skills. There are lots of engineers who can build distributed systems, and orders of magnitude more who can operate them and fill them to the brim with captured data. But where I think we lack skills is in people who know what to do with the data: people who know how to make it actually useful. Sure, a BI industry exists today, but I think it is currently more focused on the engineering challenge of giving an organization faster/easier access to its existing knowledge than on reaching out into the distance and discovering new knowledge.
People with pure data analysis and knowledge discovery skills are much harder to find, and they are the ones who will be front and center, driving the Big Data revolution. These are people you can give a few PB of data to, and they can give you back information, discoveries, trends, factoids, patterns, beautiful visualizations and needles you didn’t even know were in the haystack.
These are people who can make a real and significant impact on an organization’s bottom line, or help solve some of the world’s problems when applied to R&D. Data geeks are the people to be revered in the future, and hopefully we will see a steady increase in people wanting to grow up to be data scientists.