It may come as no surprise that the internet has been swelling with data, so much so that it’s become difficult to keep track of. If in 2005 we were dealing with barely 0.1 zettabytes of data, that figure is now just above 20 zettabytes and is estimated to reach a staggering 47 zettabytes by 2020. Beyond the sheer quantity, the problem is that most of this data is unstructured. And few things are more harmful than feeding AI incomplete or inaccurate data.
It seems that only about 10% of this data is structured, while the rest is a great jumble of information that isn’t tagged and cannot be used constructively by machines. To put this in perspective, a free-form email does not qualify as structured data, while a spreadsheet, where every value sits in a labeled row and column, counts as tagged and can be scanned successfully by machines.
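To make the difference concrete, here is a minimal sketch in Python (with made-up sample data) of why a machine can consume a spreadsheet-style record directly, while the same facts buried in an email have to be guessed at:

```python
import csv
import io
import re

# Structured: a spreadsheet-style record is self-describing, since
# every value arrives already labeled by its column header.
structured = io.StringIO("name,amount,due_date\nAcme Corp,1200,2018-06-01\n")
for row in csv.DictReader(structured):
    print(row["amount"])  # -> '1200', no guesswork required

# Unstructured: the same facts in a free-form email carry no tags,
# so a machine has to rely on pattern matching that breaks as soon
# as the wording changes.
email = "Hi, just a quick reminder that Acme Corp owes $1,200 by June 1st."
match = re.search(r"\$([\d,]+)", email)
print(match.group(1) if match else "no amount found")  # -> '1,200'
```

The first loop never has to guess what a value means; the second leans on a regular expression that would miss “twelve hundred dollars” entirely.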
This may not seem all that problematic, but we need clean, organized data if we expect AI to improve our lives in areas such as healthcare, driverless cars and connected homes. The irony is that we’ve become very good at creating content and data, but we haven’t yet figured out how to leverage it accurately to serve our needs.
Data Scientists Are Also Struggling
It’s only natural that data science is one of the fields that has gained a lot of ground over the past few years, with more and more data scientists dedicating their careers to sorting out the mess. However, a recent survey shows that, contrary to popular opinion, data scientists spend far less time building algorithms and mining data for patterns than on so-called digital janitorial work: cleaning and organizing data. That balance of effort is certainly not in favor of a bright AI future.
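To give a flavor of that janitorial work, here is a minimal sketch using Python and pandas; the table, column names and values are hypothetical stand-ins for freshly collected records:

```python
import pandas as pd

# Hypothetical raw records: duplicates, inconsistent casing and
# spacing, and missing values, as is typical of freshly collected data.
raw = pd.DataFrame({
    "city": ["Boston", "boston ", "NYC", "NYC", None],
    "revenue": ["1,200", "1200", "3 400", "3 400", "950"],
})

cleaned = (
    raw.assign(
           # normalize labels: strip stray whitespace, unify casing
           city=lambda d: d["city"].str.strip().str.title(),
           # coerce messy number strings ('1,200', '3 400') to numerics
           revenue=lambda d: pd.to_numeric(
               d["revenue"].str.replace(r"[,\s]", "", regex=True),
               errors="coerce",
           ),
       )
       .drop_duplicates()        # 'boston ' and 'Boston' now collapse into one row
       .dropna(subset=["city"])  # drop rows with no usable label
)
print(cleaned)
```

None of this is glamorous, which is exactly the point: it has to happen before any pattern mining can be trusted.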
Those predicting the impending wipe-out of humankind by AI have clearly not taken into consideration that although machines can successfully replace the few data scientists who actually mine data for patterns, they may not be able to replace the vast majority who devote most of their time to collecting, cleaning and organizing that data. Of course, it’s better to collect data in a clean, structured way from the get-go than to allocate so much time and resources to ‘fixing’ it retroactively. Fortunately, leaders in AI have gradually reached this understanding as well, using their skills and influence to redirect the path data science is on, and with it, AI.
AI Is Good, But It’s Not Yet Human-good
We’ve all heard of cases where machines proved superhuman when pitted against actual humans, such as when the best Go player in the world was defeated by Google’s AlphaGo AI. However, this only shows that AI is capable of staggering results in niche tasks; its overall capacity is still no match for human capabilities. There are plenty of subtleties and logical steps that AI simply cannot deal with.
AI’s limitations are even more noticeable when it comes to financial filings and legalese. The issue here is the same as everywhere else: as long as AI systems are not fed structured data, such as standardized contracts, they get seriously confused. For the time being, it’s still up to qualified data scientists to undo the mess.
Effective AI Is Possible Only When Everyone Works As a Team
Highly qualified data analysts are expensive to hire, which makes advancing in this field even harder. The key is to go through the collection and modeling phases armed with technology that can streamline the process.
Another key aspect is the joint effort of multiple departments to tackle the problems big data poses. Financial and technical experts need to join hands to identify, from the start, potential flaws in the data they collect. The way these experts tackle a problem should also be recorded, so it can later be replicated by machines. The goal is to create quality assurance algorithms that can pinpoint modeled results connected to errors in the past. The more such models we are able to create, the less room there will be for data errors and irregularities.
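As a rough illustration of what such a quality assurance step might look like, here is a small Python sketch; the error patterns and record fields are hypothetical stand-ins for whatever a team has actually logged:

```python
# Hypothetical signatures of inputs that produced bad results in the
# past, e.g. logged jointly by the financial and technical experts.
KNOWN_BAD_PATTERNS = [
    {"field": "currency", "value": None},         # missing currency skewed totals before
    {"field": "source", "value": "legacy_feed"},  # this feed produced duplicates before
]

def flag_suspect(record: dict) -> list[str]:
    """Return the reasons a modeled result should go to human review."""
    reasons = []
    for pattern in KNOWN_BAD_PATTERNS:
        if record.get(pattern["field"]) == pattern["value"]:
            reasons.append(f"matches past-error pattern on '{pattern['field']}'")
    return reasons

result = {"total": 9120, "currency": None, "source": "legacy_feed"}
for reason in flag_suspect(result):
    print("review needed:", reason)
```

A real system would learn these signatures statistically rather than hard-code them, but the principle is the same: every documented failure narrows the room for the next one.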
AI Cannot Survive Without Big Data
Regardless of the direction AI is taking, whether it’s good or bad for mankind, one thing is for sure: AI cannot go anywhere without big data. We already have examples from our daily lives, ones we most likely take for granted, that prove how essential big data was to their existence. Take Cortana or Siri. They are able to understand our questions and queries only because they’ve been fed endless amounts of information that taught them our natural language. Google has become a giant, seemingly omniscient power that knows so much about each and every one of us only as a result of our countless daily queries on its search engine. Likewise, companies are able to produce accurate reports, such as those that identify websites using revcontent, only thanks to the neatness with which that data was initially collected.
Since AI is so deeply connected to big data, it only makes sense to give it access to clean, structured data to process in ways that improve our lives. Fortunately, the world is gradually becoming more aware of what AI advancement requires, which is why we are noticing better support for data scientists in terms of funding, wages, tools and equipment.
This awareness is slowly spreading across the globe, enabling companies and experts to cooperate in order to collect data more efficiently, establish models that help machines clean and structure data, and lay the groundwork for future generations. Knowing where the issues with AI and big data stem from means the problem is already halfway solved.