Why the Hoopla over Hadoop?
Hadoop in nine easy-to-understand facts.
Data management professionals would have to have been hiding under a pretty big rock for the past year or so to avoid hearing about Hadoop, a framework that’s being used for a wide range of data management functions. As IT Business Edge’s Arthur Cole pointed out earlier this month, Hadoop’s ability to share applications across multiple nodes makes it a prime candidate for all manner of clustered and distributed architectures, including the cloud.
His post focused on how to optimize infrastructures for Hadoop and vice versa. Similarly, IT Business Edge’s Mike Vizard wrote in August about a solution from Cloudera and Dell that aims to reduce the complexities of Hadoop deployment for IT organizations. It bundles Cloudera’s distribution of Apache Hadoop with IT automation tools from Opscode and Crowbar, an OpenStack installation tool from Dell, on a Dell PowerEdge C Series server.
Also earlier this month, IT Business Edge’s Loraine Lawson wrote about Hortonworks, a company spun off from Yahoo’s former Hadoop group and one of several companies with a business model based on offering support and training for Hadoop. Others Loraine mentioned in her post include MapR Technologies, Cloudera (again) and Revolution Analytics, which focuses on R as a means of applying analytics to Hadoop stores.
And IT Business Edge’s Susan Hall discussed how Hadoop skills are an increasingly hot commodity among IT organizations because they are in such short supply.
So it might be a little like putting the proverbial cart before the horse to start worrying about making Hadoop more accessible to business professionals who want to use it to analyze large amounts of data. I’d argue, however, that a late and lukewarm focus on usability is a big part of why so many business users don’t like enterprise applications. (While Google’s search algorithms are rightfully considered amazing, its simple interface was the key to its rapid and enthusiastic adoption.)
RapidMiner, an open source data mining and data analysis system, is apparently putting some thought into end-user accessibility for Hadoop-driven data analytics. At a recent RapidMiner user conference, data scientist Zoltán Prekopcsák demonstrated Radoop, a RapidMiner extension for editing and running ETL, data analytics and machine learning processes over Hadoop. Radoop combines the capabilities of Hive, a distributed data warehouse framework built on Hadoop, and Mahout, a machine learning library built on Hadoop, into a single large-scale analytics tool.
But what struck me most about the presentation, which you can view by following the link in the above paragraph, is the Radoop interface. I suspect not only business analysts but perhaps even savvy “regular” users could get comfortable with it pretty quickly. That kind of comfort is going to become a necessity for companies trying to address a shortage in data analysis skills. You can also use the link to apply to participate in a restricted beta of Radoop.