Most of us have heard about Apache Hadoop, and most of us have heard about cloud computing. Combining the two buzzwords may be what puts big data analysis into the hands of small and medium-sized businesses that don’t have the resources to build Hadoop infrastructure on their own. Why is this?
While Hadoop is a much more affordable solution than a traditional data warehouse for storing large amounts of data, the hardware, operational costs and expertise needed to set it up and run it can still be significant, and the setup itself is time-consuming.
Cloud computing combined with Hadoop allows small businesses to use big data without having to purchase and manage the hardware themselves.
To illustrate this point, let’s take a look at Google’s cloud solution for Hadoop: Google Compute Engine running MapR Distribution for Hadoop.
1. Get Started Immediately
Google Compute Engine allows businesses to sign up and get set up within minutes. This means small business owners can compete on the same big data playing field as larger companies without the huge startup costs of purchasing hardware. In addition, since the system is enterprise-ready, business owners can start getting insights without complicated configuration or code changes.
2. Record-Setting Speed
Google Compute Engine partnered with the enterprise Hadoop vendor MapR, and together they beat the standing MinuteSort record by sorting 15 billion 100-byte records in 60 seconds. That is 1.5 trillion bytes (1.5 TB) in one minute.
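To put that figure in perspective, here is a quick back-of-the-envelope check in Python. It uses only the numbers quoted above; the function name is ours for illustration, and the units are decimal (1 TB = 10^12 bytes, 1 GB = 10^9 bytes).

```python
# Figures quoted in the article for the MinuteSort run.
RECORDS = 15_000_000_000   # 15 billion records
RECORD_BYTES = 100         # each record is 100 bytes
SECONDS = 60               # the sort finished in 60 seconds


def sort_throughput(records, record_bytes, seconds):
    """Return (total bytes sorted, average bytes sorted per second)."""
    total_bytes = records * record_bytes
    return total_bytes, total_bytes / seconds


total_bytes, bytes_per_second = sort_throughput(RECORDS, RECORD_BYTES, SECONDS)

print(f"{total_bytes / 1e12:.1f} TB total")           # 1.5 TB total
print(f"{bytes_per_second / 1e9:.0f} GB per second")  # 25 GB per second
```

In other words, the cluster as a whole sorted roughly 25 GB of data every second, averaged over the minute.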
3. Cost-Effective Scaling
Let’s say for a minute that a business already has a big data solution, but occasionally it is too small to run the computation the business needs. It is usually too costly and time-consuming to expand the solution for a temporary situation. That is when the scalability and flexibility of cloud computing become particularly valuable. The Google Cloud Platform offers per-minute billing and scaling to thousands of cores, so companies can run an extra-large project for a few hours and pay only for the time used, rather than paying to expand their infrastructure or skipping the project altogether.
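As a rough illustration of how per-minute billing plays out for a short burst of work, the sketch below estimates the cost of a temporary job. Every number in it is hypothetical: the core count, the runtime and especially the price per core-hour are placeholders chosen for the example, not Google's actual rates, and the function name is ours. It also assumes usage is simply rounded up to the next whole minute and ignores any minimum charge a provider may apply.

```python
import math


def burst_cost(cores, runtime_minutes, price_per_core_hour):
    """Estimate the cost of a temporary job billed per core, per minute.

    Assumes usage is rounded up to the next whole minute and that no
    minimum charge applies (both are simplifying assumptions).
    """
    billed_minutes = math.ceil(runtime_minutes)
    return cores * billed_minutes * price_per_core_hour / 60


# Hypothetical scenario: 1,000 cores for 2.5 hours at a made-up
# rate of $0.05 per core-hour. Substitute current published prices.
cost = burst_cost(cores=1000, runtime_minutes=150, price_per_core_hour=0.05)

print(f"Estimated cost of the burst job: ${cost:.2f}")
```

The point is the shape of the calculation rather than the dollar amount: the bill scales with cores multiplied by minutes actually used, so a large job that runs for a few hours is paid for once instead of being built into permanent infrastructure.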
For business leaders looking for a big data solution that is cost-effective, enterprise-ready and scalable, the cloud may be the next frontier.