Cookies help us display personalized product recommendations and ensure you have great shopping experience.

By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData CollectiveSmartData Collective
  • Analytics
    AnalyticsShow More
    unusual trading activity
    Signal Or Noise? A Decision Tree For Evaluating Unusual Trading Activity
    3 Min Read
    software developer using ai
    How Data Analytics Helps Developers Deliver Better Tech Services
    8 Min Read
    ai for stock trading
    Can Data Analytics Help Investors Outperform Warren Buffett
    9 Min Read
    media monitoring
    Signals In The Noise: Using Media Monitoring To Manage Negative Publicity
    5 Min Read
    data analytics
    How Data Analytics Can Help You Construct A Financial Weather Map
    4 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-25 SmartData Collective. All Rights Reserved.
Reading: A Million Monkeys Demonstrate the Power of Hadoop
Share
Notification
Font ResizerAa
SmartData CollectiveSmartData Collective
Font ResizerAa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > IT > Cloud Computing > A Million Monkeys Demonstrate the Power of Hadoop
AnalyticsCloud ComputingHadoop

A Million Monkeys Demonstrate the Power of Hadoop

AlexOlesker
AlexOlesker
0 Min Read
SHARE

There are many great use cases for Apache Hadoop, the open source framework for scalable, reliable, and distributed computing on commodity hardware built around

There are many great use cases for Apache Hadoop, the open source framework for scalable, reliable, and distributed computing on commodity hardware built around Hadoop Distributed File System and MapReduce, such as delivering search engine results, sequencing genomes, and indexing entire libraries of text, but the Million Monkeys Project by Jesse Anderson may be the easiest to understand and the most fun.

The project was inspired by the Infinite Monkey Theorem which, in the simplest and most popular terms, states that a million monkeys with a million typewriters will, by randomly hitting the keys, eventually recreate the works of Shakespeare. The idea is that, though at any given instance the chance of a monkey typing a sonnet is essentially zero, with infinite instances it becomes almost certain. Anderson wanted to try this for himself but he didn’t have a million monkeys, a million typewriters, and infinite time and resources, so instead he used his home computer, Amazon’s Elastic Compute Cloud, and Hadoop to achieve the same results.

Anderson first generated a million virtual monkeys on Amazon’s EC2, which were really pseudo random number generators that would provide strings of 9 random characters. Anderson had to find a very efficient and reliable pseudo random number generator because at that scale, creating the strings was one of the most computationally expensive steps in the process, and he eventually settled on  Sean Luke’s Mersenne Twister. Next, he compared the generated string to the entirety of Shakespeare’s work and, if he found the string anywhere, he would mark it in almost real time, creating what he calls “performance art with monkeys and computers.” Comparing a 9 character string with every continuous set of 9 letters in all of William Shakespeare’s 38 works is no small task, and Anderson used a Bloom Filter to reduce CPU usage by 20-30%. A Bloom Filter works by creating a hash of the monkey’s string and comparing it to a file with all of the hashes and offsets of Shakespeare. Since hashes are shorter and simpler than the strings, this goes much faster but, because more than one string can result in a given hash, just because the hashes match doesn’t mean the strings will. If a match is found, the strings are then compared character by character.

More Read

Hadoop vs Spark
Big Data New Age: Hadoop vs Spark
Business Analytics and Hollywood: A Match Made in Heaven?
The Awe of Big Data
The Enterprise Graph – From Connections To Customer Insights
Applications That Are Social

The project took 1.5 months, generated 7.5 trillion character groups, and checked them against 5.5 trillion (5,429,503,678,976) possible combinations.  The project was concluded on October 6 when the last work, The Taming Of The Shrew, was completed. Normally, such a massive task would be out of the reach of one man without a team of computer scientists and supercomputers, but because Hadoop was able to break the overwhelming job into little segments running in parallel on servers in Amazon’s cloud, Jesse Anderson managed to do it himself on commodity hardware. Though the Million Monkeys Project was mostly for fun, it shares many similarities to other serious use cases for Hadoop. DNA sequencing, for example, involves matching short reads of a few dozen pairs to a full genome of millions or billions. of pairs Just like with the monkeys, the job gets much more manageable when broken down into smaller segments and, since commodity hardware and open source software spare research budgets, Hadoop has become a dominant tool in the sequencing community.

TAGGED:amazon ec2
Share This Article
Facebook Pinterest LinkedIn
Share

Follow us on Facebook

Latest News

data migration risk prevention
Best Approach to Risk Management for Data Migration in Data-Driven Businesses
Big Data Data Management Exclusive Risk Management
AI in branding
How Data Analytics and Data Mining Strengthen Brand Identity Services
Big Data Exclusive
Hidden AI, a risk?
Hidden AI, Real Risk: A Governance Roadmap For Mid-Market Organizations
Artificial Intelligence Exclusive Infographic
unusual trading activity
Signal Or Noise? A Decision Tree For Evaluating Unusual Trading Activity
Analytics Exclusive Infographic

Stay Connected

1.2KFollowersLike
33.7KFollowersFollow
222FollowersPin

You Might also Like

How to Score 300,000,000 Customer Records for $3

5 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

giveaway chatbots
How To Get An Award Winning Giveaway Bot
Big Data Chatbots Exclusive
ai chatbot
The Art of Conversation: Enhancing Chatbots with Advanced AI Prompts
Chatbots

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-25 SmartData Collective. All Rights Reserved.
Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?