If your business requires an enterprise-ready, consistent big data solution, read on to learn more about what a true enterprise-grade Hadoop and NoSQL platform can – and should – offer.
1. Fast Recovery
Something in your cluster will inevitably fail, whether it’s a disk, a server, or a power supply. With some Hadoop distributions, a failed node can cause 30 minutes or more of downtime while its regions are reassigned. A Hadoop distribution enhanced with enterprise capabilities, on the other hand, should recover instantly to eliminate costly downtime.
2. Minimize NoSQL Administration
Deploying a NoSQL database in Hadoop typically requires significant administrative overhead. System maintenance involves manual operations during which the cluster is temporarily unavailable. An enterprise-grade platform eliminates that manual work by automating administrative operations, particularly through self-tuning.
3. Consistent Low Latency
Low latency is critical to the success of many production business applications. An enterprise-grade solution provides consistently low latency through optimizations that minimize housekeeping operations such as garbage collection and compaction. These operations bog down the system while they run, degrading overall performance. The optimizations ensure the system doesn’t spend excessive cycles cleaning up after itself. In addition, low disk I/O combined with a smaller disk footprint makes operations on disk faster and more predictable.
4. Full Data Protection
Certain enterprise-grade options offer full data protection for NoSQL applications via snapshots. Snapshots allow real-time recovery of all data, both files and tables, so you can recover from user and application errors. Tables can be read and recovered directly from snapshots, eliminating downtime of your production system.
5. Disaster Recovery
Production-ready Hadoop and NoSQL platforms need a good disaster recovery system in place. Mirroring/replication systems allow you to automatically copy differential data across clusters in real time. When the replication strategy means databases do not have to be reconstructed, they can be brought up instantly on the replica site when the active site goes down. Mirroring can also be used to provide read-only access to certain files from multiple locations.
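To make the idea of copying only differential data concrete, here is a minimal file-level sketch. It is an illustration under stated assumptions, not how any particular platform implements mirroring: the function names (`file_digest`, `mirror_changes`) are hypothetical, two temporary directories stand in for the active and replica clusters, and a real mirroring system would typically work at a finer granularity than whole files.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """Return a SHA-256 digest of the file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def mirror_changes(source: Path, replica: Path) -> List[str]:
    """Copy only new or changed files from source to replica.

    Returns the relative paths that were copied, so unchanged files
    are visibly skipped. (Hypothetical helper for illustration.)
    """
    copied = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = replica / rel
        if not dst.exists() or file_digest(src) != file_digest(dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(rel.as_posix())
    return copied


# Temporary directories stand in for the active and replica sites.
source = Path(tempfile.mkdtemp())
replica = Path(tempfile.mkdtemp())

(source / "a.txt").write_text("one")
(source / "b.txt").write_text("two")
first_pass = mirror_changes(source, replica)   # everything is new

(source / "b.txt").write_text("two, updated")
second_pass = mirror_changes(source, replica)  # only the changed file
```

The second pass transfers only the file that changed, which is the property that keeps a replica site current without recopying the full data set.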
6. Run Multiple Jobs Simultaneously
In order to safely run multiple jobs in the same cluster, the core system needs to be isolated so that a runaway job won’t bring down the entire cluster. Data placement and job placement controls then allow you to run jobs with different requirements without conflict.
7. Real-Time Data Flows
Data movement is a common and critical process within a production Hadoop cluster. One capability that facilitates this is the ability to mount the cluster as an NFS volume. This gives Hadoop a full read/write storage interface that supports multiple users and fully random reads and writes. It also allows applications to load data directly into the system and provides real-time access to results.
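As a rough sketch of what random read/write through a mounted volume looks like to an application, the example below uses only ordinary file calls. The helper names are hypothetical, and a temporary directory stands in for the NFS mount point, since the actual mount path depends on your deployment.

```python
import os
import tempfile


def append_record(path: str, record: bytes) -> None:
    """Append bytes to the end of the file, creating it if needed."""
    with open(path, "ab") as f:
        f.write(record)


def overwrite_at(path: str, offset: int, data: bytes) -> None:
    """Overwrite bytes in place at an arbitrary offset (random write)."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def read_at(path: str, offset: int, length: int) -> bytes:
    """Read a byte range from an arbitrary offset (random read)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Assumption: in a real deployment this would be the directory where
# the cluster is mounted over NFS; a temp directory stands in here.
mount_point = tempfile.mkdtemp()
data_file = os.path.join(mount_point, "events.log")

append_record(data_file, b"AAAA")
append_record(data_file, b"BBBB")
overwrite_at(data_file, 4, b"CCCC")   # replace the second record in place
result = read_at(data_file, 0, 8)     # b"AAAACCCC"
```

Because the application sees a regular file system, existing tools and scripts can load data and read results without a Hadoop-specific client.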
8. Flexible Security
Finally, a good solution should allow for flexibility in how security is set up, including leveraging the company’s existing security measures. An integration with Kerberos allows for this. A native authentication solution could also maintain key-based secure communication in Hadoop.
Overall, a true enterprise-grade Hadoop platform – such as the one offered by MapR – offers the performance, reliability, usability, consistency and security that businesses need when they are looking to get immediate, quantifiable benefits from big data.