The primary advantages NoSQL has over RDBMSs are horizontal scaling and data format flexibility. In many ways, those advantages translate to speed, in terms of both system performance and time-to-value.
System performance (throughput and latency) is achieved by simplifying how data is retrieved, and time-to-value is achieved by bypassing a lot of data modeling effort that is typically done for relational databases, especially on data formats that have complexity or a lot of variation. Where NoSQL really comes into play is for use cases that are not ideal for RDBMSs. Any application that currently runs slower than you want or need, and does not require the core RDBMS capabilities—such as multi-row transactions, full SQL querying support, or integration with commercial applications—can likely use a NoSQL database.
There are four main types of NoSQL databases, plus one type of “database” that should also be considered in the mix. In this blog post, I’ll provide a brief description of these types of NoSQL databases and when they can be used.
Key-value. A key-value database is designed for storing, retrieving, and managing big blocks of data. It’s the most basic NoSQL model. This type of database is useful when the data is simply one cohesive value, such as binary large objects (BLOBs) or character large objects (CLOBs) in RDBMSs. The gain is the speed you get from the simple architecture of key-value stores.
Oftentimes, an application built on a key-value store will break up the value into specific parts. This means the application developer is responsible for writing specific code to extract the individual data elements. While this typically is not a tough task, it gets more complex if the data elements vary greatly across records, forcing the application developer to keep track of what types of data elements are stored across records.
The big differentiator of this type is that several key-value databases on the market tout in-memory capabilities. As a result, a key-value database is a good choice for ultra-fast lookups, with the trade-off of extra application code.
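To make the "extra application code" trade-off concrete, here is a minimal sketch in Python. It assumes a hypothetical in-memory store (a plain dict) and assumes the application chose JSON as the value format; the names `put`, `get`, and `email_for` are illustrative and do not belong to any particular product.

```python
import json
from typing import Optional

# Hypothetical in-memory key-value store: it only ever sees opaque bytes.
store = {}

def put(key: str, value: bytes) -> None:
    store[key] = value

def get(key: str) -> bytes:
    return store[key]

# The application, not the database, decides the value format (JSON here).
put("user:42", json.dumps({"name": "Ada", "email": "ada@example.com"}).encode("utf-8"))
put("user:43", json.dumps({"name": "Lin"}).encode("utf-8"))  # this record has no email

def email_for(key: str) -> Optional[str]:
    """Parse the opaque value and extract one data element, if present."""
    record = json.loads(get(key).decode("utf-8"))
    return record.get("email")  # the application must cope with missing elements
```

The lookup itself is a single key access, but every reader of the data has to repeat the parsing step and handle records whose elements differ.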
Wide column. This type of database is often referred to as “columnar” or “column-oriented,” but these terms are incorrect and actually refer to completely different solutions. Research firm Gartner refers to wide column databases as “table-style,” which is a more appropriate name because they have rows and columns like the tables in RDBMSs.
The difference between wide column databases and RDBMSs is that the former are good at storing data in which columns differ across rows, allowing a data phenomenon known as “sparse data.” RDBMSs cannot efficiently store sparse data, because all possible columns have to be defined up front, and all columns necessarily take up disk space whether the cells are populated or not.
Unlike key-value stores, wide column databases have columns that are built into the data model, so the application developer does not have to write code to parse values into the separate data elements. These types of databases are popular when your data looks like RDBMS tables, and/or has structured data elements (such as messages, product catalog information, user profiles, etc.).
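As a rough illustration of sparse data, the sketch below models rows whose columns differ, using nested Python dicts. This only mimics the logical data model, not how any real wide column database lays out data on disk, and the row keys and column names are made up for the example.

```python
# Each row key maps only to the columns that row actually has.
rows = {
    "sku-1": {"name": "Laptop", "ram_gb": 16, "screen_in": 14},
    "sku-2": {"name": "T-shirt", "size": "M", "color": "blue"},
}

def read_cell(row_key, column):
    """Return the cell value, or None when the row has no such column."""
    return rows.get(row_key, {}).get(column)

def stored_cells():
    """Count the cells that are actually populated."""
    return sum(len(columns) for columns in rows.values())

def dense_cells():
    """Count the cells a fixed schema covering every column would need."""
    all_columns = set()
    for columns in rows.values():
        all_columns.update(columns)
    return len(rows) * len(all_columns)
```

Because columns are part of the model, reading `ram_gb` for a row needs no parsing code, and a row that lacks the column simply has nothing stored for it.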
Some NoSQL experts use the term “key-value store” as an umbrella term for both of the two aforementioned NoSQL database types. If you’re interested in learning more about these types of databases, I recommend you read The Forrester Wave™: NoSQL Key-Value Databases, Q3 2014.
Document. These databases store data in a hierarchical format that allows more complex data types. The term “document” is confusing because most of us think of documents as textual files. Database vendors extended the term to mean “groups of self-describing data elements.”
The use cases for these databases are similar to those for wide column databases, with the added ability to handle hierarchical data. This is good for uses such as content (book chapters, sections, quotes, etc.).
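A small sketch of what such hierarchical content might look like, written as a nested Python structure. The book, its field names, and the `all_quotes` helper are hypothetical; a real document database would expose its own query language for this kind of traversal.

```python
# One self-describing document: a book containing chapters, sections, and quotes.
book = {
    "title": "Example Book",
    "chapters": [
        {"title": "Chapter 1", "sections": [
            {"title": "Intro", "quotes": ["First quote"]},
            {"title": "Background", "quotes": []},
        ]},
        {"title": "Chapter 2", "sections": [
            {"title": "Methods", "quotes": ["Second quote", "Third quote"]},
        ]},
    ],
}

def all_quotes(doc):
    """Walk the hierarchy and collect every quote in document order."""
    return [
        quote
        for chapter in doc["chapters"]
        for section in chapter["sections"]
        for quote in section["quotes"]
    ]
```

The whole hierarchy lives in one record, so there is no need to spread chapters, sections, and quotes across separate tables and join them back together.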
Graph. A graph database uses graph structures for semantic queries, with nodes, edges, and properties to represent and store data. As such, graph databases are not as “general purpose” as the other types of databases. Graph database functionality can often be built on top of the other data stores, though native graph database stores exist. These databases are useful for identifying relationships, such as “friend of a friend.” But these relationships aren’t restricted to similar items, so for example, you can quickly query which web pages your customers are browsing and which products they’re buying to help create upsell/cross-sell recommendations.
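The “friend of a friend” query can be sketched with a plain adjacency map in Python. The people and the `friends_of_friends` function are invented for illustration; a graph database would run this kind of two-hop traversal natively rather than in application code.

```python
# Nodes are people; each undirected edge appears in both adjacency sets.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}

def friends_of_friends(person):
    """Return people exactly two hops away: not the person, not a direct friend."""
    direct = friends.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= friends.get(friend, set())
    return two_hops - direct - {person}
```

For example, `friends_of_friends("alice")` follows Alice's edges to Bob and Carol, then their edges onward, and filters out Alice herself and her direct friends.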
Search Engines. Finally, the “sort of” database that we need to include on the list when discussing NoSQL is search engines. Most people who use the Web on any regular basis won’t need an explanation of what search engines are, and most database users wouldn’t consider search technologies to be in the NoSQL camp. But the NoSQL label mostly applies, and when it comes to fast lookup, search engines are ideal for finding words in documents.
Search engines aren’t really for storing data, and therefore aren’t really databases. But they do provide a great complement to RDBMSs as well as to the other NoSQL databases.
I include search engines on this list, albeit as “half of a database,” because the underlying architectures of the core NoSQL databases and search engines are similar, and it’s possible in the near future that either the core NoSQL databases will incorporate search functionality, or that search engines will include full database functionality.
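To show why search engines are fast at finding words in documents, here is a minimal sketch of an inverted index, the data structure commonly behind word lookup. The sample documents and the `build_index` and `search` names are assumptions for illustration, and real engines add tokenization rules, ranking, and much more.

```python
from collections import defaultdict

# Hypothetical documents keyed by id.
docs = {
    1: "NoSQL databases scale horizontally",
    2: "Search engines index words in documents",
    3: "Graph databases store nodes and edges",
}

def build_index(documents):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

index = build_index(docs)

def search(word):
    """Look up a single word; return matching document ids in sorted order."""
    return sorted(index.get(word.lower(), set()))
```

The index is built once up front, so each query is a direct lookup by word rather than a scan of every document.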
The unique use cases for graph databases and search engines are clear cut, but not so much for the other types of databases. The fact is you can use any of those types for almost any use case. So what’s the real difference? In practice, IT professionals choose NoSQL databases based on features and characteristics such as scalability, business continuity, maintenance effort, security controls, and Hadoop integration.
Conclusion. When thinking about your next application use case, first consider whether you need the core features of an RDBMS, and if not, make note of some of the key features and capabilities you need in your environment. Then use those as the starting point to narrow your selection of a NoSQL database. If you find scale to be an issue with your current database solution, consider exploring MapR-DB, an in-Hadoop NoSQL database solution.