Many people love NoSQL databases because they can throw their data into them without a care (almost) and let the NoSQL database sort out the storage. Writes and reads of “lumps” of data can be blazingly fast, making them favorite tools for many developers seeking performance for their applications. But analytics? If you’ve thrown in your data just anyhow, how reasonable is it to expect any sort of orderly analysis afterward? Trends and patterns may be easier to obtain from an SQL database with its neatly structured tabular data. However, SQL-based reporting and NoSQL databases can be reconciled to produce effective analytical solutions.
Let’s look at the common reasons organizations move their data into MongoDB, and how modern data analytics tools can be used on top of these databases to provide actionable insights in a way that business users can understand and operate independently.
The Challenge of Gleaning Insights from Unstructured Data
Organizations have access to many new types of data and data sources today. Machine sensor data, social media content, and variable length text and video are a few examples. All of this data differs from traditional sources, e.g. financial or ERP data, in that it does not always follow a consistent structure or schema (certain fields may appear in certain records and not appear in others), and hence does not naturally lend itself to traditional tabular format.
However, this new data also reflects business reality in ways that conventional, structured data cannot. Naturally, business users want actionable insights for their business from this new data. In addition, non-technical users want to be able to query the data the way they want, i.e. query possibilities with flexibility and agility built-in – i.e., the type of functionality which modern business analytics and reporting software are supposed to provide.
NoSQL databases will continue to be part of the solution for massaging meaningful information out of the vast resources of unstructured data, because of their facility for capturing such data. In addition, developers who have seen how easy it is to develop applications on such databases will want to keep using them. The challenge is therefore to build suitable analytical solutions on top of NoSQL databases, rather than trying to shoehorn big data into SQL databases, even if analytical functionality is already available for the latter.
Data Format Differences Between NoSQL and SQL
If you come from an SQL and SQL analytics background, a key thing to understand about NoSQL databases is that they use store data differently, compared to SQL databases. For example, with SQL databases you jumped through hoops to group data items into separate tables according to the relationships they had with one another, then linked those tables together.
By comparison, many NoSQL databases have no such data normalization or organization. They operate on the principle of “store first, design later”, often essential for handling rapid floods of data in e-commerce, industrial applications, and others. Key value stores are the simplest kind of NoSQL database, in which every element is stored as a key (or attribute name), together with its value. Graph databases hold information about networks. Columnar stores organize data by columns instead of rows for faster querying of large datasets. Document databases hold each element with a key and a complex structure (the document) in which key-value pairs, key-array pairs, and nested structures are all possible.
Renormalization to Unlock Analytics
Lack of normalization makes it more difficult to apply conventional methods of preparing data for analysis, such as ETL (extract, transform, load), to NoSQL databases. Furthermore, the typical analytical operations of slicing, dicing, and aggregating data cannot immediately be applied to the nested data found in popular NoSQL databases like MongoDB. However, techniques such as data flattening (converting data into simple row and column formats) and table virtualization for handling arrays of data can be used to make NoSQL data easier to analyze. Essentially, NoSQL data can be transferred to an SQL style environment, where SQL requests can be used, preferably packaged in ways that are intuitive and user-friendly.
Advantages of NoSQL Analytics for Different Users
Given these challenges, one might ask – why bother with trying to run analytics on NoSQL data, rather than treat it as strictly as a repository for operational data? The answer is that with the right NoSQL analytical tools, business intelligence from unstructured data can be made available for a number of different types of needs and users and provide unique benefits:
- Casual business intelligence. Non-technical users and business analysts use this kind of BI on an as needed basis to answer ad hoc queries and build reports for one-off requirements. They depend on being able to express their questions and needs in a language that is reasonably close to natural business language.
- Super-user analytics. Super users may already be highly proficient spreadsheet program users, taming data for their particular business needs outside the IT department. Not afraid of more sophisticated commands, they know what they want. The right analytics over NoSQL give them power and flexibility, as well as an easy-to-use interface.
- Analytical modeling. This is the creation of complex data models to describe and predict situations and events at enterprise and departmental levels, a skill for which analytical modelers often have specific training and require appropriate tools and access to NoSQL data.
- Data science. On the cutting edge of data analysis, data scientists interpret unstructured data from web sites, the Internet of Things, email archives, or any other source of data of potential interest to an organization. They know enough to write their own custom applications to process data from NoSQL databases, but also use existing and standard analytics over NoSQL where appropriate to save time.
Handling MongoDB, the Most Popular NoSQL Database
Take the case of MongoDB, the most popular NoSQL database and also one of the most popular databases of any kind (NoSQL or SQL). As a document data store, MongoDB can hold big unstructured data for which it offers its own MongoDB query language. Those willing to learn this query language can use it to start building their analytics. However, many organizations would rather leverage their existing skills in the use of SQL or SQL-based reporting tools. To do this, the data needs to be transformed during the ETL process to generate a columnar, tabular view which can then be queried using BI tools.
A Standards-Based Database Connector to Make Life Easy
One solution is to use a connector to run SQL queries directly on MongoDB, such as the Sisense-MongoDB connector is an example. This connector offers a standard ODBC (Open DataBase Connectivity) interface to plug into any MongoDB source and run any SQL query. The connector performs the necessary renormalization of data and avoids the need to learn the MongoDB query language.
ODBC essentially enables the system to translate the non-tabular data stored in MongoDB into a tabular format by scanning the data and identifying common fields around which to structure the information. Thus, within the Sisense Elasticube (or similar analytical repository), the data is structured in familiar tabular format, which can then be used to combine MongoDB data with data from a wide variety of external sources.