Business intelligence (BI) has long been associated with relational databases and the SQL language. From the earliest days of data warehousing, the qualities of the relational model have been highly valued in the quest for data consistency and quality. In addition, it was assumed that business users were comfortable with tables of information. This has proven true, especially through spreadsheets, much to IT’s chagrin. Tables are also the lingua franca of BI tools, and simple Select/Where queries are familiar to many users. Whatever the rationale, the association of BI and SQL is deeply embedded in the minds of most practitioners. So the question arises: what about NoSQL? How does it relate to BI, and can it be of use in data warehousing?
Good questions. But first, you need to know which flavor of NoSQL you’re talking about. For brevity, I’ll focus on only one of the five or so varieties: document-oriented data stores. (If you are interested in the others, in the bigger picture, and in a trip to Rome, I recommend my two-day seminar there on 11-12 June!) As I discovered about a year ago in a fascinating conversation with Max Schireson, president of 10gen / MongoDB, a document in this context refers neither to e-mail contents nor to Word files. It is a particular data structure in which each record consists of an arbitrary set of fields, each a name/value pair, structured in JSON (JavaScript Object Notation) or a similar format. For more details, refer to my white paper. So, let me release you from your suspense now. Can this be of use in BI? The short answer is yes. But to fully grasp the extent, I’d like to introduce you to two MongoDB customers and show how they are easing into BI using NoSQL.
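To make the idea of a document concrete, here is a minimal Python sketch that uses only the standard `json` module rather than any MongoDB client. The two records, their field names and their values are hypothetical, invented for illustration. Both belong to the same collection, yet they share only a few fields; nothing forces them into a common table layout.

```python
import json

# Two hypothetical records destined for the same collection. Each is a set of
# name/value pairs, and the records do not share the same fields: there is no
# predefined schema that every record must follow.
blog_post = {
    "_id": "post-001",
    "type": "blog",
    "author": "jdoe",
    "title": "Ten tips for mobile marketing",
    "tags": ["marketing", "mobile"],
}
tweet = {
    "_id": "tweet-042",
    "type": "tweet",
    "author": "jdoe",
    "text": "New post on mobile marketing!",
    "retweets": 17,
}

collection = [blog_post, tweet]

# Serialize each record as a JSON document.
for doc in collection:
    print(json.dumps(doc, sort_keys=True))

# Fields present in every record versus fields present in only some of them.
common = set.intersection(*(set(d) for d in collection))
all_fields = set.union(*(set(d) for d in collection))
print(sorted(common))               # fields shared by all records
print(sorted(all_fields - common))  # fields that vary from record to record
```

A store such as MongoDB holds records like these in a binary JSON-like encoding (BSON) and can query on any field, including fields that appear in only some of the documents.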
I spoke to David Chancogne, CTO of Traackr, a web business that measures the influence of people who blog, tweet and otherwise contribute to the impression the general public forms of brands, products and more on the web. The goal is to help marketers and advertising agencies track and target such influencers more effectively. Traackr has built a MongoDB database of the contents of blogs, tweets, etc. and gives its customers reports and analyses of the top influencers in their areas of interest. Is this BI? In its broadest sense, yes. The scope is very specific and the queries predefined, but this is still BI at its most basic. Did Chancogne think of it as BI? Actually, no; providing analytics to his customers is simply his business. Probing a little deeper, I discovered that Traackr is continually trying to optimize its algorithm for rating influence. The team does this by extracting data from the database and playing with it in (wait for it) Excel! More BI, but, like many a start-up before it, Traackr chose Excel mainly for its familiarity and ease of use. Generic BI tools that run directly against a JSON data store, such as Pentaho’s NoSQL solution and Nucleon Software’s BI Studio, are beginning to appear; they allow ad hoc querying of the data without first extracting it to Excel.
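Extracting data for spreadsheet analysis, as Traackr does, means turning nested, variably shaped documents into flat rows and columns. The Python sketch below shows one way to do that with the standard `csv` module. The influencer documents and their fields are hypothetical, and this is not Traackr’s actual extraction code.

```python
import csv
import io

# Hypothetical influencer documents; field names are illustrative only.
docs = [
    {"name": "alice", "platform": "blog", "stats": {"posts": 120, "comments": 900}},
    {"name": "bob", "platform": "twitter", "stats": {"tweets": 5000, "followers": 20000}},
]

def flatten(doc, prefix=""):
    """Turn nested sub-documents into dotted column names, e.g. stats.posts."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat

rows = [flatten(d) for d in docs]

# The columns are the union of all fields seen; a missing field becomes an empty cell.
columns = sorted({col for row in rows for col in row})

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to a file opened with `newline=""` instead of an in-memory buffer would produce a file that Excel can open directly. Deriving the column set from the union of field names is exactly the step that generic BI tools running against the document store aim to make unnecessary.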
A conversation with Julian Browne led to further interesting insights. Browne is the architect of Priority Moments (a location-aware customer loyalty program that offers discounts at affiliated retailers) at O2, the second-largest provider of mobile/cell phone services in the UK, with more than 20 million customers. MongoDB was chosen as the platform for this service largely to deal with the complexity and variability of its product catalog. The challenge is that a bewildering variety of product sets can be offered to different customers, and that variety changes constantly at the whim of marketing. The absence of a predefined schema, a key characteristic of document-oriented data stores, was a compelling argument for the technology choice. But what of BI? Customer loyalty programs are prime BI territory, of course, and in this case tracking the uptake of offers is vital. As with Traackr, initial BI was provided through hand-crafted Java programming, although there is growing interest in using the emerging BI tools. Of more interest, however, is the experimental use of a specific database feature that allows a query to be left open: as new records arrive in the database, they automatically appear in the result, which can be routed to a live HTML5 graph(1) that gives real-time feedback for monitoring program activity.
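The feature Browne used is not named here. One MongoDB mechanism that behaves this way is a tailable cursor on a capped collection, but treat that mapping as an assumption. The Python sketch below does not use MongoDB at all: it simulates an open query over an in-memory, append-only list of records and keeps a running tally of offer uptake, the kind of figure that might drive a live chart. The classes, event names and offer names are all hypothetical.

```python
from collections import Counter

class AppendOnlyLog:
    """Hypothetical in-memory stand-in for a collection that only receives appends."""

    def __init__(self):
        self.records = []

    def insert(self, record):
        self.records.append(record)

class OpenQuery:
    """Hypothetical open query: each poll returns only the matching records
    that have arrived since the previous poll."""

    def __init__(self, log, predicate):
        self.log = log
        self.predicate = predicate
        self.position = 0  # index of the next record not yet seen

    def poll(self):
        new = self.log.records[self.position:]
        self.position = len(self.log.records)
        return [r for r in new if self.predicate(r)]

# Hypothetical loyalty-program events; records vary in shape, as in a document store.
log = AppendOnlyLog()
redemptions = OpenQuery(log, lambda r: r.get("event") == "redeemed")
uptake = Counter()  # running tally per offer, e.g. to feed a live chart

log.insert({"event": "viewed", "offer": "coffee-2for1"})
log.insert({"event": "redeemed", "offer": "coffee-2for1", "store": "London"})
for r in redemptions.poll():
    uptake[r["offer"]] += 1

log.insert({"event": "redeemed", "offer": "cinema-half-price", "seats": 2})
log.insert({"event": "redeemed", "offer": "coffee-2for1"})
for r in redemptions.poll():
    uptake[r["offer"]] += 1

print(dict(uptake))  # {'coffee-2for1': 2, 'cinema-half-price': 1}
```

Polling from a remembered position keeps each update proportional to the number of new records rather than re-running the whole query, which is what makes a continuously refreshed display practical.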
How would we summarize the situation regarding BI for document-oriented NoSQL databases? What we see is a fairly recent database technology whose query facilities are being used for basic, predefined BI. As might be expected, more generic tooling for building queries is appearing. The type of BI supported is focused, application-specific querying and reporting, the type associated with data marts in traditional BI. This is exactly what we saw in the emergence of BI against relational databases. Note that some of the querying is being performed against live operational sources. Again, this resembles early reporting approaches, with similar concerns about the performance impact on operations. MongoDB addresses this through the creation of eventually consistent replicas, to which such queries can be directed instead of the operational primary. Nonetheless, the demand for real-time BI continues to grow, and certain classes of operational analytics will need such real-time or near real-time access.
It is equally important to note where NoSQL does not play a role in BI. Enterprise data warehouses (EDWs), with their focus on creating consistent, integrated, historical stores of core business information, are set to remain squarely in the relational database world. But where operational needs drive the choice of a NoSQL document-oriented data store, it is clear that BI can flourish in this environment too. See my latest white paper, “Business Intelligence–NoSQL… No Problem”, for further details.
(1) For background on this approach, see Hummingbird and Data-Driven Documents (D3).