For the past decades, organisations have been working with relational databases to store their structured data. In the big data era, however, these types of databases are no longer sufficient. Although they made a huge difference in the database world and unlocked data for many applications, relational databases lack some characteristics that are important in the big data era.

NoSQL databases address many of these problems. They represent a completely new way of thinking about databases. Although the term NoSQL was first used in 1998 by Carlo Strozzi, who used it to name his lightweight, open-source relational database that did not expose the standard SQL interface, it only really became known in 2009. Since then the NoSQL movement has been growing rapidly, and not surprisingly, as these databases have some important benefits: they are schema-less, fast and agile, and they can work with non-relational, distributed and unstructured data, the type of data your organisation typically collects nowadays.

Scalability, agility and flexibility are often noted as the most important features of NoSQL databases. Although it has not yet reached the hype of Hadoop, thanks to these features the NoSQL movement draws a lot of attention. There are different types of NoSQL databases and each focuses on different applications:

  • Document databases;
  • Graph stores;
  • Key-value stores;
  • Wide-column stores.

The number of available NoSQL databases is growing rapidly and currently there are, as this website shows, over 150 of them. One of the better known is MarkLogic, which recently announced MarkLogic 7, an Enterprise NoSQL database that shows the vast possibilities of NoSQL databases for organisations. On October 15 MarkLogic organised its European Summit, where it explained more about these possibilities.

The Enterprise NoSQL is a document-centric database that structures the data in a tree structure. Every entity is a document that can have a different tree structure, and these tree structures can support data of any structure, ranging from full-text data to geospatial data and anything in between. The Enterprise NoSQL indexes what it sees, meaning it is capable of indexing words, phrases, stemmed words and phrases (its linguistic capabilities), the structure of the document, values and collections (how the data is organised) as well as security permissions (which role has access to what data). All this data is loaded into the database without a schema and is immediately available for search.
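To make the idea of "indexing what it sees" a bit more concrete, here is a minimal sketch in Python. It is not how MarkLogic works internally; the `DocumentIndex` class, the `walk` helper and the two sample documents are all hypothetical and only illustrate that documents with different tree structures can be loaded without a schema and become searchable by word and by structure right away.

```python
from collections import defaultdict


def walk(node, path=""):
    """Yield (path, value) pairs for every leaf of a nested dict/list document."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, f"{path}/{key}")
    elif isinstance(node, list):
        for child in node:
            yield from walk(child, path)
    else:
        yield path, node


class DocumentIndex:
    """Hypothetical toy index: a word index plus a structure (path) index."""

    def __init__(self):
        self.word_index = defaultdict(set)  # word -> ids of documents containing it
        self.path_index = defaultdict(set)  # structural path -> ids of documents having it

    def ingest(self, doc_id, document):
        # No schema is declared up front: whatever structure is seen gets indexed.
        for path, value in walk(document):
            self.path_index[path].add(doc_id)
            for word in str(value).lower().split():
                self.word_index[word.strip(".,;:!?")].add(doc_id)

    def search_word(self, word):
        return sorted(self.word_index.get(word.lower(), set()))

    def search_path(self, path):
        return sorted(self.path_index.get(path, set()))


# Two made-up documents with completely different tree structures.
index = DocumentIndex()
index.ingest("article-1", {"title": "Olympic final",
                           "body": {"text": "A record crowd watched the final."}})
index.ingest("place-1", {"name": "Olympic Stadium",
                         "location": {"lat": 51.5386, "lon": -0.0166}})

print(index.search_word("olympic"))        # documents containing the word
print(index.search_path("/location/lat"))  # documents that have this structure
```

A real document database adds stemming, phrase indexes, collections and permissions on top of this, but the principle of indexing both content and structure at load time is the same.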

MarkLogic's latest product has some new features that show the vast possibilities of today's NoSQL databases, and for the first time it moves in the direction of semantics. It includes several indexes that can be seen as a built-in, in-memory column store that can be used for very fast analytics. These include, among others, a range index (to perform queries based on a data range, even if no range is included in the data), a geospatial index (to perform location-based queries) and a Triple Index, which offers the capability to store and search semantic triples.
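As an illustration of what a range index does, the sketch below keeps values sorted in memory so that a "between" query becomes two binary searches instead of a scan over every document. The `RangeIndex` class and the sample values are my own assumptions for the sake of the example, not MarkLogic's implementation.

```python
import bisect


class RangeIndex:
    """Hypothetical in-memory range index: values kept sorted, ids kept in parallel."""

    def __init__(self):
        self._values = []   # sorted values
        self._doc_ids = []  # doc id stored at the same position as its value

    def add(self, value, doc_id):
        # Find the insertion point that keeps the values sorted.
        pos = bisect.bisect_right(self._values, value)
        self._values.insert(pos, value)
        self._doc_ids.insert(pos, doc_id)

    def between(self, low, high):
        """Return ids of documents whose value v satisfies low <= v <= high."""
        start = bisect.bisect_left(self._values, low)
        end = bisect.bisect_right(self._values, high)
        return self._doc_ids[start:end]


# Made-up documents indexed by a publication year.
years = RangeIndex()
years.add(2009, "doc-a")
years.add(2012, "doc-b")
years.add(2013, "doc-c")
years.add(1998, "doc-d")

print(years.between(2009, 2012))  # ids of documents published from 2009 up to and including 2012
```

The documents themselves never state a range; the range only exists in the query, which is why a sorted index over single values is enough.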

These so-called ‘semantic triples’ are an important feature of next-generation databases and a prerequisite for the semantic web. They are a way to represent information, which can be anything from the value of an item in a spreadsheet to the name of a person in a sentence. Semantic triples express relationships between pieces of data and are more closely related to the way humans think. If you combine them with Linked Open Data (facts that are freely available and in a form that is easily consumed by machines) or information from DBPedia (Wikipedia, but in a structured format understandable by machines), these triples suddenly receive a meaning and give data the context required to be valuable in a semantic environment. As the founder of MarkLogic, Christopher Lindblad, explained during the summit: “Data is not information, what you have to do to get from data to information is add context.”
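A triple is commonly written as subject, predicate, object. The sketch below shows that shape with a tiny in-memory store and a pattern query in which `None` acts as a wildcard. The `TripleStore` class and the predicate names (`founded`, `is_a`, `has_index`) are invented for this example; the facts themselves are taken from this article.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """Hypothetical toy triple store with wildcard pattern matching."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add(Triple(subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return all triples matching the pattern; None matches anything."""
        return sorted(
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


store = TripleStore()
store.add("Christopher Lindblad", "founded", "MarkLogic")
store.add("MarkLogic", "is_a", "Enterprise NoSQL database")
store.add("MarkLogic 7", "has_index", "Triple Index")

# Who founded MarkLogic?
for triple in store.match(predicate="founded", obj="MarkLogic"):
    print(triple.subject)
```

Each triple is a small fact on its own; the context comes from following the links between them, for example from a person to a company to the type of product it makes.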

We humans add that context automatically and subconsciously, based on what we have learned in our lives. Computers cannot do that and they need context coming from different sources such as DBPedia. In a semantic environment this is a lot easier to achieve and the information becomes easier to find and a lot more valuable. A great example of this is the Knowledge Graph of Google, which was introduced in May 2012 and which I discussed earlier.

Semantic search allows you to combine different kinds of queries, such as text queries, document queries and range queries, or to pose a query that spans multiple data sources. It can return results that do not even contain the exact term you used in the query, but that are very closely linked to what you are looking for and therefore still relevant. It is a new way to find what you are looking for and, in combination with Enterprise NoSQL, the new way to understand and find corporate data.
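How can a search return a document that does not contain the search term? One simple way to picture it, and only a rough sketch rather than what any real product does, is to expand the query with terms that are linked to it through triples. The `semantic_search` function, the triples and the stories below are all made up for illustration, and the expansion only follows a single link.

```python
def semantic_search(query, documents, triples):
    """Return ids of documents mentioning the query term or a term linked to it by a triple."""
    query = query.lower()
    terms = {query}
    # One-hop expansion: add anything directly connected to the query term.
    for subject, _predicate, obj in triples:
        if obj.lower() == query:
            terms.add(subject.lower())
        elif subject.lower() == query:
            terms.add(obj.lower())
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if any(term in text.lower() for term in terms)
    )


# Hypothetical triples and stories.
triples = [
    ("Example FC", "plays", "football"),
    ("Jane Doe", "plays_for", "Example FC"),
]
documents = {
    "story-1": "Example FC won again on Sunday.",
    "story-2": "The council approved a new budget.",
    "story-3": "Football fans filled the stadium.",
}

print(semantic_search("football", documents, triples))
```

A plain keyword search for "football" would only find story-3. Because a triple links Example FC to football, story-1 is returned as well, even though the word itself never appears in it.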

During the summit quite a few examples were shown that demonstrated the vast possibilities of this combination. First of all there is the BBC, which used MarkLogic during the Olympics and for the past year has also been using the Enterprise NoSQL server for the iPlayer. The on-demand iPlayer is a service that makes thousands of different programmes available to viewers on almost any device, from smartphone and tablet to desktop. All sorts of related data around the television shows are added to a search, which offers users of the iPlayer a very rich experience. The BBC created many different triples for its shows and is thereby able to provide a lot of additional relevant information to its customers.

The BBC does the same for its websites, where it can automatically build a webpage based on a news story. For example, a story about a football player is enriched automatically, and within milliseconds, with information about the club the player plays for, information about recent matches and a league table. These semantic and dynamic publishing pages offer a lot of value to the users.

Another example is Newz, a Dutch publishing application developed by Dayon in cooperation with 12 competing newspapers in The Netherlands. Newz offers startup companies the possibility to use the Newz tool to create new business models that focus on using articles from the newspapers, thereby creating tailored services for a certain segment. Newz derives additional information from articles based on semantic technology. It extracts different metadata from the article and, based on the context in the article, adds information (triples). This additional information enriches the article. As a result, the additional context can also be used to find the right article for the right customer, delivering a tailored experience to that customer. Without the semantic technology, this would not have been possible.

Finally, there was the example of how The Church of Latter-Day Saints (LDS) applies big data to deliver content to its 15 million followers. Mike Bower, Principal Architect for the LDS, explained that they have over 600 websites in more than 100 languages that cover over 18,500 published documents. They process millions of transactions per day and they use the Enterprise NoSQL server to make all that information searchable and available to all followers. In addition, they have created The Gospel Topic Explorer, which is also powered by MarkLogic and takes full advantage of the semantic triples discussed here. They use semantics to add relationships to the available information in order to improve search results. It includes over 6,600 different connected topics, 23,000 relationships and 67,000 facts that can be explored to discover how they relate.

Today’s world is changing rapidly and new tools are constantly being developed that take full advantage of the new data opportunities and that are capable of dealing with vast amounts of unstructured data. One of these tools is the (Enterprise) NoSQL database in combination with semantic search, which is a great way forward. These types of technologies can really help your organisation build mission-critical applications that make a difference and put your organisation ahead of the pack.