A phenomenal shift has occurred over the past few years in the enterprise data world. The ubiquitous data warehouses – the foundation for business intelligence and data discovery for several decades – are now becoming obsolete due to the emergence of data lakes.
While both data warehouses and data lakes have their pros and cons, a new era of “smart” data lakes based on semantic technology is emerging that can reduce the disadvantages of both, creating a clear path for the industry.
Before the evolution of the data lake, data warehouses were the only reliable solution for producing analytical reports on enterprise data. The benefits of a well-implemented data warehouse include effective governance and security, high data quality, and consistent analytics performance over time.
The downside of a data warehouse is that it relies on traditional, complex tools, such as those from Oracle, IBM and Microsoft, which typically demand expensive resources and infrastructure. Skilled IT analysts must also do extensive preparation to set up the warehouse, and the result offers very little flexibility in adapting to a rapidly changing business landscape. Many organizations also view data warehouses negatively because of their high failure rates and ongoing costs.
Due to the decreasing costs of data storage, enterprises began turning to data lakes as an alternative in recent years. Data lakes serve as large repositories of structured and unstructured data that many in the industry hoped could be accessed to extract value relatively quickly using big data tools.
However, like all first-generation solutions, data lakes presented some concerns as well. While data lakes enable business analysts to query unstructured data quickly and efficiently, analysts with the required skills are scarce, and developing those skills often takes extensive training. In addition, it is difficult to make data lakes adhere to critical rules of data governance, such as maintaining the security, access control and integrity of enterprise data.
Because of these issues, interest in “smart” data lakes has risen in an attempt to secure the benefits of data warehouses and first-generation data lakes while reducing their negatives.
Smart data lakes use semantic graph query engines that link and contextualize huge volumes of diverse enterprise data to determine meaning and value. The graph models enable independent data discovery, analytics and visualization capabilities by users across all entities and relationships in the vast data lake repository.
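To make the idea of linking and contextualizing data more concrete, here is a minimal sketch in Python. It is not how any particular semantic product works; it assumes a toy graph of subject-predicate-object triples, and every identifier in it (the `customer:42` style IDs, the predicate names, the `match` and `products_bought_by` helpers) is hypothetical. The point is only that once facts from two separate systems share identifiers, a question can follow relationships across both without a predefined warehouse schema.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical facts from two separate source systems.
CRM_TRIPLES: List[Triple] = [
    ("customer:42", "hasName", "Acme Corp"),
    ("customer:42", "placedOrder", "order:1001"),
]
ERP_TRIPLES: List[Triple] = [
    ("order:1001", "containsProduct", "product:7"),
    ("product:7", "hasName", "Industrial Valve"),
]


def match(
    graph: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for gs, gp, go in graph:
        if (s is None or gs == s) and (p is None or gp == p) and (o is None or go == o):
            yield (gs, gp, go)


def products_bought_by(graph: List[Triple], customer_name: str) -> List[str]:
    """Follow customer -> order -> product links and return product names."""
    names: List[str] = []
    for customer, _, _ in match(graph, p="hasName", o=customer_name):
        for _, _, order in match(graph, s=customer, p="placedOrder"):
            for _, _, product in match(graph, s=order, p="containsProduct"):
                for _, _, name in match(graph, s=product, p="hasName"):
                    names.append(name)
    return names


# Linking the sources is just a union of their triples,
# because both refer to the same order identifier.
graph = CRM_TRIPLES + ERP_TRIPLES
print(products_bought_by(graph, "Acme Corp"))
```

A production engine would use a standard graph query language and indexes rather than nested loops, but the traversal pattern is the same: the relationships themselves carry the context that a warehouse would encode in a fixed schema.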
The graph models of smart data lakes also greatly reduce the extensive preparation required of data warehouses and the training required for effective analysis in first-generation data lakes. Some upfront preparation by IT personnel is generally still required for smart data lakes, but it pales in comparison to the scale of work required for data warehouses. In addition, the relative ease of working with graph-based models opens the door to a variety of business users querying the data, leading to a “democratization” of big data discovery and analysis.
Semantic technologies also help maintain the data governance needed for the long-term sustainability of data lakes. Organizations can grant access to data in accordance with governance protocols by specifying who can and cannot view each data element. With these model-driven governance and access controls, restrictions and permissions are as enforceable as if the data were siloed according to governance mandates. This empowers users to ask questions while trusting that the answers stem from high-quality, secure data.
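As a rough illustration of model-driven access control, the sketch below attaches read permissions to the data elements in the model rather than to separate physical silos. It is an assumption-laden toy, not a description of any vendor's mechanism: the roles, the predicate names, the `READ_POLICY` table and the `visible_triples` helper are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical governance model: which roles may read which data elements.
READ_POLICY: Dict[str, Set[str]] = {
    "hasName": {"analyst", "finance"},
    "hasRegion": {"analyst", "finance"},
    "hasCreditLimit": {"finance"},
}

# Hypothetical data held together in one repository.
DATA: List[Triple] = [
    ("customer:42", "hasName", "Acme Corp"),
    ("customer:42", "hasRegion", "EMEA"),
    ("customer:42", "hasCreditLimit", "250000"),
]


def visible_triples(data: List[Triple], role: str) -> List[Triple]:
    """Return only the triples whose predicate the role is allowed to read.

    Predicates missing from the policy are hidden (deny by default).
    """
    return [t for t in data if role in READ_POLICY.get(t[1], set())]


for role in ("analyst", "finance"):
    print(role, visible_triples(DATA, role))
```

Here an analyst and a finance user query the same repository, yet the analyst never sees the credit limit. Denying access by default for elements the policy does not mention is a design choice in this sketch, made so that newly loaded data stays hidden until someone classifies it.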
Given the key benefits summarized in the chart below, smart data lakes are well positioned to displace data warehousing as the de facto means of storing data and facilitating analytics and discovery.