With its promise to transform data management and analytics by providing access to all data across the enterprise, the data lake is quickly becoming more than just an industry buzzword. Unlike traditional data warehouses, which have frequently brought lengthy implementations, inflexibility and high costs, the data lake accommodates any type of data and stores it cheaply, in very large volumes, on commodity hardware.
Business users and IT professionals alike have long been tasked with collecting and preserving data from outdated computing systems, often referred to as legacy applications. Legacy applications that have exceeded their useful life can be expensive to maintain, often requiring dated versions of software and hardware to stay supported. Despite these challenges, legacy applications often contain valuable data that must be retained for business or compliance purposes.
Here are five key benefits of retiring applications to the data lake:
1. Data Preservation
By mapping the data to a business-friendly conceptual model, the data lake can preserve institutional knowledge of the meaning of data in legacy applications. The model is a high-level domain representation more easily understood by business users, eliminating the need for specialized application skills down the road.
2. Cost Efficiency
The data lake emerged with the promise of collecting vast amounts of data in its native, untransformed format at a very low cost. The data lake provides easy-to-use mapping and ETL tools to migrate data from legacy applications to a low-cost Hadoop (HDFS) storage environment.
3. Self-Service Workflow
With a data lake, end users gain critical capabilities, including data cataloging, data meaning, data provenance and self-service analytics on the available data sets.
4. Convenient Accessibility
Data stored in the data lake is available almost immediately, with on-demand access to high-performance, in-memory query, search and analytics capabilities across any legacy data set. As a result, the analytics capabilities in many cases far exceed those of the legacy application.
5. Enhanced Value
The vast majority of large organizations have realized that the information captured in the normal course of business has enormous strategic and competitive value. Through the data lake, the value of that data is enhanced because it becomes easy to combine and analyze with other data sets. As a result, end users can combine data sets and ask questions of the data, something that was not previously possible.
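To make benefits 1 and 5 more concrete, here is a minimal Python sketch of the two ideas: renaming legacy fields to a business-friendly conceptual model, then combining the result with another data set. It is an illustration under stated assumptions, not a description of any particular data lake product. The column names, the `LEGACY_TO_CONCEPTUAL` mapping, the sample records and the `to_conceptual` and `combine` helpers are all hypothetical.

```python
# Hypothetical mapping from cryptic legacy column names to business-friendly terms.
LEGACY_TO_CONCEPTUAL = {
    "CUST_NO": "customer_id",
    "CUST_NM": "customer_name",
    "ORD_AMT": "order_amount",
}


def to_conceptual(record, mapping=LEGACY_TO_CONCEPTUAL):
    """Rename legacy fields; unmapped fields are kept unchanged so no data is lost."""
    return {mapping.get(key, key): value for key, value in record.items()}


def combine(left, right, key):
    """Inner-join two lists of records on a shared key.

    Assumes each key value appears at most once in `right`.
    """
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# Hypothetical records exported from a retired legacy application.
legacy_orders = [
    {"CUST_NO": 17, "CUST_NM": "Acme Corp", "ORD_AMT": 250.0},
    {"CUST_NO": 42, "CUST_NM": "Globex", "ORD_AMT": 99.5},
]

# Hypothetical second data set that already lives in the lake.
support_tickets = [{"customer_id": 17, "open_tickets": 3}]

orders = [to_conceptual(record) for record in legacy_orders]
combined = combine(orders, support_tickets, key="customer_id")
```

Once the legacy fields carry business-friendly names, the join needs no knowledge of the retired application's schema. In practice the mapping and the join would typically be handled by the lake's own mapping and query tools rather than hand-written code.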
As more businesses begin to take notice of the value of big data, data lakes can serve as an ideal complement to low-cost, commodity cloud infrastructure for providing a retirement home for their legacy application data sets. By providing data that is more accessible to business users and easy to combine with other data sets, data lakes can also provide a return on investment that goes beyond just saving costs.