Big data involves interplay among different data management approaches, business intelligence and operational systems, which makes it imperative that all sources of business data be integrated efficiently and that organizations be able to adapt easily to new data types and sources. Our recent big data benchmark research confirmed that big data storage technologies continue to span many approaches, including appliances, Hadoop, in-memory computing and specialized DBMSes. With the variety, velocity and volume of big data now part of today's information architecture, and with the potential for big data to feed other systems, integration should be a top priority.
Many organizations that have already deployed big data technology now struggle to access, transform, transport and load information using conventional technology. Even replication or migration of data from existing sources can be troublesome, requiring custom programming and manual processing, which are always a tax on resources and time. Barriers such as having data spread across too many applications and systems, which our benchmark research found in 67 percent of organizations, do not go away just because an organization is using big data technology; in fact, they get more complicated. However, big data also creates opportunities to use information to innovate and to improve business processes. To avoid the risks and take advantage of the opportunities, organizations need efficient processes and effective technology that make information drawn from big data available to everyone who needs it.
Organizations need integration technology flexible enough to handle big data regardless of whether it originates in the enterprise or across the Internet. For this reason, tools for big data integration must be able to work with a range of underlying architectures and data technologies, including appliances, flat files, Hadoop, in-memory computing and conventional databases, and move data seamlessly between relational and non-relational structures. They must be able to adapt to events or streams of data, and they must harvest data from transactional systems and business applications into enterprise data warehouses. Data integration for big data must also support data quality and master data management needs.
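To make the relational-to-non-relational movement concrete, here is a minimal, vendor-neutral sketch (not any specific integration product's API) that flattens rows from a relational store, SQLite standing in here, into self-describing JSON documents of the kind a document database or Hadoop pipeline would consume; the table name and columns are invented for illustration.

```python
# Illustrative sketch only: one of the many transformations a big data
# integration tool must automate is moving relational rows into a
# non-relational, document-style structure.
import json
import sqlite3

def rows_to_documents(conn, table):
    """Flatten each row of a relational table into a JSON-ready dict.

    `table` is assumed to be a trusted, known table name (no user input).
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]  # column names become field names
    return [dict(zip(cols, row)) for row in cur.fetchall()]

# Hypothetical source data: a small relational "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "Acme", 250.0), (2, "Globex", 99.5)])

docs = rows_to_documents(conn, "orders")
print(json.dumps(docs, indent=2))
```

A real integration tool adds what this sketch omits: schema evolution, type mapping, incremental change capture and error handling, which is precisely why purpose-built tooling beats custom, manual scripting at scale.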
Selecting the right approach to big data integration is difficult when organizations lack knowledge of the functional requirements and best practices relevant to their industries, lines of business and IT. Deficiencies in existing software and data environments can further complicate the ability to choose wisely and so should be factored into the deployment decision-making process. Organizations must identify the types of integration being used or under consideration to handle data other than that formatted for relational databases, and evaluate processing capabilities and techniques to handle the proliferation of big data. IT professionals therefore must understand how to work with analysts and business management to deliver timely, benefit-based big data deployments.
IT should evaluate whether it can use existing skills to shorten the time it takes to get big data to users. Since our research found lack of resources to be the top barrier to using innovative technology (cited by 51 percent of organizations), businesses should make sure their IT staff does everything possible to maximize skills and resources internally and not waste them on custom, manual silos of effort. Having the right data integration processes and data management methods can help IT work more efficiently and partner better with the business units.
It is a mistake not to have a dialogue about what information management competencies a business needs. I have seen most IT industry analyst firms' content deal with just a portion of the big data picture, discussing, for example, only the technologies for storing and accessing data, with a fixation on variety, velocity and volume. However, decision-makers must consider the efficient flow of data across its entire path of travel, from its origins to user systems, to ensure the effective functioning of any big data project. Failure to do that means failing to optimize information across its life cycle for business value. Without the ability to see the entire big data value chain, a business may find its initiatives exceed available limits of cost and time and damage a business case built on time-to-value metrics. According to our research, the most important benefits of big data technologies include retaining and analyzing more data (74 percent) and increasing the speed of analysis (70 percent). Organizations need to make sure they do not increase the number of manual processes they run and the time spent on them, thus impairing the value of big data.
We have begun research to assess the latest big data integration technologies and best practices to help advance these efforts, as we outlined in our research agenda on big data and information optimization for 2013. We will document emerging best practices in big data integration to meet business needs, from basic access and replication to transformational migration. Until we can share our results, be sure to consider big data integration as part of your business case and project, because it is essential to gaining the most value from your big data investments.