On Sharing Data - SmartData Collective

While security and privacy issues prevent sensitive data from being shared (e.g., customer data containing personal financial information or patient data containing personal health information), do you have access to data that would be more valuable if you shared it with the rest of your organization—or perhaps the rest of the world?

We are all familiar with the opposite of data sharing within an organization: data silos. Somewhat ironically, many data silos start with data that was designed to be shared with the entire organization (e.g., from an enterprise data warehouse), but was then replicated and customized to satisfy the particular needs of a tactical project or strategic initiative. This customized data often becomes obsolete after its project or initiative concludes (or is abandoned).

Data silos are usually denounced as evil, but the real question is whether the data hoarded within a silo is sharable. Is it something usable by the rest of the organization, which may be redundantly storing and maintaining its own private copies of the same data? Or are the contents of the silo something only one business unit uses (or, in the case of sensitive data, is allowed to access)?

Most people decry data silos as the bane of successful enterprise data management—until you expand the scope of data beyond the walls of the organization, where the enterprise’s single version of the truth becomes a cherished data asset (i.e., an organizational super silo) intentionally siloed from the versions of the truth maintained within other organizations, especially competitors.

We need to stop needlessly replicating and customizing data—and start reusing and sharing data.

Historically, replication and customization had two primary causes:

Limitations in technology (storage capacity, access speed, processing speed, and the lack of a truly sharable infrastructure like the Internet) meant that the only option was to create and maintain an internal copy of all data.

Proprietary formats and customized (and also proprietary) versions of common data were viewed as a competitive differentiator, even before the recent realization that data is a corporate asset.

The mindset of hoarding data in proprietary formats, summed up as “our private knowledge is our power,” must be replaced by sharing data in open formats, summed up as “our shared knowledge empowers us all.”

This mantra is easier to recite than to realize, not only within an organization or industry, but even more so across organizations and industries. However, one of the major paradigm shifts of 21st-century data management is making more data publicly available, following open standards (such as MIKE2.0) and using unambiguous definitions so data can be easily understood and reused.

Of course, data privacy still requires that sensitive data not be shared without consent, and competitive differentiation still requires that intellectual property not be shared outside the organization. But this still leaves a vast amount of data that, if shared, could benefit our increasingly hyper-connected world, where most of the boundaries that used to separate us are becoming more virtual every day. Henrik Liliendahl Sørensen offered some examples of this in his recent blog post Winning by Sharing Data.