Ensure the quality of data
Having large quantities of data is useless if that data is incorrect, inconsistent or used for the wrong purpose. It is therefore vital that the right data is identified for the right problem and that the right attributes are measured. Several aspects of data elements need to be checked for quality: accuracy, completeness, consistency across data sources, uniqueness, reliability, structure, timeliness and accessibility.
Data quality is affected by the way data is entered, stored and managed. The quality of the data should therefore be managed at the source, or as close to it as possible. Failing to ensure data quality can reduce risk mitigation, agility and operational efficiency, increase costs and result in big data projects that do not meet expectations. Importantly, data quality is not only an IT issue but just as much a business issue. The business also needs to determine the rules and state when the data is of good quality and can be used. Bad data can seriously damage your big data project.
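Two of the quality aspects listed above, completeness and uniqueness, are easy to measure automatically close to the source. The following is a minimal sketch in Python using only the standard library. The records, the field names (id, email, country) and the helper functions are hypothetical and only serve as an illustration; the business would still have to decide which thresholds count as "good quality".

```python
from collections import Counter

# Hypothetical customer records; the field names are illustrative only.
records = [
    {"id": 1, "email": "ann@example.com", "country": "NL"},
    {"id": 2, "email": "", "country": "NL"},
    {"id": 2, "email": "bob@example.com", "country": None},
]

def completeness(rows, field):
    """Fraction of rows in which `field` is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)

def duplicate_keys(rows, key):
    """Values of `key` that occur in more than one row, in first-seen order."""
    counts = Counter(row.get(key) for row in rows)
    return [value for value, n in counts.items() if n > 1]

# A small quality report that business and IT can review together.
report = {
    "email_completeness": completeness(records, "email"),
    "country_completeness": completeness(records, "country"),
    "duplicate_ids": duplicate_keys(records, "id"),
}
```

A report like this could be produced every time data is loaded, so that problems are caught where the data enters the organization rather than in the analysis.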
Ensure your data is anonymous
An important business use case of big data is to create a 360-degree view of your customers in order to send them personalized offers and services. However, creating such detailed profiles can be harmful and can affect the privacy of your customers if not dealt with correctly. Big data carries the risk of consumers moving away from your organization if you handle this badly.
It is therefore vital that all the data is stored anonymously, especially when the data sets are made public. Always ensure that re-identification of individuals using anonymous data is impossible. Your organization would do well to perform a threat analysis on the dataset prior to releasing it to the public: check for datasets that are available online and that could be used to re-identify the people in the dataset.
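One simple check that can be part of such a threat analysis is to count how many records share the same combination of indirectly identifying attributes, an idea known as k-anonymity. This technique is not named in the article and is used here only as an illustration. The sketch below assumes hypothetical columns (zip, birth_year) as the identifying combination; a small group size signals a re-identification risk, but a large one does not prove that re-identification is impossible.

```python
from collections import Counter

def smallest_group(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0

# Hypothetical "anonymous" purchase data: no names, but zip code and
# birth year could be matched against other publicly available datasets.
rows = [
    {"zip": "1011", "birth_year": 1980, "purchase": "book"},
    {"zip": "1011", "birth_year": 1980, "purchase": "phone"},
    {"zip": "1012", "birth_year": 1975, "purchase": "laptop"},
]

k = smallest_group(rows, ["zip", "birth_year"])

# If any combination occurs fewer than 2 times, that person is unique
# in the dataset and therefore easier to re-identify.
at_risk = k < 2
```

In this example the third record is the only one with its zip code and birth year, so the dataset would need further generalization or suppression before release.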
Ensure legal compliance when storing data
This is especially relevant when you use cloud solutions and the data is stored in a different country or continent. Be aware that data stored in a different country has to comply with the law of that country. This can have serious consequences if it is not thought through properly. For example, companies in Europe that store their data on American servers, or on servers in Europe belonging to an American company, fall under the Patriot Act of the USA.
Ensure ethical data compliance
Big data makes it possible to check, control and know everything. But knowing everything entails an obligation to respect that information as well. One such obligation is that your organization should do everything possible to protect (sensitive) data sets and be open and clear about what is collected, what is done with that data and what it is used for. Big data ethics means that you collect only the data that is required and that you are open with the customer about why certain data is collected and how it is used.
Ensure online and offline data security
In order to secure your data online, it is important to start by identifying the types of sensitive data you have. Of course, some data is more sensitive than other data and requires better protection. However, identifying and knowing where the sensitive bits reside is difficult, especially with very large volumes. Although low-cost big data clusters can be attractive, they often provide little security beyond network and perimeter protection.
Securing the data online also means creating different roles and controlling data access. Root administrators, for example, need to be able to do their job without having to access the sensitive data. Define different classifications for the data: the more sensitive the data, the fewer people should have access to it. When the data is in the cloud, hosted by a third party, there should be strict SLAs about how the third party secures your data.
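The combination of data classifications and roles can be sketched as a simple clearance check. The following Python example is only an illustration under stated assumptions: the three sensitivity levels, the role names and the can_read helper are all hypothetical, and a real deployment would rely on the access controls of the database or cluster itself.

```python
from enum import IntEnum

class Sensitivity(IntEnum):
    """Hypothetical classification levels; a higher value is more sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2

# Hypothetical roles and the highest level each one may read.
# The root administrator keeps the systems running but has no
# clearance for internal or confidential data.
CLEARANCE = {
    "root_admin": Sensitivity.PUBLIC,
    "analyst": Sensitivity.INTERNAL,
    "privacy_officer": Sensitivity.CONFIDENTIAL,
}

def can_read(role, dataset_level):
    """True if `role` may read data at `dataset_level`; unknown roles get no access."""
    clearance = CLEARANCE.get(role)
    return clearance is not None and clearance >= dataset_level
```

With this mapping, can_read("analyst", Sensitivity.INTERNAL) is allowed while can_read("root_admin", Sensitivity.CONFIDENTIAL) is refused, which mirrors the idea that the more sensitive the data, the fewer roles can reach it.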
When you host the data on-premises in a dedicated warehouse, ensure that the area is only accessible by a few employees who need to perform the required tasks to keep the databases running. Preferably ensure that all data within the organization resides in a centralized data warehouse, as that is easier and cheaper to secure than several silos across the company.
As mentioned, developing a big data strategy is not an easy task and it is important to pay close attention to the data that you will use, how you will use that data and why you use the data. If done incorrectly, it could seriously backfire, something that you need to prevent at all costs. However, if you take care of this from the start and ensure that these five pre-conditions are part of your big data strategy from the beginning, it will help you scale your big data strategy in the future.
Copyright Big Data Startups 2013.