Business intelligence is the buzzword making the rounds in corporate circles. Achieving it depends on algorithms and predictive analytics, which in turn depend on big data. Now that almost everything is measured and monitored, vast quantities of data are generated that can be put to beneficial use.
The difficulty lies not only in deciding how to analyse the data for useful insights, but also in protecting the information. The data may be sensitive and could cause problems for companies if unwittingly divulged. The Cloud Security Alliance (CSA) recently took on the task of securing big data with the release of The Big Data Security and Privacy Handbook, which comprises useful tips for data storage, encryption, governance, monitoring and security.
Here are five practices that can help secure big data.
Secure distributed programming frameworks
Distributed programming frameworks are popular with those working with big data. These frameworks spread data and computation across many networked computers, or nodes, which developers use as part of programming models such as MapReduce. This suits big data because it gives analysts access to large amounts of data from various sources and allows easy creation of a computational pipeline, which is a necessity when setting up algorithms. Examples of such systems are Hadoop, which implements MapReduce, and Spark.
With all the sharing and distribution within these frameworks, there is a serious risk of data leakage, as well as of untrusted mappers returning erroneous results. The CSA recommends establishing trust through mechanisms such as Kerberos authentication and ensuring that all nodes adhere to security policies.
De-identifying data by decoupling all personally identifiable information will protect the privacy of those involved. It is imperative to ensure files are access controlled to prevent leakage of information. This can be achieved using mandatory access control, which can be performed with various software tools.
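One possible way to decouple personally identifiable information is to replace it with keyed pseudonyms before the data reaches the shared framework. The sketch below is only an illustration under stated assumptions: the field names in PII_FIELDS and the de_identify helper are hypothetical and do not come from the CSA handbook, and keyed hashing is pseudonymisation, which is weaker than full anonymisation.

```python
import hashlib
import hmac

# Hypothetical list of sensitive fields; adjust to the real schema.
PII_FIELDS = {"name", "email", "phone"}


def de_identify(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record with PII fields replaced by keyed pseudonyms.

    The same input value and key always give the same pseudonym, so records
    can still be joined, but the original value is not stored in the output.
    """
    cleaned = {}
    for field, value in record.items():
        if field in PII_FIELDS:
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            # Truncated hex digest used as the pseudonym (assumption: 16 chars is enough here).
            cleaned[field] = digest.hexdigest()[:16]
        else:
            cleaned[field] = value
    return cleaned


if __name__ == "__main__":
    row = {"name": "Alice", "email": "alice@example.com", "purchase": 42}
    print(de_identify(row, secret_key=b"keep-this-key-out-of-the-cluster"))
```

The key has to be stored away from the data itself, otherwise anyone with access to both could recompute pseudonyms for known values.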
Keeping data secure also requires regular maintenance: checking all nodes periodically and screening for fake nodes and duplicate data.
Endpoint filtering and validation
Safeguarding the endpoint is vital to big data security. The first step is to use only trusted certificates and to test resources before they are used. Another way to secure the network is to employ a mobile device management solution, which limits the spread of information by making it possible to locate, lock and wipe lost devices. Such a tool can also prevent unauthorised copying of company data.
Techniques that detect outliers and statistical similarities are used to filter malicious content and validate data, guarding against attacks that rely on multiple fake identities (Sybil attacks) or duplicated data.
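As a rough illustration of outlier-based validation, incoming readings can be screened with a robust statistic before they enter the pipeline. The sketch below uses the modified z-score based on the median absolute deviation (MAD); this particular statistic, the filter_outliers helper and the 3.5 threshold are assumptions chosen for the example, not methods prescribed by the CSA.

```python
import statistics


def filter_outliers(values, threshold=3.5):
    """Split values into (accepted, rejected) using the modified z-score.

    The modified z-score is 0.6745 * (x - median) / MAD. Values whose
    absolute score exceeds the threshold are rejected.
    """
    values = list(values)
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # No measurable spread to compare against, so accept everything.
        return values, []
    accepted, rejected = [], []
    for v in values:
        score = 0.6745 * (v - median) / mad
        (rejected if abs(score) > threshold else accepted).append(v)
    return accepted, rejected


if __name__ == "__main__":
    readings = [10, 11, 9, 10, 12, 10, 500]
    ok, suspicious = filter_outliers(readings)
    print(ok)          # [10, 11, 9, 10, 12, 10]
    print(suspicious)  # [500]
```

A median-based score is used here because a single extreme value distorts the mean and standard deviation, which can hide the very outlier being looked for.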
Data privacy
Maintaining data privacy on this scale is a difficult proposition. The CSA recommends differential privacy, a method that adds calibrated noise to query results to minimise the chance that any individual record can be identified, while keeping the results accurate enough to be useful. In addition, homomorphic encryption should be used to store and process information in the cloud. It allows computation to be performed without decrypting the data, so outsourced vendors can work with the data without seeing private information.
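To make the differential privacy idea concrete, the sketch below applies the Laplace mechanism to a counting query. It is a minimal illustration under stated assumptions: the helper names private_count and laplace_noise and the choice of epsilon are hypothetical, and a production system would also need to track the privacy budget across queries.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a zero-centred Laplace distribution.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng=None) -> float:
    """Return a noisy count of records matching predicate.

    Adding or removing one record changes a count by at most 1, so the
    query has sensitivity 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 29, 52, 61, 38, 47]
    # Smaller epsilon means more noise and stronger privacy.
    print(private_count(ages, lambda a: a >= 40, epsilon=0.5))
```

The trade-off described in the text shows up directly in epsilon: lowering it hides individual records better but makes each answer less accurate.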
Beyond these security measures, employees need to be made aware of privacy policies and authorisation regulations. It is also suggested that privacy-preserving data composition be implemented. This controls leakage from various databases by way of monitoring the arrangements and links connecting the databases.
Big data encryption
There are many advanced cryptography models available and many of them now allow running searches on encrypted information. The CSA advises using a variety of cryptographic methods to protect big data.
Relational encryption allows encrypted data to be compared without divulging encryption keys, while identity-based encryption enables data to be encrypted for a given identity. Attribute-based encryption integrates access control into the encryption scheme itself. Lastly, convergent encryption (referred to as converged encryption in some summaries) derives the encryption key from the data itself, so identical data produces identical ciphertext, which aids the identification of duplicate data.
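The deduplication idea behind converged (more commonly called convergent) encryption can be sketched as follows: the key is a hash of the content, so two identical files encrypt to the same ciphertext and a storage provider can spot duplicates without reading them. This is a toy illustration only. The XOR keystream built from SHA-256 is a stand-in assumed here so the example runs with the standard library; a real deployment would use a vetted cipher such as AES from an audited library, and the function names are hypothetical.

```python
import hashlib


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 of the key plus a block counter (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def convergent_encrypt(plaintext: bytes):
    """Return (key, ciphertext), where the key is derived from the content itself."""
    key = hashlib.sha256(plaintext).digest()
    stream = _keystream(key, len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, stream))
    return key, ciphertext


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Reverse convergent_encrypt by XOR-ing with the same keystream."""
    stream = _keystream(key, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, stream))


if __name__ == "__main__":
    key_a, ct_a = convergent_encrypt(b"quarterly sales report")
    key_b, ct_b = convergent_encrypt(b"quarterly sales report")
    # Identical content gives identical ciphertext, which is what enables deduplication.
    print(ct_a == ct_b)
    print(convergent_decrypt(key_a, ct_a))
```

The same property is also the main weakness of the approach: anyone who can guess the content can confirm the guess by encrypting it and comparing ciphertexts.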
Audit the system
Auditing big data security is critical to maintaining a safe environment, particularly after a cyber-attack. The audit trail is used to assess who had access to information and which security controls were in place. It is imperative to store this audit data separately so that it does not skew the big data being analysed. Various open source audit tools are available to facilitate the process.
Big data can pose big problems unless the correct strategies and techniques are employed to secure it adequately. It is crucial to implement a comprehensive security scheme that protects data at every stage of the big data pipeline, so that the data used for making business decisions is true, accurate and safe.