Big data can bring big benefits to businesses of any size. However, as with any project, proper preparation and planning are essential. You’ll need to invest in some tools to do the job – collecting, storing and analyzing data – to achieve the ultimate objective of gleaning insights which will lead to better decision-making and improved performance.
The good news is that this doesn’t have to cost the Earth – although as with most things in life, you usually get what you pay for. Open source (free) software exists for most of the essential tasks, and the distributed systems used by industry are designed to run on cheap, off-the-shelf hardware. The trade-off is that it will take some time and technical skill to get everything set up and working the way you want. So unless you have the expertise (or are willing to spend time developing it), it might be worth paying for professional technical help, or for “enterprise” versions of the software. These are generally customised versions of the free packages, designed to be easier to use or targeted at specific industries.
So here’s my rundown of three areas where you will likely want to make investments in infrastructure – software or hardware to enable you to turn big data into insights and business growth:
Collecting the data
Here there are two basic options:
- Internal data – information about your company’s performance, activities, wins and losses, which you can generate yourself. This includes sales records, customer databases, employee records and all the day-to-day data which can be captured. In sophisticated industrial operations this could include machine data allowing you to fine-tune the efficiency of your equipment. This will be self-generated using manual data input or dedicated devices such as sensors, RFID or cameras, to collect the data in real-time.
- External data – information about the wide world outside your offices and plants, which you can turn to your advantage. This could either be self-generated, for example through customer surveys, or collected from an external source, either paid or free. The data economy is booming and many companies exist purely to supply other companies with information. On the other hand, there is also a huge amount of information out there which is available for free: most governments these days make concerted efforts to make as much of their data as possible available to the public free of charge. This can be a great source of information on everything from population to weather and crime statistics. Whatever human need or desire your company is set up to cater for, you will almost certainly find something useful.
Storing the data
The thing about big data is that sometimes it can be really big. Computer hard disks are still the storage medium of choice because at higher storage capacities, they offer unrivalled economy. Of course this means they need computers to be housed in, which in turn need a building with an electrical supply. Luckily, as businesses have needed to store ever-increasing amounts of data, ingenuity has come to the rescue.
Distributed storage and cloud storage are two ways that have emerged for any business to be able to store the data needed for big data projects without investing in highly expensive dedicated systems and data warehouses to put them in.
Distributed storage is a method of using cheap, off-the-shelf components to rig up your own high-capacity storage solutions, which are then controlled by software which keeps track of where everything is, and finds it for you when you need it.
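To make the idea of "software which keeps track of where everything is" a little more concrete, here is a minimal Python sketch. It is a toy, not how any particular product works: the class name `ToyDistributedStore`, the node names and the hash-based placement rule are all assumptions made for illustration, and each "node" is just an in-memory dictionary standing in for a cheap machine.

```python
import hashlib


class ToyDistributedStore:
    """Hypothetical illustration: spreads records over several 'nodes'.

    Each node is a plain dict standing in for a cheap machine with its own disk.
    """

    def __init__(self, node_names, replicas=2):
        self.names = sorted(node_names)
        self.nodes = {name: {} for name in self.names}
        # Never ask for more copies than there are nodes.
        self.replicas = min(replicas, len(self.names))

    def _placement(self, key):
        """Decide which nodes hold a key, using a hash of the key itself."""
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(self.names)
        return [
            self.names[(start + i) % len(self.names)]
            for i in range(self.replicas)
        ]

    def put(self, key, value):
        """Write one copy of the value to every node chosen for this key."""
        for name in self._placement(key):
            self.nodes[name][key] = value

    def get(self, key, unavailable=()):
        """Read from the first chosen node that is not marked unavailable."""
        for name in self._placement(key):
            if name not in unavailable and key in self.nodes[name]:
                return self.nodes[name][key]
        raise KeyError(key)


store = ToyDistributedStore(["node-a", "node-b", "node-c"])
store.put("sales-2014", "quarterly sales records")
print(store.get("sales-2014"))
```

Because every record is written to two nodes, `get` can still find it if one of those machines is switched off, which is roughly why clusters of cheap hardware can be made dependable.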
“Cloud storage” really just means that your data is stored, usually remotely, on systems connected to the internet, and is accessible from anywhere you can get online. In current business usage it tends to mean that you are paying a third party such as Google or Amazon, or one of thousands of smaller, dedicated cloud storage operations, to store it for you online. So you don’t have to worry about physically holding onto it yourself at all.
Of course, if you are a small or medium-sized company, it’s worth remembering that individual, non-distributed (i.e. normal) computer hard disks are available at very high capacities for very little cost these days – for a lot of everyday enterprises, this may be all you need.
Analyzing the data
Right, so now you have a lot of data, but you don’t know what to do with it. The next step is to analyze it, and hope it contains whatever you need to meet the goal you set before you embarked on this mission (you did remember to set a goal, right?).
There are three basic steps here:
- Preparing the data – Identifying the data that is crucial to the task at hand, “cleaning” it to get rid of unnecessary background noise, and putting it into a format which is accessible to the software or people who need to understand it.
- Model building and validation – Adjusting variables and seeing how this impacts on the data. Then assessing how the changes you are making work towards achieving the goals you set yourself at the start.
- Drawing a conclusion – Assessing the insights you gleaned during step 2, and deciding what changes you are going to make.
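The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names `ad_spend` and `sales`, the figures themselves and the helper functions are all made up for the example, and a straight-line fit is just one simple kind of model among many.

```python
def prepare(raw_rows):
    """Step 1: keep only usable rows and convert the text to numbers."""
    cleaned = []
    for row in raw_rows:
        try:
            spend = float(row["ad_spend"])
            sales = float(row["sales"])
        except (KeyError, TypeError, ValueError):
            continue  # drop rows with missing or malformed values
        cleaned.append((spend, sales))
    return cleaned


def fit_line(points):
    """Step 2a: least-squares fit of sales = slope * spend + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, points):
    """Step 2b: validate the model on rows it has not seen."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)


# Made-up records, including two deliberately broken rows.
raw = [
    {"ad_spend": "10", "sales": "120"},
    {"ad_spend": "20", "sales": "140"},
    {"ad_spend": "n/a", "sales": "150"},
    {"ad_spend": "30", "sales": "160"},
    {"ad_spend": "40", "sales": "180"},
    {"ad_spend": "50"},
    {"ad_spend": "60", "sales": "220"},
]

data = prepare(raw)                  # step 1: clean and format
train, holdout = data[:4], data[4:]  # hold some rows back for validation
model = fit_line(train)              # step 2: build the model
error = mean_abs_error(model, holdout)

# Step 3: turn the numbers into a statement someone can act on.
print(f"Each extra unit of ad spend is associated with {model[0]:.2f} more sales "
      f"(hold-out error: {error:.2f}).")
```

Holding back a few rows for validation is the important habit here: a model that only looks good on the data it was built from tells you very little about what will happen next.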
Data mining software from vendors such as IBM, Oracle and Google is designed to help you do all of this, turning raw data into predictive models: for example, building customer profiles from sales data, or highlighting inefficiencies where money is leaking from your company. Google has BigQuery, which is designed to let anyone with a bit of data science knowledge run queries against vast datasets. Many startups are also getting involved in this market, offering “big data for dummies” style solutions which claim to let you simply feed them all of your data, then sit back while they highlight the most important insights and suggest actions for you to take.
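As a rough idea of what "building customer profiles from sales data" looks like as a query, here is a small Python sketch. It uses SQLite, which ships with Python, purely as a stand-in: it is not BigQuery or any vendor's product, and the table layout, customer names and amounts are invented for the example.

```python
import sqlite3


def customer_profiles(sales):
    """Summarise (customer, amount) sales rows into one profile per customer."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
    rows = conn.execute(
        "SELECT customer, COUNT(*), SUM(amount), AVG(amount) "
        "FROM sales GROUP BY customer "
        "ORDER BY SUM(amount) DESC, customer"
    ).fetchall()
    conn.close()
    return rows


# Made-up sales records: (customer, amount spent).
example_sales = [
    ("alice", 30.0),
    ("bob", 12.5),
    ("alice", 50.0),
    ("bob", 7.5),
    ("carol", 20.0),
]

profiles = customer_profiles(example_sales)
for customer, orders, total, average in profiles:
    print(customer, orders, total, average)
```

The commercial tools run this kind of grouping and summarising over far larger datasets, but the question being asked of the data is essentially the same.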
So there in a nutshell are the three essential elements you will have to think about if you’re planning on putting big data to work for your business.
As always, I hope you enjoyed my post.
You might also be interested in my new book: Big Data: Using Smart Big Data, Analytics and Metrics To Make Better Decisions and Improve Performance