There’s a lot of talk about advanced analytics these days – the use of data mining and predictive analytics is growing rapidly, so lots of articles, books (like Tom Davenport’s latest) and blog posts are being written. One of these was by Jeff Kelly over on TechTarget on Data analytics team’s needs and, while I agree with some of what was said, I am going to take issue with the idea that analytics is a cottage industry. There is a feeling that, because what analysts do is complex and hard for others to understand, they should be allowed to swan around picking their own tools while being given lots of autonomy and plenty of freedom to experiment. This is, I believe, a very dangerous idea. It is time for organizations to take a stand and industrialize their advanced analytics efforts.
Data analysts really have to get over the “choose my own tool” thing. Allowing each analyst to pick their data mining or analytic tool results in lots of different tools being used. This means that common data cleansing routines or model elements are not, in fact, common. It means that any kind of collaboration between multiple analysts is problematic, because they can’t put their models into a common repository. And, most importantly, it means that operationalizing those models will be massively more complex.
This last point is crucial, as operationalization is key to generating business value. Modeling teams regularly find that 50-60% of the models that work, that would improve results if they were deployed, don’t make it into production. This means all that work was wasted and that business results are unnecessarily poor – bad for everyone. Organizations need to understand how they are going to get advanced analytic models into production – into operational systems, into reports, into dashboards – and need to pick modeling tools that support this. Otherwise they are just supporting academic investigation which, unless the organization is in fact an academic research institute, isn’t going to move the ball forward.
The basis for this tension between operational issues and constraining analytic tool choice is often that analysts think they are done when the model is “right”. Many analysts seem to believe that they can declare victory and pat themselves on the back when the model is accurate, statistically valid, highly predictive and so on. They will often talk about all sorts of statistical measures that “prove” the model is a good one. Yet, in fact, the only results that matter are business results. If the model is accurate but impractical to implement then it adds no business value and should, therefore, be considered a bad model. The approach discussed in the article of letting analysts have freedom to pick their own tools and, to some extent, do their own thing, can easily result in this kind of situation. One company I worked with hired someone to create an analytic group, and that person took the traditional approach. The end result was a great model that was going to take 9 months of hardcore programming to get into the company’s business. Lots of costs, lots of delay, not a lot of analytic value. Similarly, things that improve a model’s accuracy but make it harder to implement can backfire – the model takes longer to get into production, and that delay represents lost accuracy (most models degrade over time) and lost business value. For instance, too often analytic modelers will bring in new data sources to improve the accuracy of a model without considering the impact on implementation complexity. In theory the model is more accurate but in practice it is less valuable.
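To make that trade-off concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it is invented for illustration: the function name `cumulative_value`, the fixed monthly decay rate, the 24-month horizon and the value figures are all assumptions, not measurements from any real project. It only shows the shape of the argument, namely that a model earns nothing until it is deployed and keeps degrading while it waits.

```python
def cumulative_value(monthly_value, deploy_delay_months, horizon_months,
                     monthly_decay=0.0):
    """Total business value earned over the horizon, in arbitrary units.

    Assumptions (hypothetical, for illustration only):
    - the model earns nothing before it is deployed;
    - its value decays by a fixed fraction each month from the month it
      was built (month 0), so waiting to deploy also costs accuracy.
    """
    total = 0.0
    for month in range(deploy_delay_months, horizon_months):
        total += monthly_value * (1.0 - monthly_decay) ** month
    return total


# Hypothetical comparison: a simpler model that is live after 1 month versus
# a "better" model worth 20% more per month that needs 9 months of programming.
simple_model = cumulative_value(
    monthly_value=100.0, deploy_delay_months=1,
    horizon_months=24, monthly_decay=0.02)
complex_model = cumulative_value(
    monthly_value=120.0, deploy_delay_months=9,
    horizon_months=24, monthly_decay=0.02)

print(f"simple model, deployed quickly: {simple_model:.0f}")
print(f"complex model, deployed late:   {complex_model:.0f}")
print("simple model wins" if simple_model > complex_model
      else "complex model wins")
```

With these made-up inputs the quickly deployed model comes out ahead over two years even though it is the less accurate one. Different inputs will give different answers, which is the point: the comparison has to be made on delivered business value, not on model accuracy alone.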
I also think that organizations need to be much more focused on directing analysts towards business problems. There is a tendency to let analysts explore the data and see what can be discovered. This can result in real breakthroughs, and most folks in the data mining/predictive analytic business have some examples of this. But organizations should not rely on this approach. Instead they should “begin with the decision in mind”. Find the decisions that are going to make a difference to business results – to the metrics that drive the organization. Then ask the analysts to look into those decisions and see what they might be able to predict that would help make better decisions. Of course you have to know what makes a decision good or bad and how a decision impacts your metrics before you focus on it. And you need your analysts to understand what is likely to be implementable: do you need something your CRM system can execute, for instance, or something that can be embedded in a report? Again, think industrial not artisan.
Some other quick thoughts in this vein:
- Sandboxes for your analysts to play in are good but they are not generally going to be in the Data Warehouse. Most Data Warehouses don’t contain the transaction level data that analysts need so they are going to need to work from extracts from production applications.
- Centralization of analysts into a single team is a consequence of success with analytics, not a precursor to it.
- In general, don’t roll your advanced analytic effort into your BI initiative as BI/DW people and analytic people tend to work quite differently. BI/DW folks think about summaries and rapid access to daily/weekly/monthly results for reporting while analytic people think about transactions, days since something happened and predicting the future. They don’t always play nicely together.
- Grouping analytic folks by business problem and looking for business domain know-how when hiring analytic folks is a good idea. But the type of model they are developing/have developed in the past is also a good organizing principle. Neil Raden and I divided analytics into those supporting risk-centric decisions and those supporting opportunity-centric decisions, for instance.
Now perhaps you are only just getting started and think it will be OK to hire your first analyst, or contract with your first data mining consultant, without thinking about these things. It won’t be. You don’t need to industrialize your first project (obviously) but you do need to start as you mean to go on, so think this through and make sure your first few projects don’t send you off in an unhelpful direction. Remember, precedent is policy unless you make sure it isn’t.