I am working on a paper, for publication in early 2014, on the role of standards such as R, Hadoop and PMML in the mainstreaming of predictive analytics. As I do so I will be publishing a few blog posts. I thought I would start with a quick introduction to the topic now and then finish the series in the new year.
Just a few years ago it was common to develop a predictive analytic model using a single proprietary tool against a sample of structured data. The model would then be applied in batch, with scores stored in a database or data warehouse for later use. This approach has been disrupted in recent years:
- There is a move to real-time scoring, calculating the value of predictive analytic models when they are needed.
- At the same time the variety of model execution platforms has expanded with in-database analytics as well as MapReduce-based execution becoming increasingly common.
- The open source analytic modeling language R has become extremely popular with up to 70% of analytic professionals using it at least occasionally (see the Rexer survey).
- Big Data is starting to have an impact, especially on advanced teams (as we saw in our Predictive Analytics in the Cloud work).
This increasingly complex, multi-vendor environment has made both published standards and open source standards more valuable. The paper will explore the growing role of standards in predictive analytics. It will discuss the role of R in expanding the analytic ecosystem, the way Hadoop helps organizations handle Big Data in the context of predictive analytics, and the way PMML supports the move to real-time scoring. Plus, of course, there is the longer-term impact of standards like the Decision Model and Notation standard.
I’ll write a blog post about each of these areas in the new year. I’d like to thank the Data Mining Group, Revolution Analytics and Zementis for their support of this research.