Over the years, hundreds of predictive techniques have been developed. Each has its advantages and drawbacks, and no single choice is best for all data mining applications.
Usually, the choice of the predictive algorithm depends on one (or more) of the following factors:
- Accuracy in the current context (established by trial and error against other algorithms)
- Best practices from the literature (for example, Support Vector Machines are known to be effective for image recognition)
- Data miner preferences and knowledge (a data miner may prefer neural networks because they have used them more often or understand them better)
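The first factor, accuracy established by trial and error, can be sketched in a few lines. This is only an illustration under stated assumptions: the data is synthetic, the two candidates (a majority-class baseline and a 1-nearest-neighbour classifier) are stand-ins chosen for brevity, and names such as `make_data` and `fit_majority` are hypothetical. In practice you would plug in your own data and the algorithms you actually want to compare.

```python
import random


def make_data(n, seed=0):
    """Synthetic 2-D points labelled 1 when x + y > 1, else 0 (illustrative only)."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    return [(p, int(p[0] + p[1] > 1.0)) for p in points]


def fit_majority(train):
    """Baseline: always predict the most frequent training label."""
    labels = [label for _, label in train]
    majority = max(sorted(set(labels)), key=labels.count)
    return lambda point: majority


def fit_nearest_neighbour(train):
    """1-nearest-neighbour: predict the label of the closest training point."""
    def predict(point):
        closest = min(
            train,
            key=lambda item: (item[0][0] - point[0]) ** 2 + (item[0][1] - point[1]) ** 2,
        )
        return closest[1]
    return predict


def accuracy(predict, test):
    """Fraction of held-out examples classified correctly."""
    return sum(predict(point) == label for point, label in test) / len(test)


# Hold out the last 100 points so every candidate is scored on unseen data.
data = make_data(300)
train, test = data[:200], data[200:]

candidates = {"majority": fit_majority, "nearest_neighbour": fit_nearest_neighbour}
scores = {name: accuracy(fit(train), test) for name, fit in candidates.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("best on this data:", best)
```

The winner here only holds for this particular data set, which is the point: the comparison has to be repeated in each new context.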
As is the case for many data miners, I have a toolbox of predictive algorithms that allows me to solve 80% of the problems. It is composed of:
- Pearson’s correlation, to better understand linear relationships among data
- Decision tree, a powerful method providing readable models
- Support Vector Machine (SVM), a kernel method quite robust to overfitting
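To make the first item of the toolbox concrete, here is a minimal sketch of Pearson's correlation coefficient written with the standard library only. The function name `pearson` and the sample values are assumptions for illustration; a statistics package would normally be used instead.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values: a perfectly linear pair and a purely quadratic pair.
linear = pearson([1, 2, 3, 4], [2, 4, 6, 8])
quadratic = pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
print(f"linear: {linear:.2f}, quadratic: {quadratic:.2f}")
```

The quadratic pair is fully dependent yet has zero correlation, which is why the coefficient is useful for linear relationships specifically and says little about other kinds.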
What are your preferred techniques for predictive analytics? What is your personal toolbox?