My friend and colleague James Taylor asked me last week to comment on a question regarding statistics vs. predictive analytics. The bulk of my reply is on James’ blog; my full reply is here, reworked from my initial response to clarify some points further.
I have always loved reading the green “Sage” books, such as Understanding Regression Assumptions (Quantitative Applications in the Social Sciences) or Missing Data (Quantitative Applications in the Social Sciences).
In data mining and predictive analytics, the data is king. The algorithms often induce the model itself from the data (decision trees do this), and even when they only fit coefficients (as neural networks do), it is the accuracy that matters rather than the coefficients. Often, in the data mining world, we don’t have to explain precisely why individuals behave as they do, so long as we can explain generally how they will behave. Model interpretation often amounts to describing trends, such as the sensitivity or importance of variables.
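To make that concrete, here is a minimal sketch in plain Python, not taken from any particular tool. It induces a one-split decision tree (a "stump") directly from the data, judges it by accuracy, and then describes it by how much accuracy drops when each variable is shuffled. The synthetic data, the function names (`fit_stump`, `permutation_importance`), and the fixed "predict 1 above the threshold" rule are all assumptions made for illustration.

```python
import random

def fit_stump(X, y):
    """Induce a one-split tree: try every feature and every observed
    threshold, and keep the split with the highest training accuracy."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            preds = [1 if row[j] > t else 0 for row in X]
            acc = sum(p == label for p, label in zip(preds, y)) / len(y)
            if best is None or acc > best[0]:
                best = (acc, j, t)
    _, feature, threshold = best
    return feature, threshold

def predict(stump, X):
    feature, threshold = stump
    return [1 if row[feature] > threshold else 0 for row in X]

def accuracy(stump, X, y):
    preds = predict(stump, X)
    return sum(p == label for p, label in zip(preds, y)) / len(y)

def permutation_importance(stump, X, y, rng):
    """Sensitivity of the model to each variable: the drop in accuracy
    after shuffling that variable's column."""
    base = accuracy(stump, X, y)
    drops = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        drops.append(base - accuracy(stump, X_perm, y))
    return drops

# Hypothetical data: only the first variable drives the outcome.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]

stump = fit_stump(X, y)                      # structure comes from the data
importances = permutation_importance(stump, X, y, rng)
print("split on variable", stump[0], "at", round(stump[1], 3))
print("accuracy:", accuracy(stump, X, y))
print("importance per variable:", importances)
```

Nothing here says why the outcome depends on the first variable. The sketch only shows that the model found the dependence, how accurate it is, and which variable it is sensitive to.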
I have always found David Hand’s summaries of the two disciplines very useful, such as this one here; he has a healthy respect for both disciplines.
This post was first posted on Predictive Models are not Statistical Models — JT on EDM