I recently got the results of the annual data mining survey that Karl Rexer of Rexer Analytics runs. You can get the summary here or the full results from Karl, but here are my thoughts:
- Data mining is everywhere. The most cited areas are CRM / Marketing and Financial Services with a big lead over Retail and Telecom. Healthcare did poorly, no surprise.
- Data miners most frequently work in Marketing & Sales, Research & Development, and Risk.
- Data miners’ most commonly used algorithms are regression, decision trees, and cluster analysis, way ahead of the others. Text mining was back in the pack, which is interesting given the number of text mining presentations we saw at Predictive Analytics World.
- Half of data miners say their results are helping to drive operational processes.
This is encouraging, as I think this is by far the most effective way to use predictive analytics.
- Batch scoring, with the results stored in the database, came top of the deployment approaches at 30%, with interactive real-time scoring at 21% and 16% putting the model into some overall software project.
- 60% of respondents say the results of their modeling are deployed always or most of the time.
This is still not good enough: the remaining 40% are not getting deployed reliably.
- The top challenges facing data miners are dirty data, explaining data mining to others, and difficult access to data. However, in 2009 fewer data miners listed data quality and data access as challenges than in the previous year. 34% also have problems with IT.
- Open-source tools Weka and R moved substantially up data miners’ tool rankings this year, and are now used by large numbers of both academic and for-profit data miners.
There’s lots more in the survey, so go get it and read it.