It’s my pleasure to welcome Eric Siegel, President of Prediction Impact, to Datalligence. He has kindly answered some of my questions related to Data Mining.
Q1. A brief intro about yourself and your DM experience
Eric: I’ve been in data mining for 16 years and commercially applying predictive analytics with Prediction Impact since 2003. As a professor at Columbia University, I taught the graduate course in predictive modeling (referred to as “machine learning” at universities), and have continued to lead training seminars in predictive analytics as part of my consulting career.
I’m also the program chair for Predictive Analytics World, coming to San Francisco Feb 18-19. This is the business-focused event for predictive analytics professionals, managers, and commercial practitioners, delivering case studies, expertise, and resources to strengthen the business impact of predictive analytics.
Q2. What are the most common mistakes you’ve encountered while working on DM projects?
Eric: The main mistake is not following best-practice organizational processes, as set forth by standards such as CRISP-DM (mentioned in your Dec 18th blog post on “Methodologies”).
Predictive analytics’ success hinges on deciding, as an organization, which specific customer behavior to predict. The decision must be guided not only by what is analytically feasible with the available data, but by which predictions will provide a positive business impact. This can be an elusive thing to pin down, requiring truly informed buy-in from various parties, including those whose operational activities will be changed by integrating the predictive scores output by a model. The iterative process model defined by CRISP-DM and other standards ensures that you “plan backwards” – starting from the end deployment goal, including the right personnel at key decision points throughout the project, and establishing realistic timelines and performance expectations.
Dr. John Elder has a somewhat famous list of the top 10 common-but-deadly mistakes, which is an integral part of the workshop he’s conducting at Predictive Analytics World, “The Best and the Worst of Predictive Analytics: Predictive Modeling Methods and Common Data Mining Mistakes”. As he likes to say, you learn “Best Practices by seeing their flip side: Worst Practices”. For more information about the workshop, see The Best and the Worst of Predictive Analytics.
Q3. Translating the Business Goal to a Data Mining Goal, and then defining the acceptable model performance/accuracy level for the success of the DM project, appears to be one of the biggest challenges in a DM project. One approach is to use the typical accuracy level for that particular domain. Another is to model on a sample dataset (a sort of POC) to come up with an acceptable model performance/accuracy level for the entire dataset/project. Which approaches do you recommend/use to define the acceptable accuracy/cut-off level for a DM project?
Eric: Acceptable performance should be defined as the level where your company attains true business value. Establishing typical performance for a domain can be very tricky since, even within one domain, each company is unique – the context in which predictive models will be deployed is unique in the available data (which reflects unique customer lists and their responses, or lack thereof, to unique products) and in the operational systems and processes. Instead, forecast the ROI that would be attained in model deployment, based on both optimistic and conservative model performance levels. Then, if the conservative ROI looks healthy enough to move forward (or the optimistic ROI is exciting enough to take a risk), determine a minimal acceptable ROI, and set the model performance that would attain it as the target performance level. This is then followed as the goal that must be attained in order to deploy the model, putting its predictive scores into play “in the field”.
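To make Eric’s approach concrete, here is a minimal sketch in Python of that kind of ROI forecast for a hypothetical targeted-marketing campaign. This is not from the interview: the helper `campaign_roi` and every figure in it are illustrative assumptions, showing how one might compute ROI under conservative and optimistic lift scenarios and then back out the minimum model performance needed to hit a minimal acceptable ROI.

```python
# Hypothetical ROI forecast for deploying a response model.
# All figures are illustrative assumptions, not numbers from the interview.

def campaign_roi(lift, base_response_rate, contacts, cost_per_contact, value_per_response):
    """ROI of contacting the model's top-scored segment at a given lift."""
    responses = contacts * base_response_rate * lift
    revenue = responses * value_per_response
    cost = contacts * cost_per_contact
    return (revenue - cost) / cost

BASE_RATE = 0.01     # assumed overall response rate (1%)
CONTACTS = 100_000   # size of the targeted segment
COST = 1.50          # assumed cost per contact (mail piece, etc.)
VALUE = 120.0        # assumed revenue per responder

for label, lift in [("conservative", 2.0), ("optimistic", 4.0)]:
    roi = campaign_roi(lift, BASE_RATE, CONTACTS, COST, VALUE)
    print(f"{label}: lift={lift:.1f} -> ROI={roi:.0%}")

# Back out the minimum lift (the target model performance) needed to
# attain a minimal acceptable ROI, here assumed to be 25%:
min_roi = 0.25
min_lift = (1 + min_roi) * COST / (BASE_RATE * VALUE)
print(f"minimum lift to reach {min_roi:.0%} ROI: {min_lift:.2f}")
```

Under these made-up assumptions, the conservative scenario already clears the 25% hurdle, and the derived minimum lift becomes the go/no-go performance target for deployment, exactly the “plan backwards from ROI” logic Eric describes.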
Q4. One thing I hear a lot from freshers entering the DM field is that they want to learn SAS. Considering that SAS programming skills are highly respected and earn more than any other DM software skills, it’s actually a futile exercise to convince these freshers that tool-neutral DM knowledge is what they should strive for. What’s your opinion on this?
Eric: Well, I think most people understand there are advantages to taking general driving lessons, rather than lessons that teach you only how to drive a Porsche. On the other hand, you can only sit in one car at a time, and when you learn how to drive your first car, most of what you learn applies in general, to other cars as well. All cars have steering wheels and accelerators; likewise, many predictive modeling tools share the same standard, non-proprietary core analytical methods developed at universities (decision trees, neural networks, etc.), and all of them help you prepare the data, evaluate model performance by viewing lift curves and the like, and deploy the models.
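As an illustration of that point, here is a minimal sketch assuming scikit-learn as the “one car” you happen to learn in: a standard decision tree scored by top-decile lift, on synthetic data. The specific tool is incidental; the core method and the lift evaluation are the transferable skills that carry over to SAS or any other predictive modeling tool.

```python
# A standard, non-proprietary method (a decision tree) and a standard
# evaluation (top-decile lift), shown in scikit-learn purely as an example.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic "customer" data standing in for a real response dataset
# (~10% positive class, mimicking a typical response rate imbalance).
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # the predictive scores you would deploy

# Top-decile lift: response rate among the top 10% of scores vs. overall.
top_decile = np.argsort(scores)[::-1][: len(scores) // 10]
lift = y_test[top_decile].mean() / y_test.mean()
print(f"top-decile lift: {lift:.2f}")
```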
Q5. In your view, what are the new areas/domains where DM is being applied?
Eric: I see human resource applications, including human capital retention, as an up-and-coming area and an interesting contrast to marketing applications: predict which employees will quit, rather than the more standard prediction of which customers will defect.
I consider these the hottest areas (all represented by named case studies at PAW-09, by the way):
* Marketing and CRM (offline and online)
– Response modeling
– Customer retention with churn modeling
– Acquisition of high-value customers
– Direct marketing
– Database marketing
– Profiling and cloning
* Online marketing optimization
– Behavior-based advertising
– Email targeting
– Website content optimization
* Product recommendation systems (e.g., the Netflix Prize)
* Insurance pricing
* Credit scoring
Q6. In spite of the fact that a lot of companies in India provide Analytics or Data Mining as a service/solution to companies around the world, there are no institutions providing quality, industry-focused Data Mining education, and no colleges/universities offering a Masters in Analytics/Data Mining in India. I have a lot of friends/colleagues who would gladly take up such courses/programs if they were made available in India. Can we expect these kinds of courses/trainings from Prediction Impact, The Modeling Agency, TDWI, etc. in the near future?
Eric: I’m involved in discussions several times a year about bringing a training seminar to regions beyond North America and Europe, but it isn’t clear when this will happen. For now, Prediction Impact does offer an online training program, “Predictive Analytics Applied”, available on-demand at any time.