Data Mining Methodologies
MS SQL SERVER DATA MINING
1. Defining the Problem: Analyze business requirements, define the scope of the problem, define the metrics by which the model will be evaluated, and define specific objectives for the data mining project.
2. Preparing Data: Remove or handle bad data, find correlations in the data, identify the most accurate sources of data, and determine which columns are the most appropriate for use in analysis.
3. Exploring the Data: Calculate the minimum and maximum values, calculate the mean and standard deviation, and look at the distribution of the data.
4. Building Models: Specify the input columns, the attribute that you are predicting, and parameters that tell the algorithm how to process the data.
5. Exploring & Validating Models: Explore the models that you have built and test their effectiveness before deploying them.
6. Deploying & Updating Models: Use the models to create predictions, which you can then use to make business decisions; create content queries to retrieve statistics, rules, or formulas from the model; embed data mining functionality directly into an application; and update the models after review and analysis, or update them dynamically as more data comes into the organization.
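The data exploration step above (minimum, maximum, mean, standard deviation, and distribution) can be sketched in a few lines of Python. This is only an illustration using the standard library, not SQL Server functionality; the function name summarize, the bin width, and the sample column of ages are all made up for the example.

```python
import statistics
from collections import Counter

def summarize(values, bin_width):
    """Return basic descriptive statistics and a coarse histogram."""
    # Bucket each value by the lower edge of its bin, e.g. 35 -> 30 for width 10.
    histogram = Counter((v // bin_width) * bin_width for v in values)
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "histogram": dict(sorted(histogram.items())),
    }

# Hypothetical column of customer ages, invented for this sketch.
ages = [23, 35, 31, 47, 52, 38, 29, 41]
summary = summarize(ages, bin_width=10)
print(summary)
```

A summary like this is usually enough to spot out-of-range values or a badly skewed column before any model is built.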
ORACLE DATA MINING
1. Problem Definition: Specify the project objectives and requirements from a business perspective, formulate them as a data mining problem, and develop a preliminary implementation plan.
2. Data Gathering and Preparation: Take a closer look at the data, remove some of the data or add additional data, identify data quality problems, and scan for patterns in the data. Typical tasks include table, case, and attribute selection as well as data cleansing and transformation.
3. Model Building and Evaluation: Select and apply various modeling techniques and calibrate the parameters to optimal values. If the algorithm requires data transformations, step back to the previous phase to implement them.
4. Knowledge Deployment: Deployment can involve scoring (the application of models to new data), the extraction of model details (for example, the rules of a decision tree), or the integration of data mining models within applications, data warehouse infrastructure, or query and reporting tools.
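To make "scoring" and "extraction of model details" concrete, here is a small Python sketch. It does not use Oracle Data Mining; the tree is a hand-written, hypothetical structure (attribute names income and age, and both thresholds, are invented), whereas a real model would be learned from data.

```python
# Hypothetical, hand-written decision tree; a real one would be learned from data.
TREE = {
    "attribute": "income",
    "threshold": 50000,
    "low": {"label": "no"},
    "high": {
        "attribute": "age",
        "threshold": 30,
        "low": {"label": "no"},
        "high": {"label": "yes"},
    },
}

def score(tree, case):
    """Scoring: apply the tree to one new case and return the predicted label."""
    while "label" not in tree:
        branch = "low" if case[tree["attribute"]] < tree["threshold"] else "high"
        tree = tree[branch]
    return tree["label"]

def extract_rules(tree, conditions=()):
    """Model details: return every root-to-leaf path as an IF ... THEN rule."""
    if "label" in tree:
        premise = " AND ".join(conditions) or "TRUE"
        return [f"IF {premise} THEN {tree['label']}"]
    attr, thr = tree["attribute"], tree["threshold"]
    return (extract_rules(tree["low"], conditions + (f"{attr} < {thr}",))
            + extract_rules(tree["high"], conditions + (f"{attr} >= {thr}",)))

prediction = score(TREE, {"income": 60000, "age": 40})
rules = extract_rules(TREE)
for rule in rules:
    print(rule)
```

Scoring answers "what does the model say about this new case?", while the extracted rules explain why, which is often what business users ask for first.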
SEMMA from SAS
1. Sample the data by creating one or more data tables. The sample should be large enough to contain the significant information, yet small enough to process.
2. Explore the data by searching for anticipated relationships, unanticipated trends, and anomalies in order to gain understanding and ideas.
3. Modify the data by creating, selecting, and transforming the variables to focus the model selection process.
4. Model the data by using the analytical tools to search for a combination of the data that reliably predicts a desired outcome.
5. Assess the data by evaluating the usefulness and reliability of the findings from the data mining process.
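The Sample, Model, and Assess steps can be illustrated with a toy Python sketch. This is not SAS code: the data set, the one-variable threshold "model", and all function names are assumptions made for the example, and the Explore and Modify steps are skipped because the toy data needs no cleaning.

```python
import random

def sample_rows(rows, n, seed=0):
    """Sample: draw n rows without replacement (seeded for repeatability)."""
    return random.Random(seed).sample(rows, n)

def accuracy(rows, threshold):
    """Assess: fraction of rows whose label matches the rule x >= threshold."""
    return sum((x >= threshold) == label for x, label in rows) / len(rows)

def fit_threshold(train):
    """Model: choose the cut-off on x that best reproduces the training labels."""
    candidates = sorted({x for x, _ in train}) + [float("inf")]
    return max(candidates, key=lambda t: accuracy(train, t))

# Toy population, invented for this sketch: the label is True when x >= 50.
population = [(x, x >= 50) for x in range(100)]

sample = sample_rows(population, 40)
train, holdout = sample[:28], sample[28:]

threshold = fit_threshold(train)
train_accuracy = accuracy(train, threshold)
holdout_accuracy = accuracy(holdout, threshold)
print(threshold, train_accuracy, holdout_accuracy)
```

Assessing on rows that were held out of the modeling step is what makes the reliability estimate meaningful; accuracy on the training rows alone tends to be optimistic.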
CRISP-DM (CRoss Industry Standard Process for Data Mining)
1. Business Understanding: Understand the project objectives and requirements from a business perspective, then convert this knowledge into a data mining problem definition and a preliminary plan designed to achieve the objectives.
2. Data Understanding: Collect initial data and proceed with activities in order to get familiar with the data, to identify data quality problems, to discover first insights into the data, or to detect interesting subsets to form hypotheses for hidden information.
3. Data Preparation: Tasks include table, record, and attribute selection as well as transformation and cleaning of data for modeling tools.
4. Modeling: Select and apply various modeling techniques, calibrate their parameters to optimal values, and step back to the data preparation phase if needed.
5. Evaluation: Evaluate the model and review the steps executed to construct it, to be certain it properly achieves the business objectives. At the end of this phase, a decision on the use of the data mining results should be reached.
6. Deployment: Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process. In many cases it will be the customer, not the data analyst, who will carry out the deployment steps.
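The six phases, including the step back from Modeling to Data Preparation, can be modeled as a tiny state machine. The Python sketch below is only an illustration of the ordering described above; the names Phase, next_phase, and run are invented here, and other feedback loops that may occur in a real project are left out.

```python
from enum import Enum

class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6

def next_phase(current, needs_more_preparation=False):
    """Return the phase that follows `current`, or None after deployment."""
    if current is Phase.MODELING and needs_more_preparation:
        return Phase.DATA_PREPARATION  # step back to transform the data again
    if current is Phase.DEPLOYMENT:
        return None
    return Phase(current.value + 1)

def run(rework_rounds):
    """Trace the phases visited when modeling steps back `rework_rounds` times."""
    visited, phase = [], Phase.BUSINESS_UNDERSTANDING
    while phase is not None:
        visited.append(phase.name)
        step_back = phase is Phase.MODELING and rework_rounds > 0
        if step_back:
            rework_rounds -= 1
        phase = next_phase(phase, needs_more_preparation=step_back)
    return visited

trace = run(rework_rounds=1)
print(" -> ".join(trace))
```

With one round of rework, Data Preparation and Modeling each appear twice in the trace, which reflects how the process is iterative rather than strictly linear.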
http://datalligence.blogspot.com/
Other Posts by Romakanta Irungbam
The Keyword Tree - Spotfire, Data Visualization and Text Mining - February 23, 2011
What's behind your Tree? - December 17, 2010
Means and Proportions with two populations - August 30, 2009
Analytics: Reality and the Growing Interest - May 31, 2009
A Tale Of Two Banks and One Telecom Service Provider - May 18, 2009