After reading the article by Tom Khabaza, I want to discuss some aspects of it with you. The article is generally well written and reflects the author's experience; however, I do have comments on some of the laws. The first law states that there is no data mining without a business objective. While this is true most of the time, it is not always the case. In R&D, a data mining project can be started without a clear business goal.
Since data mining may discover unexpected knowledge, there may be no defined objective at the beginning of the project. Later in the project, one can define the objective, for example if specific trends have been found in the data. Clearly, there are two approaches to data mining in a company: top-down and bottom-up. The top-down approach is driven by business needs. The bottom-up approach is driven by the data. The two approaches can be complementary. When you are driven by the data, the business objective may come later. If you discover that there is no usable trend in the data, there may be no place for a project and thus no business objective. But there is still data mining.
In the second law, Khabaza makes an excellent point about the importance of understanding the business:
“[…] whatever is found in the data has significance only when interpreted using business knowledge, and anything missing from the data must be provided through business knowledge.”
In the fourth law, Khabaza explains that problem formulation and resolution are both tasks for the data miner:
“However, these views arise from the erroneous idea that, in data mining, the data miner formulates the problem and the algorithm finds the solution. In fact, the data miner both formulates the problem and finds the solution – the algorithm is merely a tool which the data miner uses to assist with certain steps in this process”
This means that the complete knowledge discovery process can’t be automated. The data miner has to formulate the problem, solve it, and interpret the results. However, parts of the data mining process can still be automated (ETL, building the model, scoring, etc.).
Read the full article from Tom Khabaza.