Data Mining Book Review: Data Mining with R

Luis Torgo, interviewed on Data Mining Research, has recently published a book on data mining entitled “Data Mining with R, Learning with Case Studies”.

The book starts with an introduction to R. Nicely written, it explains the concepts needed to use this programming language for data mining. The book is then divided into four case studies. Each case study introduces data mining concepts that are illustrated using R.

First, pre-processing and data visualization are introduced through the prediction of algae blooms. Second, modelling and time ordering are covered through the stock market application. Then, outlier detection and clustering are presented through fraud detection. Finally, feature selection and cross-validation are introduced through the classification of microarray samples. There is no general introduction to data mining, but this is not a problem since the concepts are explained through the different case studies.

Theoretical concepts are always linked to examples, as in most data mining books. Luis goes a step further by linking each application to the corresponding code in R. It is thus easy both to understand a concept and to implement it with R. This is certainly one of the best books for a direct implementation of data mining algorithms. Another good point is that, for most of the problems, the book shows several different ways to solve them.

I have one remark regarding the stock market prediction chapter, on an issue I already discussed when I was working in finance. The author states that the percentage of profitable trades should be above 50% to have a successful trading strategy. This is not always the case. Imagine a system where each winning trade brings in $2 while each losing trade costs $1. Since you earn more on a winning trade than you lose on a losing one, you can still have a successful trading strategy with, for example, 48% winning trades.
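To make the arithmetic explicit, here is a minimal sketch in Python (not code from the book). The function names are made up for illustration, and the model is deliberately simplified: it assumes every winning trade earns the same fixed amount, every losing trade costs the same fixed amount, and there are no transaction costs.

```python
def expected_profit_per_trade(win_rate, avg_win, avg_loss):
    """Average profit per trade, given the fraction of winning trades.

    Assumes fixed payoffs and no transaction costs (illustrative only).
    """
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


def break_even_win_rate(avg_win, avg_loss):
    """Win rate at which the expected profit per trade is exactly zero."""
    return avg_loss / (avg_win + avg_loss)


# Figures from the example above: $2 per winning trade, $1 per losing trade.
print(expected_profit_per_trade(0.48, 2.0, 1.0))
print(break_even_win_rate(2.0, 1.0))
```

With 48% winning trades, the expected profit is 0.48 × $2 − 0.52 × $1, or about $0.44 per trade. Under these assumptions the strategy only needs to win more than one trade in three to break even, so the 50% threshold matters only when wins and losses are the same size.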

In conclusion, this is an invaluable resource for data miners, R programmers, and people involved in fields such as fraud detection and stock market prediction. If you’re serious about data mining and want to learn from experiences in the field, don’t hesitate!