Today on Data Mining Research I welcome Luis Torgo. He is the author of “Data Mining with R: learning with case studies” (to be reviewed soon on DMR). Luis has kindly agreed to answer a few questions for the readers of Data Mining Research. Thanks for your time, Luis.
Data Mining Research (DMR): Could you please introduce yourself to the readers of Data Mining Research?
Luis Torgo (LT): I’m an assistant professor in the Department of Computer Science of the Faculty of Sciences of the University of Porto, Portugal. I’m also a member of the Laboratory of Artificial Intelligence and Decision Support, which is one of the research units of a large research lab in Portugal named INESC Porto, LA. I started my career in 1990 and since then I’ve been doing research in machine learning and/or data mining. My involvement in data mining includes a mixture of research and applications, since I’ve been involved in and have coordinated several projects with industry, apart from my academic contributions to this research area. I’ve been following the R project almost since its beginnings and currently most of my work is carried out using R.
DMR: Your book “Data Mining With R” has been available since the end of 2010. Can you tell us more about this book? Who is the audience? Have you had any feedback on the book?
LT: The book is a natural consequence of my experience with data mining and R, both within applied projects and in teaching. Having given several courses on these topics over the years and for a wide range of audiences, I came to the conclusion that one of the best ways of reaching people is to motivate them to tackle challenging topics through concrete applied case studies. This is the main punch line of the book: learn with case studies. There are many outstanding books on data mining topics already. Why write a new one? Instead, I decided to take a different approach: present some of the most relevant data mining topics as they are needed when addressing a set of real-world data mining case studies. The reader is given all data and code, not only through the accompanying R package of the book (DMwR), but also on the book web page, where readers can download and/or copy-paste all the code in the book into R. In this context, I think the audience of the book is rather wide, in the sense that it can interest both students of data mining courses who wish to practice their knowledge and people in industry looking for an easy, case-oriented way of entering the “world” of R and data mining. In terms of feedback, after 5 months I’m very happy with how things are going. I’ve been receiving a huge number of emails from happy readers and I also know that the book has already been adopted in a few schools. I’m particularly happy with the number of emails I have received from people in industry congratulating me on the book. Finally, the book has been #1 in the sales rank of data mining books at amazon.com since it became available, which I guess is good feedback!
DMR: For your book, and I guess most of your work, you have chosen R. Why?
LT: It is true that nowadays I mostly use R whenever I need to program anything. I think the R project is a very exciting community where lots of new ideas keep appearing. Although I use R a lot, I keep being surprised by new things that are available in R through some of the thousands of user-contributed packages. I think the main advantage of R is exactly the ability to rapidly turn ideas into prototypes by building upon already existing code. I used to develop most of my data mining software in C. Although I ended up developing a kind of machine learning / data mining library with a colleague, to try to have a basic set of useful functions to start from, the fact is that it would always fall far short of what is available in a large project like R. In summary, rapid prototyping of ideas and the breadth of available code/methods/algorithms are the main reasons I use R in my work.
DMR: If you had to give only one piece of advice to a beginner in data mining, what would it be?
LT: Keep an open mind in terms of tools and methods and be willing to explore alternatives. After all, mining is about exploration!
More information on Luis Torgo and his book:
Book web page: http://www.liaad.up.pt/~ltorgo/DataMiningWithR
Web page: http://www.liaad.up.pt/~ltorgo