As someone who trained as a statistician, I’ve always struggled with that title. I love the rigor and insight that Statistics brings to data analysis, but let’s face it: Statistics — the name — has always had a bit of a branding problem. Telling someone I was a statistician was more likely to conjure up images of me counting runs at a baseball (or cricket) game than pursuing serious science. And the image of what Statistics ideally is about — collaborative, interactive, applied, fun — was too often subsumed by the stereotype image — isolated, actuarial, ivory tower, report driven. (And hey, even actuaries can be fun sometimes.)
That’s why I’m a fan of the term “data scientist” — it embodies everything that Statistics always should be, without the baggage and tradition of the term “statistician”. So I enjoyed participating in yesterday’s Kalido webinar “Data Scientist: Your Must-Have Business Investment Now” where I could make the following contrast between the images of Statisticians and Data Scientists:
(A quick aside on the “Data Size” row above: while the unstructured or unaggregated source data that data scientists work with can be in the terabytes range or even larger, by the time it’s cleaned and prepared for statistical modeling, a file in the gigabytes range is more typical, even at “Big Data” companies like Facebook. This is a topic I cover in more detail in my recent Strata talk on real-time predictive analytics.)
So, bottom line: while I am a statistician, and I love Statistics dearly, I prefer to call myself a Data Scientist today, because to me it better represents what Statistics really is (if that makes sense). That’s certainly not to diminish the achievements of those who do call themselves Statisticians. In particular, I want to recognize George Box: a true hero of mine, coiner of the idiom “all models are wrong, but some are useful”, and one of the nicest people I ever met, who sadly passed away in March.
On the other hand, I have no qualms about making a competitive comparison between Data Science and Business Intelligence:
You can get the details of how I differentiate Statistics, Data Science and BI, and hear other perspectives on Data Science from fellow data scientists Carla Gentry and Gregory Piatetsky, in the slides and replay of the webinar provided by Kalido at the link below.
Kalido: Data Scientist: Your Must-Have Business Investment NOW