“The R-Files” is an occasional series from Revolution Analytics, where we profile prominent members of the R Community.
Martyn Plummer is a longtime contributor to the R community and a member of the R core group, which consists of 20 members who help oversee the continued evolution of the project. Plummer also serves on the editorial board of the R Journal, the official journal of the R project. By day, he serves as a Statistician and Epidemiologist at the International Agency for Research on Cancer (IARC), based in Lyon, France.
Plummer, who has been using R since 1995, has developed or contributed to a number of popular packages, including coda, for analyzing Markov Chain Monte Carlo output; JAGS, a clone of the popular WinBUGS software for Bayesian analysis; and Epi, which provides functions for epidemiologists and accompanies an annual course that introduces epidemiologists to R.
He has also incorporated R into his work at IARC, where he works in the Infection and Cancer Epidemiology group. Much of the work of this group is focused on human papillomavirus (HPV), which causes half a million cases of cervical cancer per year worldwide. Plummer and his colleagues use R (including his own Epi package) to analyze epidemiological studies of HPV infection and try to tease out some aspects of HPV natural history that are difficult to understand without statistical modeling, such as whether different HPV types interact with each other. He also relies heavily on R’s graphical capabilities for visualizing data in scientific publications.
Prior to R, Plummer worked primarily with S+ for analyzing data. He had been working in Cambridge, United Kingdom, in the Biostatistics Unit at the Medical Research Council when he was offered a position at the IARC in Lyon. He recalls the transition and how his new position introduced an entirely different computing environment. Soon after moving, he was introduced to the recently formed R project by his colleague David Clayton.
“From the beginning, I saw enormous potential in R,” says Plummer. “While I was accustomed to S+, it wasn’t long before I completely switched over to R. It was and continues to be unparalleled in its flexibility in terms of data analysis.”
Plummer also points to R’s extensible nature as one of its defining features. As a modern language, R is able to effectively adapt to the changing nature of data analysis in an era of increasingly large, unstructured data sets. “One of the most important features of R is that it’s built around the data; it’s designed for programming with data, so it can take these developments in stride,” he says.
He went on to describe a recent article in the R Journal that analyzed 18 months’ worth of text from the R mailing lists and identified relationships between prominent members of the R community based on the topics they discussed. Plummer cites it as an example of R’s ability to keep up with the ever-changing notion of “data.”
“10 years ago, I would have never called such an amalgamation of text a ‘data set,’” he says. “Today, though, we find ourselves in a situation where we can elicit structure from large and complex data sets and glean meaning from it.”
When asked about how he sees the R project evolving in coming years, Plummer speaks of a delicate yet effective balance. “R manages a difficult equilibrium; it’s partly a frontier for innovation in statistical computing, yet it’s also a stable platform for data analysis. It’s unique in this regard and I don’t see it facing serious competition for quite some time.”
He expects this situation to hold for at least the next few years, though one challenge for R users is navigating the growing number of contributed packages. While contributors are innovating across a diverse range of functions, Plummer says, there are also opportunities for the community as a whole to pool and share its work.
“One of the most important and oft-overlooked values of the R community is its interdisciplinary nature,” he says. “It’s remarkable to be able to collaborate with so many talented people from a diverse range of fields. We’re all statisticians, but statistics has a terrible tendency to fragment by subject matter. R gives us all a common platform and brings us together to encourage innovation.”