Earlier this week, data scientists Pete Warden and Alasdair Allan reported that iPhones and cell-enabled iPads keep an internal log of the device’s location, which is accessible from the backup that iTunes creates when you sync the device. Naturally, there’s been some controversy over the privacy implications of the data being kept, but from a data scientist’s perspective this represents a rich and interesting data source for analysis. Personally, I’m kind of interested to get access to where I’ve been over the past year: wherever I go, my iPhone goes.
Pete Warden has provided a tool that lets you easily access and map the data. Here’s what the data for my iPhone looks like when zoomed in on California:
If you have more than one iDevice, the tool only shows the data from the one you most recently synced, and more importantly it doesn’t (directly) give you access to the location data if you want to do more with it. Drew Conway comes to the rescue with an R package (amusingly called stalkR) that lets you import the data from a named device into an R object for visualization or other analysis. You’ll need to make sure the dependent R packages are installed, then download the stalkR_0.01.tar.gz file into your R working directory and install it from there:
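# Install the packages that stalkR depends on
install.packages("RSQLite")
install.packages("XML")
install.packages("ggplot2")
install.packages(c("maps", "mapproj"))
# Install stalkR from the source tarball in the working directory
install.packages("stalkR_0.01.tar.gz", repos = NULL, type = "source")
library(stalkR)
# Arguments: your user name, then the device name as it appears in iTunes
iphone.locs <- get.mylocations("dsmith", "David Smiths iPhone")
viz.locations(iphone.locs, "usa")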
The last two commands are the key ones. get.mylocations creates a data frame with timestamped latitude and longitude, along with a horizontal accuracy measure. There’s also an Altitude variable, but at least in my data it was always zero (probably because the tracking data comes from cell tower triangulation, not GPS). The viz.locations function uses the syntax of the map function to map the locations, but the nice thing about having the data in R is that you can use it in whatever way you like.
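As a rough illustration of "whatever way you like", here is a minimal sketch of filtering and summarizing such a data frame with base R. It runs on a small made-up data frame rather than real device data, and the column names (Timestamp, Latitude, Longitude, HorizontalAccuracy), the accuracy threshold, and the California bounding box are all assumptions for the example; check names(iphone.locs) to see what stalkR actually returns.

```r
# Stand-in for the data frame returned by get.mylocations().
# The column names are assumptions; the values are invented.
locs <- data.frame(
  Timestamp = as.POSIXct(c("2011-04-01 09:00:00", "2011-04-01 17:30:00",
                           "2011-04-02 08:15:00", "2011-04-03 12:00:00"),
                         tz = "UTC"),
  Latitude = c(37.77, 37.44, 34.05, 40.71),
  Longitude = c(-122.42, -122.16, -118.24, -74.01),
  HorizontalAccuracy = c(500, 1200, 8000, 900)
)

# Keep only the more precise records (the cutoff is arbitrary,
# and the unit is presumably metres).
precise <- subset(locs, HorizontalAccuracy <= 2000)

# Restrict to a rough bounding box around California.
in.ca <- subset(precise, Latitude > 32 & Latitude < 42.1 &
                          Longitude > -124.5 & Longitude < -114)

# Count the remaining records per calendar day.
fixes.per.day <- table(as.Date(in.ca$Timestamp))
print(fixes.per.day)
```

With real data you would replace locs with iphone.locs (adjusting the column names as needed); the filtered data frame can then be handed to any plotting or modelling function.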
Drew Conway’s GitHub: stalkR