At the Data Expo held at this year’s ASA meeting, the challenge was to present, as a poster, a graphical analysis of 120 million airline flight records: all arrivals and departures of all but the smallest airlines in the USA from 1987 to 2008.
The winning poster, from a team at the SAS Institute, used SAS to produce a nice heat map of delays on a daily basis for the entire 22-year period. It reveals that winter is a bad time to fly (that is when the most cancellations occur) and that 9/11 changed everything (a clear discontinuity appears, as fewer flights led to fewer cancellations).
The runner-up entry, from a team at Iowa State using R, brought other information into the analysis, including weather, crosswinds at landing, and fuel efficiency. They also teased some fascinating facets out of the data: for example, three of the flights in the data set are actually registered as hot-air balloons, and (by tracking aircraft registrations and matching landings to subsequent takeoffs in a different city) there have been more than 1 million “ghost flights” of empty planes since 1995. I thought their chart detailing the trends (and exceptions) in flight volumes at four major airports was well done.
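The “ghost flight” count rests on a simple matching rule: group flights by aircraft registration, sort each aircraft's flights by time, and flag every case where a plane's next takeoff is from a different airport than the one it last landed at. Here is a minimal sketch of that rule, assuming each record carries a tail number, a departure time, an origin and a destination. The `Flight` record and `find_ghost_legs` function are hypothetical names, not the Iowa State team's code (which was written in R); Python is used only for illustration. In real data a mismatch could also reflect a missing record rather than an empty repositioning flight.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class Flight(NamedTuple):
    tail_num: str   # aircraft registration
    dep_time: int   # departure timestamp (hypothetical: minutes since an epoch)
    origin: str
    dest: str


def find_ghost_legs(flights: List[Flight]) -> List[Tuple[str, str, str]]:
    """Return (tail_num, landed_at, next_departed_from) for every case where an
    aircraft's next recorded takeoff is from a different airport than its last landing."""
    by_tail = defaultdict(list)
    for f in flights:
        by_tail[f.tail_num].append(f)

    ghosts = []
    for tail, legs in by_tail.items():
        legs.sort(key=lambda f: f.dep_time)  # chronological order per aircraft
        for prev, nxt in zip(legs, legs[1:]):
            if prev.dest != nxt.origin:
                # The plane must have flown prev.dest -> nxt.origin with no
                # scheduled-flight record: an inferred repositioning ("ghost") leg.
                ghosts.append((tail, prev.dest, nxt.origin))
    return ghosts


if __name__ == "__main__":
    sample = [
        Flight("N1", 0, "ORD", "LAX"),
        Flight("N1", 500, "SFO", "JFK"),   # landed at LAX, next takeoff from SFO
        Flight("N1", 900, "JFK", "BOS"),
        Flight("N2", 100, "ATL", "DFW"),
        Flight("N2", 400, "DFW", "ATL"),
    ]
    print(find_ghost_legs(sample))
```

Summing the length of this list over the whole data set gives a count of inferred empty flights under these assumptions.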
An interesting third entry, from Yale University’s Michael Kane and Jay Emerson, focused more on data processing than on graphical analysis. They used the bigmemory package to attach the entire data set as a file-backed object, which can then be manipulated like an ordinary matrix in R. (Jay Emerson is the author of the bigmemory package and a frequent REvolution collaborator.) They then used the foreach package to process the data in parallel, investigating flight delays (flights leaving around 8 PM have the greatest chance of a significant delay) and the effect of aircraft age on delays (a small but statistically significant effect, based on an analysis of 84 million records).
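The file-backed, split-by-hour pattern described above can be sketched outside R as well. The sketch below is an analog, not Kane and Emerson's code. It swaps bigmemory for a NumPy `memmap` (a matrix stored on disk and paged in on demand) and foreach for a `concurrent.futures` thread pool with one task per departure hour. It assumes a hypothetical two-column integer layout (departure hour, departure delay in minutes) and treats a delay of more than 15 minutes as “significant”. Both choices are illustrative assumptions.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

import numpy as np

# Hypothetical column layout of the on-disk matrix.
DEP_HOUR, DEP_DELAY = 0, 1


def make_file_backed(data: np.ndarray, path: str) -> np.memmap:
    """Write an int32 matrix to disk and reopen it read-only as a memmap."""
    out = np.memmap(path, dtype=np.int32, mode="w+", shape=data.shape)
    out[:] = data
    out.flush()
    del out
    return np.memmap(path, dtype=np.int32, mode="r", shape=data.shape)


def delay_rate_for_hour(mat: np.ndarray, hour: int, threshold: int = 15) -> float:
    """Fraction of flights departing in `hour` delayed by more than `threshold` minutes."""
    rows = mat[mat[:, DEP_HOUR] == hour]
    if rows.shape[0] == 0:
        return float("nan")  # no flights departed in this hour
    return float(np.mean(rows[:, DEP_DELAY] > threshold))


def delay_rate_by_hour(mat: np.ndarray, threshold: int = 15, workers: int = 4) -> dict:
    """Compute the delay rate for each hour 0-23, one pool task per hour."""
    hours = range(24)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rates = list(pool.map(lambda h: delay_rate_for_hour(mat, h, threshold), hours))
    return dict(zip(hours, rates))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = np.array([[20, 30], [20, 60], [8, 0], [8, 20]], dtype=np.int32)
        mat = make_file_backed(demo, os.path.join(tmp, "flights.bin"))
        rates = delay_rate_by_hour(mat)
        print(rates[8], rates[20])
        del mat
```

A thread pool keeps the example short. For CPU-heavy scans of tens of millions of rows, a process pool (the role a registered parallel backend plays for foreach in R) would likely be the better fit.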
ASA Data Expo 2009: Airline on-time performance (with thanks to reader ER for the suggestion)