For those not familiar with major-league baseball in the US (and despite living here for more than 10 years, I still include myself in that category), games are usually played in series: team A visits the home of team B, and the two teams play two or more games against each other on successive days. It’s common wisdom that if team A wins on the first day, they’re more likely to be the victors on the second day, too. (Folks “knowledgeable in baseball and the mathematics of forecasting” put the probability of a second-day win, given a win in the first game, at 65% to 70%.) But is that assertion borne out by the data?
Decision Science News has analyzed data from all major league baseball games played between 1970 and 2009, using R to estimate the likelihood of winning the second game in a consecutive pair of games, given the result of the first. (All of the R code and data are provided, if you want to try to replicate the analysis yourself; a rough sketch of the basic conditional calculation also appears at the end of this post.) You can see various analyses in DSN’s post, but the most revealing one for me was this chart of the frequency of successive wins with respect to the margin of victory (excess runs) in the first game:
So moderate and even significant wins by several runs in the first game don’t appear to confer much benefit in the second. As DSN points out:
The equation of the robust regression line is: Probability(Win_Second_Game) = .498 + .004 * First_Game_Margin, which suggests that even if you win the first game by an obscene 20 points, your chance of winning the second game is only 57.8%.
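To see where that 57.8% figure comes from, here’s a quick sketch that simply evaluates the quoted line in R (the coefficients 0.498 and 0.004 are taken from the quote above; this is not DSN’s fitting code):

```r
# Evaluate the fitted line quoted above:
# P(win second game) = 0.498 + 0.004 * first-game margin
p_win_second <- function(first_game_margin) {
  0.498 + 0.004 * first_game_margin
}

p_win_second(1)    # a one-run win:    0.502
p_win_second(20)   # a 20-run blowout: 0.578
```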
In fact, any second-game advantage from the first win that may exist is far overshadowed by the home-team advantage: “When it comes to winning the second game, it’s better to be the home team who just lost than the visitor who just won.” See the full article at Decision Science News for the details of the analysis.
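If you download DSN’s data and want to reproduce the basic conditional calculation yourself, a minimal sketch might look like the following. This is not DSN’s actual code: it assumes a hypothetical data frame `pairs` with one row per consecutive pair of games and logical columns `won_first` and `won_second`, both recorded from the perspective of one of the two teams.

```r
# Hypothetical data frame `pairs`: one row per consecutive pair of games,
# with logical columns won_first and won_second for one team's results.

# Estimated probability of winning game 2, conditional on the game-1 result:
p_win2_given_win1  <- mean(pairs$won_second[pairs$won_first])
p_win2_given_loss1 <- mean(pairs$won_second[!pairs$won_first])

p_win2_given_win1
p_win2_given_loss1

# Equivalently, a conditional proportion table (rows = first-game result):
prop.table(table(pairs$won_first, pairs$won_second), margin = 1)
```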
Decision Science News: You won, but how much was luck and how much was skill?