How do baseball win predictions work?
In this post, I will discuss how 538’s baseball predictions work.
Team Ratings
First, they create team ratings. To do this, they collected game results and box scores going all the way back to 1871. They used that data to build an Elo-based rating system and predictive model for baseball that accounts for home-field advantage, margin of victory, park and era effects, travel, rest, and, most importantly, starting pitchers.
Every MLB team carries a rating that estimates its current skill level. The league average is about 1500.
After every game, both teams’ ratings are adjusted based on the result. The winning team gains rating points and the losing team loses the same number of points. The size of the change depends on the chance the model gave each team to win beforehand and on the margin of victory.
For example, a big underdog that wins gains more points than a strong favorite that wins. Also, the bigger the margin of victory, the more points the winner gains.
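The update described above can be sketched with the standard Elo formulas. This is only a minimal sketch, not 538’s actual code: the K-factor `k` and the `mov_multiplier` argument are assumed placeholders, because the exact values and the margin-of-victory formula are not covered in this post.

```python
def win_probability(rating_a, rating_b):
    """Standard Elo expected score: the chance that team A beats team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(winner, loser, k=4.0, mov_multiplier=1.0):
    """Return (new_winner_rating, new_loser_rating) after one game.

    k and mov_multiplier are assumed values for illustration only.
    """
    expected = win_probability(winner, loser)
    # The less likely the win was beforehand, the larger the shift.
    shift = k * mov_multiplier * (1.0 - expected)
    # Zero-sum: the winner gains exactly what the loser gives up.
    return winner + shift, loser - shift


# A 1450-rated underdog beating a 1550-rated favorite, and the reverse.
underdog_win = update_ratings(1450.0, 1550.0)
favorite_win = update_ratings(1550.0, 1450.0)
```

Comparing `underdog_win[0] - 1450.0` with `favorite_win[0] - 1550.0` shows that the underdog gains more points for its win than the favorite does.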
Preseason Ratings
Before the start of each season, each team needs a starting rating, which is a blend of two sources:
- 67 percent comes from the team’s preseason win projection according to three computer projection systems: Baseball Prospectus’s PECOTA, FanGraphs’ depth charts, and Clay Davenport’s predictions. These projections are then scaled into the Elo range.
- 33 percent comes from the team’s final rating at the end of the previous season, reverted to the mean by one-third.
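The blend above can be written out as a short calculation. This is a sketch under my own reading of the two bullets: `projection_elo` is assumed to be the projection already scaled into the Elo range (that scaling step is not shown here), and “reverted to the mean by one-third” is taken to mean moving the old rating one-third of the way back toward 1500.

```python
LEAGUE_AVERAGE = 1500.0


def revert_to_mean(rating, fraction=1.0 / 3.0):
    """Pull a rating back toward the league average by the given fraction."""
    return rating - (rating - LEAGUE_AVERAGE) * fraction


def preseason_rating(projection_elo, last_season_final):
    """Blend projections (67%) with last season's reverted rating (33%)."""
    return 0.67 * projection_elo + 0.33 * revert_to_mean(last_season_final)


# Hypothetical team: projected at 1530, finished last season at 1560.
rating = preseason_rating(projection_elo=1530.0, last_season_final=1560.0)
```

In this made-up example the 1560 rating is first reverted to 1540, and the final blend comes out to roughly 1533.3.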
Pregame Ratings
The team ratings are also adjusted before every game based on four factors: home-field advantage, how far the team has traveled to the game, how many days of rest it has had, and which pitcher is slated to start.
- Home-field advantage is worth 24 rating points. For games played without fans in attendance, home-field advantage is worth 9.6 rating points.
- The travel penalty is worth up to about 4 points and is calculated as miles_traveled**(1.0/3.0) * -0.31.
- Each day of rest (up to a maximum of three) is worth 2.3 points.
- A pitcher’s game score is calculated with the formula 47.4 + strikeouts + (outs*1.5) - (walks*2) - (hits*2) - (runs*3) - (homeruns*4).
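The four factors in this list translate fairly directly into code. The sketch below uses only the numbers quoted above; the function names are my own, and the pitcher formula is left as a standalone function because how it gets converted into rating points is not covered in this post.

```python
def home_field_adjustment(is_home, fans_in_attendance=True):
    """24 points for the home team, or 9.6 when no fans are present."""
    if not is_home:
        return 0.0
    return 24.0 if fans_in_attendance else 9.6


def travel_adjustment(miles_traveled):
    """Penalty grows with the cube root of the distance traveled."""
    return miles_traveled ** (1.0 / 3.0) * -0.31


def rest_adjustment(days_of_rest):
    """2.3 points per day of rest, counting at most three days."""
    return min(days_of_rest, 3) * 2.3


def pregame_adjustment(is_home, miles_traveled, days_of_rest):
    """Sum of the three non-pitcher adjustments, in rating points."""
    return (home_field_adjustment(is_home)
            + travel_adjustment(miles_traveled)
            + rest_adjustment(days_of_rest))


def game_score(strikeouts, outs, walks, hits, runs, homeruns):
    """Pitcher game score for a single start."""
    return (47.4 + strikeouts + outs * 1.5
            - walks * 2 - hits * 2 - runs * 3 - homeruns * 4)


# Hypothetical start: 6 innings (18 outs), 7 K, 2 BB, 5 H, 2 R, 1 HR.
example_score = game_score(7, 18, 2, 5, 2, 1)
```

For instance, a 1,000-mile trip costs about 3.1 points, and the hypothetical start above works out to a game score of about 57.4.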
All-Time Ratings
538 also applied the rating system to every team in every season going all the way back to 1871.
Predictions
To make their predictions, they take the team ratings and run Monte Carlo simulations. Monte Carlo simulations are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results.
538 runs these simulations “hot”. This means a team’s rating does not stay static: within each simulated season, the ratings change based on the results of every simulated game and the other factors I discussed earlier.
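To make the “hot” idea concrete, here is a toy Monte Carlo sketch. It is not 538’s model: the three teams, the schedule, and the K-factor `k` are made up for illustration, and the pregame adjustments (home field, travel, rest, pitchers) and margin of victory are left out to keep it short.

```python
import random


def win_probability(rating_a, rating_b):
    """Standard Elo expected score: the chance that team A beats team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def simulate_season(ratings, schedule, rng, k=4.0):
    """Play one simulated season, updating ratings after every game."""
    ratings = dict(ratings)  # copy so every simulation starts from the same point
    wins = {team: 0 for team in ratings}
    for home, away in schedule:
        p_home = win_probability(ratings[home], ratings[away])
        home_won = rng.random() < p_home
        winner, loser = (home, away) if home_won else (away, home)
        p_winner = p_home if home_won else 1.0 - p_home
        # "Hot": the simulated result feeds back into the ratings.
        shift = k * (1.0 - p_winner)
        ratings[winner] += shift
        ratings[loser] -= shift
        wins[winner] += 1
    return wins


def best_record_odds(ratings, schedule, n_sims=1000, seed=0):
    """Estimate how often each team finishes with the most wins."""
    rng = random.Random(seed)
    counts = {team: 0 for team in ratings}
    for _ in range(n_sims):
        wins = simulate_season(ratings, schedule, rng)
        top = max(wins.values())
        leaders = [team for team, w in wins.items() if w == top]
        counts[rng.choice(leaders)] += 1  # break ties at random
    return {team: count / n_sims for team, count in counts.items()}


# Made-up three-team league; each pairing is played 20 times.
ratings = {"A": 1550.0, "B": 1500.0, "C": 1450.0}
schedule = [("A", "B"), ("B", "C"), ("C", "A")] * 20
odds = best_record_odds(ratings, schedule)
```

Each simulated season works on its own copy of the ratings, so a team that gets on a simulated winning streak becomes stronger for the rest of that season only. Averaging over many seasons turns those individual outcomes into probabilities.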
People can then use these predictions for betting, or just for fun to see who has the best chance of making the playoffs and who is going to win the World Series.