The figure above shows an example simulation of 2 innings. The first column lists the 9 players in lineup order; as in any baseball game, the #1 batter comes up again after the #9 batter has hit. The second column (Jugada = play code) holds the random variable for each play's outcome. The third column (Runs) counts the runs scored on each play. The next 3 columns (1BStart, 2BStart, and 3BStart) show which bases were occupied before the play, coded 0 for empty and 1 for occupied, and the following 3 columns (1BEnd, 2BEnd, and 3BEnd) show the bases occupied after the play. The second-to-last column tracks the number of outs, and the last column tracks the inning.
So, in this example, the first batter led off the game with a fly out (play code = -1.5), leaving the bases empty, but the second batter drew a walk (play code = 0) and occupied first base. The third and fourth batters then hit singles, loading the bases, and the fifth batter also singled, which scored the first run (Runs = 1). The sixth batter hit a home run (play code = 4), scoring 4 more runs and clearing the bases. The seventh batter flied out (play code = -1.5) for the second out of the inning. Batter #8 hit a double (play code = 2), and finally batter #9 flied out (play code = -1.5) to end the first inning.
The second inning started with the leadoff hitter again, who this time struck out (play code = -2). The second batter flied out for out #2, and although the third batter hit a single (play code = 1), he was left stranded when the cleanup hitter flied out to end the second inning. Notice that the bases are cleared (all reset to 0) when a new inning starts.
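The walkthrough above can be reproduced with a short Python sketch of the base-state update. The advancement rules are assumptions inferred from the example, not taken from the model's worksheet: negative codes are outs that leave the runners in place, code 0 is a walk that advances only forced runners, and codes 1 to 4 move the batter and every runner forward by that many bases (code 3 for a triple is assumed). The names `apply_play` and `replay_inning` are hypothetical.

```python
def apply_play(code, bases):
    """Return (bases_after, runs, is_out) for one play code.

    bases is a (first, second, third) tuple of 0/1 flags, like the
    1BStart..3BStart and 1BEnd..3BEnd columns. Advancement rules are assumed.
    """
    first, second, third = bases
    if code < 0:  # out (e.g. -1.5 fly out, -2 strikeout): runners hold
        return bases, 0, True
    if code == 0:  # walk: only forced runners move up
        runs = 1 if (first and second and third) else 0
        return (1, second or first, third or (first and second)), runs, False
    # Hit of `code` bases: the batter and every runner advance `code` bases.
    positions = [b + 1 + code for b, on in enumerate(bases) if on] + [code]
    runs = sum(1 for p in positions if p >= 4)  # anyone past third scores
    return tuple(1 if b in positions else 0 for b in (1, 2, 3)), runs, False


def replay_inning(codes):
    """Replay one inning from empty bases; return (rows, total_runs).

    Each row mirrors the table: (Jugada, Runs, 1BStart, 2BStart, 3BStart,
    1BEnd, 2BEnd, 3BEnd, Outs).
    """
    bases, outs, rows = (0, 0, 0), 0, []
    for code in codes:
        after, runs, is_out = apply_play(code, bases)
        outs += is_out
        rows.append((code, runs, *bases, *after, outs))
        bases = after
        if outs == 3:
            break
    return rows, sum(row[1] for row in rows)


first_inning = [-1.5, 0, 1, 1, 1, 4, -1.5, 2, -1.5]  # batters 1-9
second_inning = [-2, -1.5, 1, -1.5]  # batters 1-4
for inning, codes in enumerate((first_inning, second_inning), start=1):
    rows, total = replay_inning(codes)
    for row in rows:
        print(inning, *row)
    print("inning", inning, "runs:", total)  # 5 runs, then 0 runs
```

Under these simplified rules the two example innings yield 5 runs and 0 runs, matching the Runs column described above.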
This random process continues indefinitely, but runs are only counted until 9 innings are completed. This is another assumption of the model: in a real game the home team does not bat in the bottom of the 9th inning if it is already ahead, and extra innings are played if the game is tied, but for the purposes of this model we always simulate 9 full innings per game. The model also assumes that the defense makes no errors and turns no double plays.
Another important element of the game, and of lineup construction, is the stolen base, so the model also allows a runner to attempt to steal a base. To model stolen bases stochastically, we used 3 additional variables for each player:
1. Stolen Bases (SB)
2. Caught Stealing (CS)
3. Stolen Base Opportunities (SBO)
Based on these statistics, 2 parameters were estimated for each player. The first is the stolen-base propensity,
$$\text{SBprop} = \frac{\text{SB} + \text{CS}}{\text{SBO}}$$
which measures how often a player attempts to steal when given the opportunity. The second is the stolen-base efficiency,
$$\text{SBeff} = \frac{\text{SB}}{\text{SB} + \text{CS}}$$
i.e. the proportion of steal attempts that are successful.
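A minimal Python sketch of this estimation step, assuming a steal attempt is counted as SB + CS, so that the propensity is attempts per opportunity and the efficiency is successes per attempt. The helper name `steal_parameters`, the zero-denominator fallback and the sample season line are hypothetical.

```python
def steal_parameters(sb, cs, sbo):
    """Return (SBprop, SBeff) from a player's SB, CS and SBO totals.

    Assumes an attempt is SB + CS. Zero denominators return 0.0 (an
    illustrative choice, not from the model).
    """
    attempts = sb + cs
    sb_prop = attempts / sbo if sbo else 0.0  # attempts per opportunity
    sb_eff = sb / attempts if attempts else 0.0  # successes per attempt
    return sb_prop, sb_eff


# Hypothetical season line: 20 SB, 5 CS in 100 opportunities.
sb_prop, sb_eff = steal_parameters(20, 5, 100)
print(f"SBprop = {sb_prop:.2f}, SBeff = {sb_eff:.2f}")  # SBprop = 0.25, SBeff = 0.80
```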
Once the parameters are estimated for each player, 2 different random variables are simulated using Bernoulli distributions:
1. Bernoulli (SBprop), for whether the player will attempt a steal
2. Bernoulli (SBeff), to determine whether the steal attempt was successful or not.
Hence, the model uses 2 control cells to handle the stealing process before the outcome of each play, as follows: if there is a runner on first base, second base is empty, and the Bernoulli variable for attempting a steal equals 1, then a steal attempt is made. Then, if the Bernoulli variable for success equals 1, the runner on first base moves to second base before the next play (successful steal, SB); if it equals 0, the runner is removed from the bases and an additional out is added (caught stealing, CS).
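The two control cells can be sketched as a pre-play Python function. It assumes the Bernoulli draws use the parameters of the runner on first base (the text does not say whose parameters apply) and leaves it to the caller to end the inning if a caught stealing produces the third out; `steal_step` is a hypothetical name, and each Bernoulli(p) draw is implemented as `rng.random() < p`.

```python
import random


def steal_step(bases, outs, sb_prop, sb_eff, rng=random):
    """Pre-play steal check; return (bases, outs) after any attempt.

    bases is (first, second, third) with 0/1 flags. sb_prop and sb_eff are
    assumed to belong to the runner on first base.
    """
    first, second, third = bases
    if not (first and not second):  # needs a runner on 1st and 2nd empty
        return bases, outs
    if rng.random() >= sb_prop:  # Bernoulli(SBprop) = 0: no attempt
        return bases, outs
    if rng.random() < sb_eff:  # Bernoulli(SBeff) = 1: stolen base (SB)
        return (0, 1, third), outs
    return (0, 0, third), outs + 1  # caught stealing (CS): runner erased, +1 out


# Rough check with hypothetical parameters: the share of successful steals
# should be close to SBprop * SBeff = 0.25 * 0.8 = 0.2.
rng = random.Random(42)
trials = 100_000
steals = sum(steal_step((1, 0, 0), 0, 0.25, 0.8, rng)[0] == (0, 1, 0) for _ in range(trials))
print(steals / trials)
```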
A baseball game usually lasts 9 innings, which is why the process continues until the inning counter reaches 9. The minimum number of plate appearances is 27 (3 outs x 9 innings), but there is no upper limit: players keep batting as many times as necessary to generate 27 outs.
The output variable (=RiskOutput) is the total number of runs scored over the 9 innings. A cell in the worksheet sums all runs scored up to the moment the 9th inning is completed, and this cell is defined as an output variable in @Risk so that its value is recorded in each iteration of the Monte Carlo simulation.
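To make the Monte Carlo loop concrete, here is a minimal Python sketch of what the worksheet computes in each iteration: batters cycle through the lineup, bases are cleared at the start of each inning, exactly 9 innings are played, and the runs are summed into one output value per iteration. It rests on assumptions: each player's outcomes come from a hypothetical discrete distribution over the play codes (`AVERAGE_HITTER` is invented, not real data), runners advance by the same simplified rules inferred from the example innings, and stolen bases are omitted for brevity.

```python
import random
import statistics

# Hypothetical outcome distribution (play code -> probability), NOT real player data.
# Codes: -2 strikeout, -1.5 fly out, 0 walk, 1 single, 2 double, 3 triple, 4 home run.
AVERAGE_HITTER = {-2: 0.22, -1.5: 0.46, 0: 0.09, 1: 0.15, 2: 0.05, 3: 0.005, 4: 0.025}


def apply_play(code, bases):
    """Return (bases_after, runs, is_out) under the simplified advancement rules."""
    first, second, third = bases
    if code < 0:  # out: runners hold
        return bases, 0, True
    if code == 0:  # walk: only forced runners advance
        runs = 1 if (first and second and third) else 0
        return (1, second or first, third or (first and second)), runs, False
    # Hit: batter and every runner advance `code` bases (bases numbered 1..3).
    positions = [b + 1 + code for b, on in enumerate(bases) if on] + [code]
    runs = sum(1 for p in positions if p >= 4)
    return tuple(1 if b in positions else 0 for b in (1, 2, 3)), runs, False


def simulate_game(lineup, rng, innings=9):
    """Total runs over `innings` full innings; the batting order carries over."""
    total, batter = 0, 0
    for _ in range(innings):
        bases, outs = (0, 0, 0), 0  # bases are cleared when a new inning starts
        while outs < 3:
            dist = lineup[batter % len(lineup)]
            code = rng.choices(list(dist), weights=list(dist.values()))[0]
            bases, runs, is_out = apply_play(code, bases)
            total += runs
            outs += is_out
            batter += 1
    return total


def monte_carlo(lineup, iterations=10_000, seed=1):
    """One total-runs value per iteration, like the recorded @Risk output cell."""
    rng = random.Random(seed)
    results = [simulate_game(lineup, rng) for _ in range(iterations)]
    return statistics.mean(results), results


lineup = [AVERAGE_HITTER] * 9
mean_runs, runs = monte_carlo(lineup, iterations=2_000)
print(f"mean runs per game: {mean_runs:.2f}")
```

Here the `results` list plays the role of the values @Risk collects for the output cell; comparing `mean_runs` across different orderings of the same players is the lineup comparison the model is built for.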
Application of the Model
The model needs at least 9 different players, each with his own parameters for the discrete distribution. A team with only 9 players can arrange its batting order in 9! different ways, i.e. 362,880 possible lineups, and the number grows quickly as bench players are added. For instance, with 2 bench players there are 11 candidates for the 9 batting slots, so 11!/2! = 19,958,400 different lineups could be penciled in, 55 times more than with only 9 players.
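These counts can be checked with Python's `math.perm(n, k)`, which returns n!/(n-k)!, the number of ordered arrangements of k items chosen from n. An ordered batting lineup of 9 drawn from 11 players gives 11!/2!, because the order of the 2 players left on the bench does not matter. The helper `lineup_count` is hypothetical, and like the text it ignores defensive-position constraints.

```python
import math


def lineup_count(roster_size, spots=9):
    """Ordered batting lineups filling `spots` slots from `roster_size` players.

    Ignores defensive positions: any player may bat in any slot.
    """
    return math.perm(roster_size, spots)  # roster_size! / (roster_size - spots)!


nine = lineup_count(9)  # 9! = 362,880
eleven = lineup_count(11)  # 11!/2! = 19,958,400
print(f"{nine:,} {eleven:,} ratio {eleven // nine}")  # 362,880 19,958,400 ratio 55
```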
As an example application of the model, we use the 11 players who played regularly for the Seattle Mariners in the recent 2025 ALCS. Listed in order of position played, they were: 1. Raleigh, 2. Naylor, 3. Polanco, 4. Crawford, 5. Suárez, 6. Arozarena, 7. Rodríguez, 8. Robles, 9. Canzone, 10. Garver, 11. Rivas. The following chart summarizes the probabilities for each player, using the same color-coding for each play's outcome:
