This is the final guest post from the research competition and was submitted by Eric Parziale, an undergraduate student, and Dr. Philip Yates, a faculty member, both from the Department of Mathematics at St. Michael's College in Colchester, Vermont.
Every soccer player has to answer the question, “When I get the ball, what do I do with it?” There are lots of options. Depending on the situation in which a player finds himself or herself, they may pass the ball to a teammate, kick the ball out of bounds, clear the ball upfield, dribble, hit the ball at an opponent, or a variety of other actions. Prepared and well-strategized teams should be interested in the decision-making behind ball movement in soccer.
The goal of this study was to measure the impact of ball possession and pass completion in the EPL during the 2010-2011 season based on play-by-play data from StatDNA. A logistic regression with a successful pass completion as the response and various in-game situations as predictor variables was used to identify which in-game factors are associated with successful pass completion (Table 1). These coefficients represent the change in log-odds of pass completion assuming all other predictors in the model are held constant.
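To make the log-odds interpretation concrete, here is a minimal Python sketch of how a single coefficient shifts the probability of completion. The baseline probability and the coefficient are hypothetical values chosen for illustration; they are not taken from Table 1.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def shifted_probability(baseline_p, coef):
    """Completion probability after adding `coef` to the baseline log-odds,
    with every other predictor held constant."""
    return inv_logit(logit(baseline_p) + coef)


# Hypothetical inputs (assumptions, not values from the study).
baseline_p = 0.80   # completion probability in the reference situation
coef = -0.50        # log-odds change for some in-game factor

odds_ratio = math.exp(coef)                   # multiplicative change in the odds
new_p = shifted_probability(baseline_p, coef)
print(f"odds ratio: {odds_ratio:.3f}, new probability: {new_p:.3f}")
```

A negative coefficient multiplies the odds by a factor below one, so the probability falls. The size of that drop depends on the baseline, which is why coefficients are reported on the log-odds scale rather than as changes in probability.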
We found that various in-game factors do impact whether or not a pass was completed. For type of pass, the impact on pass completion from best to worst is goalie throw, pass on the ground, throw in, head clearance, clearance, head pass, goalie punt, pass in the air, and a cross. For type of defensive pressure, the impact on pass completion from best to worst is closing, no pressure, marked, and challenge. For body position, it is back to the goal, sideways, and facing the goal. For field location (in relation to the opponent's defense), it is line 0, line 1, line 2, and line 3 (higher lines indicate closeness to goal and higher pressure). For one-timers, it is one-timing the ball versus not one-timing the ball. Finally, the longer the pass in distance, the less likely it will be completed.

Table 1: Results of Logistic Regression of Pass Completion on Game Factors
Logistic regression models with varying intercepts and slopes were used to estimate the pass completion rate at the team level after accounting for in-game situations. A Bayesian approach was used: the team intercepts were assumed to be normally distributed with a mean of zero and a common variance. The team-level coefficients for each game factor were likewise assumed to be normally distributed with a mean of zero and a variance common to all teams for that factor. Noninformative priors were assigned to the standard deviations of the intercepts and the game factor coefficients.
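The structure of this team-level model can be sketched as a small simulation. The Python below only draws team effects from the normal distributions described above and turns them into a completion probability; it does not fit anything. The standard deviations, the two-factor setup, and all function names are hypothetical, and the post does not say which software was used for the actual Bayesian fit.

```python
import math
import random


def inv_logit(z):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def draw_team_effects(n_teams, sigma_intercept, sigma_coefs, rng):
    """Draw one intercept and one coefficient per game factor for each team.

    Intercepts come from Normal(0, sigma_intercept); the coefficient for
    factor k comes from Normal(0, sigma_coefs[k]), shared across teams.
    """
    teams = []
    for _ in range(n_teams):
        intercept = rng.gauss(0.0, sigma_intercept)
        coefs = [rng.gauss(0.0, s) for s in sigma_coefs]
        teams.append((intercept, coefs))
    return teams


def completion_probability(team, factors):
    """Probability of completion for one pass, given its factor values."""
    intercept, coefs = team
    z = intercept + sum(b * x for b, x in zip(coefs, factors))
    return inv_logit(z)


# Hypothetical standard deviations; in the study these had noninformative
# priors and were estimated from the data rather than fixed.
rng = random.Random(0)
sigma_coefs = [0.3, 0.2]
teams = draw_team_effects(20, 0.5, sigma_coefs, rng)

# One pass with the first (hypothetical) factor present and the second absent.
probs = [completion_probability(t, [1.0, 0.0]) for t in teams]
print(f"lowest: {min(probs):.3f}, highest: {max(probs):.3f}")
```

Sharing one variance per factor across teams is what lets the estimates for each team borrow strength from the rest of the league.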
Figure 1 illustrates the relationship between the median estimated probability of pass completion and a team’s winning percentage (points divided by 114).
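The divisor of 114 is the maximum number of points available in a 38-match season at three points per win. A short Python sketch of the calculation follows; the win and draw counts in the example are made up and do not describe any actual team.

```python
MATCHES_PER_SEASON = 38   # each of the 20 clubs plays the other 19 home and away
POINTS_PER_WIN = 3
MAX_POINTS = MATCHES_PER_SEASON * POINTS_PER_WIN   # 114


def winning_percentage(wins, draws):
    """Share of the maximum possible points earned (a win is 3, a draw is 1)."""
    points = POINTS_PER_WIN * wins + draws
    return points / MAX_POINTS


# Hypothetical record: 15 wins and 12 draws give 57 points.
print(winning_percentage(15, 12))
```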

Figure 1: Plot of Median Estimated Probability of Pass Completion versus 2010-2011 Winning Percentage
There is a moderately strong positive association (r = 0.688) between a team’s pass completion and its winning percentage. Fitting a simple linear regression line to the data presented in Figure 1 yields the following result:
$\hat{y}_i = -0.6216 + 1.5244\,x_i$
where $\hat{y}_i$ is the predicted winning percentage and $x_i$ is the median estimated probability of pass completion for each of the 20 teams. The p-value for the slope coefficient is 0.0008, indicating a significant positive relationship between the median estimated probability of pass completion adjusted for in-game factors and winning percentage. If Stoke City, an outlier, is removed from the plot, the correlation rises to r = 0.785. Stoke City plays an extremely physical game, with a strategy centered on large players, long throw-ins, and set pieces.
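As a rough illustration, the fitted line above can be evaluated directly, and the reported correlation can be converted into the t-statistic for the slope. The Python sketch below uses only the intercept, slope, correlation, and team count quoted in the text; the input of 0.75 is a hypothetical completion probability, not a value for any team.

```python
import math

INTERCEPT = -0.6216   # from the fitted line in the text
SLOPE = 1.5244        # from the fitted line in the text


def predict_win_pct(pass_completion):
    """Predicted winning percentage for a given median estimated
    probability of pass completion."""
    return INTERCEPT + SLOPE * pass_completion


def slope_t_statistic(r, n):
    """t-statistic for the slope in simple linear regression, computed
    from the correlation r and the number of observations n."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Hypothetical input: a team with an adjusted completion probability of 0.75.
print(f"predicted winning percentage: {predict_win_pct(0.75):.4f}")

# Reported values: r = 0.688 across the 20 teams.
t = slope_t_statistic(0.688, 20)
print(f"t-statistic on 18 degrees of freedom: {t:.2f}")
```

A t-statistic of about 4 on 18 degrees of freedom is in line with the small p-value reported for the slope. Note that the line is only meaningful within the range of completion probabilities seen in the data; outside that range it can produce winning percentages below 0 or above 1.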
When pass completion is calculated after adjusting for the various in-game factors, there is a significant positive relationship between pass completion and a team’s winning percentage in the EPL (p-value of 0.0008 from a simple linear regression). One can distinguish the top tier of the EPL from the rest of the league, in terms of points in the standings, by looking at pass completion adjusted for in-game factors.
The results presented here capture not only passes a player intends to complete but also other passing alternatives, i.e., clearances, punts, etc. As a result, teams that complete passes in difficult situations (versus clearing the ball to the other team or out of bounds) are more likely to win games. The decision-making abilities of players, as they relate to maintaining possession of the ball, are closely related to winning.