We thank Valeria Espinosa, Jonathan Hennessy, Bo Jiang and Joseph Kelly, four graduate students in Harvard's Department of Statistics, for this guest post. It's a summary of their research paper submission "Transitioning from Winning to Losing."
Using the StatDNA play-by-play data, we attempted to understand how a team’s style of play varies with the outcome of the game (win, loss, tie). What do we mean by style? Well, admittedly, style (in soccer and in life) is difficult to quantify. A team’s style can be tied to how aggressively the players pursue the ball, how frequently the ball moves from side to side, or how much hair gel the players use. The aspect of style that we decided to focus on here is how the offense moves the ball from region to region and in which regions certain events (gain possession, lose possession, shots, fouls, etc.) tend to occur. We next investigated whether a team’s style is consistent across games. Specifically, was the style the same across wins, losses and ties? Because opponents generally seek to take away a team’s strength (e.g. force a team that likes to play down the middle to play down the wings), we expect a team’s style to vary across games. It is both interesting and of potential strategic value to understand which aspects of style are most associated with different outcomes.
We first divided the field into 12 equally sized regions. Regions 1, 2, and 3 correspond to the area of the field the team is defending. Regions 10, 11, 12 correspond to the area of the field the team is attacking.
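As a rough illustration of this division, the sketch below maps a position on the field to one of the 12 regions. It assumes coordinates normalized to the range 0 to 1, four bands of three regions each, and numbering that runs across the width before moving up the field. The function name `region_of` and the coordinate convention are hypothetical and are not taken from the StatDNA data format.

```python
def region_of(x: float, y: float, n_bands: int = 4, n_lanes: int = 3) -> int:
    """Map a normalized field position to a region number in 1..12.

    Assumed (hypothetical) convention:
      x runs along the length of the field, 0.0 = own goal line, 1.0 = opponent's goal line
      y runs across the width of the field, 0.0 = one touchline, 1.0 = the other
    Regions 1-3 are then the defended end and regions 10-12 the attacked end.
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("x and y must lie in [0, 1]")
    # min(...) keeps positions exactly on the far boundary inside the last band/lane
    band = min(int(x * n_bands), n_bands - 1)
    lane = min(int(y * n_lanes), n_lanes - 1)
    return band * n_lanes + lane + 1


center_circle = region_of(0.5, 0.5)   # a central midfield position
own_corner = region_of(0.0, 0.0)      # a corner at the defended end
```

Under this assumed numbering, the center of the field falls in region 8, which matches the way region 8 is described as central midfield later in the post.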
Across all games, we counted the number of times each team passed the ball from one region to another. In addition, we counted the occurrence of certain game events (gain possession, lose possession, successful shot, shot, free-kick, foul, offsides) in each region. For every team, we broke down the counts by game outcome (wins, losses and ties).
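The counting step can be sketched in a few lines. The records below are made up for illustration and the tuple layout (outcome, origin region, destination region) is an assumption, not the actual format of the play-by-play data.

```python
from collections import Counter, defaultdict

# Hypothetical pass records: (game outcome, origin region r1, destination region r2).
# These rows are invented for illustration; they are not StatDNA data.
passes = [
    ("win", 4, 8), ("win", 4, 8), ("win", 5, 8), ("win", 4, 5),
    ("loss", 4, 8), ("loss", 4, 5), ("loss", 4, 5),
    ("tie", 5, 8),
]

# pass_counts[outcome][(r1, r2)] is the number of passes from r1 to r2
# in games that ended with that outcome.
pass_counts = defaultdict(Counter)
for outcome, r1, r2 in passes:
    pass_counts[outcome][(r1, r2)] += 1
```

Event counts (shots, fouls and so on) could be tallied the same way, keyed by (region, event) instead of (r1, r2).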
For example, let’s take a look at Chelsea.
In figures (a) and (b) each heatmap corresponds to a region (r1) of the field. That region is subdivided into its own mini field. In this mini field, the color of every square (r2) shows the proportion of passes from r1 to r2 out of all the passes originating from region r1 (note that we are displaying proportions, not counts). In figures (c) and (d) a mini field is displayed for each event (E). The color in each region (r) represents the proportion of times that event E occurred in region r. Figures (a) and (c) use only the games that Chelsea won and figures (b) and (d) use only the games that Chelsea lost.
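To make the normalization concrete, here is a minimal sketch of how pass counts could be turned into the proportions shown in each mini field: every count is divided by the total number of passes leaving the same origin region. The function name `pass_proportions` and the example counts are hypothetical.

```python
def pass_proportions(counts, n_regions=12):
    """Convert counts keyed by (r1, r2) into proportions within each origin region r1.

    Origin regions with no recorded passes are skipped, since a proportion
    is undefined when the denominator is zero.
    """
    regions = range(1, n_regions + 1)
    proportions = {}
    for r1 in regions:
        total_from_r1 = sum(counts.get((r1, r2), 0) for r2 in regions)
        if total_from_r1 == 0:
            continue
        for r2 in regions:
            proportions[(r1, r2)] = counts.get((r1, r2), 0) / total_from_r1
    return proportions


# Invented counts for illustration only.
example_counts = {(4, 8): 2, (4, 5): 1, (5, 8): 1}
example_props = pass_proportions(example_counts)
```

Each origin region's proportions sum to 1, so two mini fields can be compared even when one region sees far more of the ball than another.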
We next want to understand whether these proportions are different for wins and losses. For each (r1, r2) and (r, E) combination we conduct two one-sided Fisher exact tests to assess whether the proportions are positively associated with either winning or losing. Because we are conducting so many tests, there is definitely a lurking multiple comparisons issue. While we don’t formally adjust our testing procedure, it is worth keeping this in mind. P-values can be thought of as measures of how positively associated each (r1, r2) and (r, E) combination is with either winning or losing (a lower p-value means a greater association).
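For readers who want to see the mechanics, here is a minimal sketch of a one-sided Fisher exact test for a single (r1, r2) combination, computed directly from the hypergeometric distribution. The post does not spell out the 2x2 table, so the layout used here is an assumption: rows are wins and losses, columns are passes from r1 to r2 and passes from r1 to anywhere else. The function name and the counts are hypothetical.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (not confirmed by the post):
      a = passes r1 -> r2 in wins,   b = other passes from r1 in wins
      c = passes r1 -> r2 in losses, d = other passes from r1 in losses
    Returns P(X >= a), where X is the top-left cell when all row and
    column totals are held fixed (a hypergeometric distribution).
    """
    row_total = a + b      # all passes from r1 in wins
    col_total = a + c      # all passes r1 -> r2, wins and losses combined
    grand_total = a + b + c + d
    upper = min(row_total, col_total)
    tail = sum(
        comb(col_total, k) * comb(grand_total - col_total, row_total - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(grand_total, row_total)


# Invented counts: 30 of 100 passes from r1 went to r2 in wins, 15 of 100 in losses.
p_win = fisher_one_sided(30, 70, 15, 85)    # is r1 -> r2 over-represented in wins?
p_loss = fisher_one_sided(15, 85, 30, 70)   # same test with the rows swapped
```

Swapping the two rows gives the second of the two tests, the one for a positive association with losing. With 144 region pairs plus the event tests, a Bonferroni-style correction would be one simple way to account for the multiple comparisons mentioned above, although no such adjustment is applied in this analysis.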
For instance, figure (a) shows the 12 × 12 = 144 p-values testing whether each (r1, r2) combination is positively associated with winning (p-values in figure (b) test whether combinations are positively associated with losing). Most of the 144 combinations do not seem positively associated with winning (p-values >= 0.1). Interestingly, combinations (r1, r2) = (4, 8), (5, 8) and (6, 8) are significant, indicating that in wins, Chelsea tended to move the ball from defensive regions directly into central midfield. More work would need to be done to verify a result like this, but it is a promising lead.
Given more data, we believe that a graphical and statistical analysis like this one can reveal aspects of a team’s style that might otherwise be missed. Knowing which regions a team prefers to move through could provide a tactical edge.