    The Technical Area: StatDNA's Blog

    We analyze the world's most advanced soccer statistics to better understand the game.

    Keep the Ball - The Value of Possession in Soccer - Guest Post 6


    Posted by Eric Parziale and Philip Yates - Guest Post 12. October 2011 15:56
    This is the final guest post from the research competition and was submitted by Eric Parziale, an undergraduate student, and Dr. Philip Yates, a faculty member, both from the Department of Mathematics at St. Michael's College in Colchester, Vermont.

    Every soccer player has to answer the question, “When I get the ball, what do I do with it?” There are lots of options. Depending on the situation, a player may pass the ball to a teammate, kick the ball out of bounds, clear the ball upfield, dribble, hit the ball at an opponent, or take a variety of other actions. Well-prepared teams with a sound strategy should be interested in the decision-making behind ball movement in soccer.

    The goal of this study was to measure the impact of ball possession and pass completion in the EPL during the 2010-2011 season based on play-by-play data from StatDNA. A logistic regression with successful pass completion as the response and various in-game situations as predictor variables was used to identify which in-game factors are associated with successful pass completion (Table 1). Each coefficient represents the change in the log-odds of pass completion for a one-unit change in its predictor, assuming all other predictors in the model are held constant.

    We found that various in-game factors do affect whether or not a pass is completed. For type of pass, the impact on pass completion from best to worst is goalie throw, pass on the ground, throw-in, head clearance, clearance, head pass, goalie punt, pass in the air, and cross. For type of defensive pressure, the order from best to worst is closing, no pressure, marked, and challenge. For body position, it is back to the goal, sideways, and facing the goal. For field location (in relation to the opponent’s defense), it is line 0, line 1, line 2, and line 3 (higher lines indicate closeness to goal and higher pressure). For one-timers, it is one-timing the ball versus not one-timing the ball.
    Finally, the longer the pass in distance, the less likely it will be completed.

    Table 1: Results of Logistic Regression of Pass Completion on Game Factors

    Logistic regression models with varying intercepts and slopes were used to estimate the pass completion rate at the team level after accounting for in-game situations. A Bayesian approach was used: the intercepts for each team were assumed to be normally distributed with a mean of zero and a common variance. The coefficients for the game factors for each team were assumed to be normally distributed with a mean of zero and a common variance for each coefficient associated with a given game factor. Noninformative priors were assigned to the standard deviations for the intercepts and game factor coefficients.

    Figure 1 illustrates the relationship between the median estimated probability of pass completion and a team’s winning percentage (points divided by 114).

    Figure 1: Plot of Median Estimated Probability of Pass Completion versus 2010-2011 Winning Percentage

    There is a moderately strong positive association (r = 0.688) between a team’s pass completion and its winning percentage. Fitting a simple linear regression line to the data presented in Figure 1 yields the following result:

    y-hat(i) = -0.6216 + 1.5244 x(i)

    where y-hat(i) is the predicted winning percentage and x(i) is the median estimated probability of pass completion for each of the 20 teams. The p-value for the slope coefficient is 0.0008, indicating a significant positive relationship between the median estimated probability of pass completion adjusted for in-game factors and winning percentage. If the outlier, Stoke City, is removed from the plot, the correlation rises to r = 0.785. Stoke City plays an extremely physical game in which the strategy centers on large players, long throw-ins, and set pieces.
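To make the fitted line above concrete, here is a minimal sketch that plugs values into y-hat = -0.6216 + 1.5244x and converts the result to league points. The input probabilities are hypothetical and chosen only for illustration, not taken from the study; the function names are also assumptions.

```python
# Coefficients of the fitted line reported in the post.
INTERCEPT = -0.6216
SLOPE = 1.5244
MAX_POINTS = 114  # 38 matches x 3 points, the denominator used in the post


def predicted_win_pct(pass_completion_prob):
    """Predicted winning percentage (points / 114) from the fitted line."""
    return INTERCEPT + SLOPE * pass_completion_prob


def predicted_points(pass_completion_prob):
    """Predicted winning percentage converted back to league points."""
    return predicted_win_pct(pass_completion_prob) * MAX_POINTS


if __name__ == "__main__":
    # Hypothetical adjusted pass completion probabilities (illustration only).
    for x in (0.70, 0.75, 0.80):
        print(f"x = {x:.2f}: win pct = {predicted_win_pct(x):.3f}, "
              f"points = {predicted_points(x):.1f}")
```

Read this way, the slope says that each additional 0.01 of adjusted pass completion probability is associated with about 0.015 of winning percentage, or roughly 1.7 points over a season. This is an association from a 20-team sample, not a causal estimate.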
    When pass completion is calculated after adjusting for the various in-game factors, there is a significant positive relationship between pass completion and a team’s winning percentage in the EPL (p-value of 0.0008 from a simple linear regression). One can differentiate the top tier of the EPL from the rest of the league in terms of points in the standings by looking at pass completion adjusted for in-game factors.

    The results presented here capture not only passes a player intends to complete but also other passing alternatives, i.e., clearances, punts, etc. As a result, teams that complete passes in difficult situations (versus clearing the ball to the other team or out of bounds) are more likely to win games: the decision-making abilities of players (as they relate to maintaining possession of the ball) are closely related to winning.
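The log-odds interpretation of the logistic regression coefficients can be sketched as follows. Because the values in Table 1 are not reproduced here, the intercept and coefficients below are hypothetical placeholders, as are the feature names; only the mechanics (summing log-odds, then applying the logistic function) follow the description in the post.

```python
import math

# Hypothetical values for illustration only; these are NOT the Table 1 estimates.
HYPOTHETICAL_INTERCEPT = 1.5
HYPOTHETICAL_COEFS = {
    "pass_distance_m": -0.04,  # per metre of pass distance
    "is_cross": -1.2,          # 1 if the pass is a cross, else 0
    "one_timed": -0.5,         # 1 if the pass is one-timed, else 0
}


def completion_probability(features, intercept=HYPOTHETICAL_INTERCEPT,
                           coefs=HYPOTHETICAL_COEFS):
    """Turn a dict of feature values into a pass completion probability."""
    log_odds = intercept + sum(coefs[name] * value
                               for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds(probability):
    """Odds corresponding to a probability."""
    return probability / (1.0 - probability)


if __name__ == "__main__":
    ground_pass = {"pass_distance_m": 10, "is_cross": 0, "one_timed": 0}
    cross = {"pass_distance_m": 10, "is_cross": 1, "one_timed": 0}
    print(completion_probability(ground_pass), completion_probability(cross))
```

Holding the other predictors fixed, switching `is_cross` from 0 to 1 multiplies the odds of completion by exp(coefficient), which is what "change in log-odds" means in practice.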

    Tags:

    passing | research competition

    An Optimal Passing Strategy for Soccer - Guest Post 5


    Posted by Andres G. Abad - Guest Post 28. September 2011 17:22
    The following guest post is from Andres G. Abad, Ph.D., of the Escuela Superior Politecnica del Litoral (ESPOL), Ecuador.

    Introduction

    Soccer, just like any other interesting game, is a game of possibilities. More specifically, it is a game of decisions. Players are required to become efficient decision makers in a highly dynamic, rapidly changing, uncertain environment. Additionally, players are required to make these decisions in very short periods of time. On top of that, other factors such as fatigue, game pressure, stress, and anxiety may further burden a player's decision-making capability. As a consequence, players are continuously making suboptimal decisions.

    For this reason, a strategy for making optimal decisions during a soccer match is of interest. Such a strategy could provide a way of acquiring the instinct needed to make satisfactory decisions rapidly. Additionally, it may be used to assess a team's compliance with the optimal strategy, thus providing a way of ranking teams. Furthermore, it may be used to teach young players about optimal decision making and, as a consequence, help them rapidly develop a strong intuition when learning the football basics.

    In this work, we provide an optimal passing strategy that improves the chances of scoring a goal. At every location of the field, we evaluate the possible courses of action a player may take (passing or shooting) according to their probabilities of producing a goal, i.e., we look for actions that maximize our chances of scoring a goal. We obtain numerical values for our model by using a play-by-play dataset from the Campeonato Brasileiro de Clubes da Serie A 2010 provided by StatDNA.
    The model

    We propose a probabilistic model that combines the uncertainty of pass completion with the probability of scoring a goal when shooting from each region, and use it to obtain an optimal passing/shooting strategy that maximizes our chances of scoring a goal.

    We start by partitioning the soccer field into 30 regions, as shown in the figure below.

    In order to construct our model we define the following probabilities:

    (1) Shooting Probabilities (SP)
    (2) Passing Probabilities (PP)
    (3) Absorption Probabilities (AP)

    (1) Shooting Probabilities (SP)

    We compute the probability that a shot on goal originating from each region ends up in a goal. The figure below shows a 3D histogram of the probabilities of scoring from each region obtained from the dataset.

    The numerical values are shown in the table below.

    (2) Passing Probabilities (PP)

    We now study the probabilities of completing a pass between every pair of regions. Based on the dataset, we observe, for example, the passing probabilities for region 12 given in the figure below.

    The figure above shows that the highest passing probability corresponds to passing the ball to region 17 (marked with the larger blue arrow in the figure), with a PP of 0.847.

    (3) Absorption Probabilities (AP)

    Neither the SP nor the PP alone can determine the optimal passing/shooting strategy that we are looking for; the optimal strategy must integrate them. To see this, we just need to realize that easy passes usually do not help in scoring a goal. Conversely, extremely difficult passes may not be worth it. For instance, we may be interested in passing the ball to region 3 because the chances of scoring from there (SP) are quite high (0.703). However, the chances of completing a pass (the PP) to region 3 from any other region are, in general, quite low.
    To integrate the SP and the PP in our model, we propose to study the probability of eventually scoring a goal given that the ball is currently in each region. That is, the probability that a sequence of passes starting at a given region will end up in a goal. We will call these probabilities the (Goal) Absorption Probabilities (AP).

    By using standard Markov chain theory, we can compute the AP based on the PP and the SP. The AP for every region are shown in the figure below.

    Optimal Passing Strategy

    We are now ready to obtain an optimal passing strategy that, if followed, will maximize our chances of scoring a goal.

    “This strategy chooses to pass the ball to the region with the highest absorption probability (AP), while at the same time also considering the probability that such a pass is successful (PP).”

    The action of passing the ball from region i to region j is ranked by the index R(i,j), obtained by

    R(i,j) = PP(i,j) * AP(j),

    where PP(i,j) is the probability of completing a pass from region i to region j, and AP(j) is the absorption probability at region j. The proposed optimal passing/shooting strategy is obtained simply by choosing the action(s) with the highest rank(s).

    We now present the optimal passing/shooting strategy in the form of a table showing the five highest-ranked courses of action for each individual region.

    When constructing an optimal passing/shooting strategy, we need to choose the most feasible sequence of courses of action. For example, we may not pass the ball to a region where there are no teammates to receive it or to a region where there are too many opponents.

    Conclusions

    In this work we provide an optimal passing strategy that maximizes our chances of scoring a goal.
    “A remarkable conclusion of this work is that crossing the ball is never an optimal pass because of its low probability of ending up in a goal and/or its low chances of being completed, and, thus, should be avoided.”

    The figure below illustrates how, for example, if we have the ball on the left wing, it is optimal to pass the ball backwards, as opposed to crossing. The arrows in the figure show an optimal passing sequence derived from our proposed optimal strategy. The yellow boxes indicate the ranking of the corresponding alternative according to the optimal strategy table provided above.

    Other examples of optimal strategies obtained from this work are provided below.
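The absorption-probability step and the ranking index R(i,j) = PP(i,j) * AP(j) from the model section can be sketched on a toy pitch with three regions instead of 30. The post does not spell out how the Markov chain is built, so the construction below is an assumption: in each region a player attempts a pass to region j with a fixed frequency (`ATTEMPT`) or shoots (`SHOOT`), a failed pass or missed shot ends the possession, and a goal is the other absorbing state. All numbers are invented for illustration.

```python
# Toy data for 3 regions (0 = defensive, 1 = midfield, 2 = attacking).
# Every value here is hypothetical; none comes from the StatDNA dataset.
PP = [[0.90, 0.80, 0.30],       # PP[i][j]: chance a pass from i to j is completed
      [0.90, 0.85, 0.60],
      [0.90, 0.80, 0.50]]
ATTEMPT = [[0.30, 0.60, 0.10],  # assumed frequency of attempting a pass i -> j
           [0.20, 0.40, 0.35],
           [0.10, 0.30, 0.30]]
SHOOT = [0.00, 0.05, 0.30]      # assumed frequency of shooting from region i
SP = [0.00, 0.02, 0.15]         # SP[i]: chance a shot from region i scores

N = len(SP)
# Transient-to-transient transitions: the pass is attempted AND completed.
Q = [[ATTEMPT[i][j] * PP[i][j] for j in range(N)] for i in range(N)]
# One-step probability of absorption in the "goal" state.
GOAL = [SHOOT[i] * SP[i] for i in range(N)]


def absorption_probabilities(q, goal, tol=1e-12, max_iter=10_000):
    """Solve AP = goal + Q @ AP by fixed-point iteration."""
    n = len(goal)
    ap = list(goal)
    for _ in range(max_iter):
        new_ap = [goal[i] + sum(q[i][j] * ap[j] for j in range(n))
                  for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_ap, ap)) < tol:
            return new_ap
        ap = new_ap
    return ap


def rank_passes(pp_row, ap, top=5):
    """Rank target regions j by R(i, j) = PP(i, j) * AP(j), best first."""
    scores = [(j, pp_row[j] * ap[j]) for j in range(len(ap))]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    ap = absorption_probabilities(Q, GOAL)
    print("AP per region:", ap)
    for i in range(N):
        print("best passes from region", i, rank_passes(PP[i], ap))
```

The fixed-point iteration is used here only to keep the sketch dependency-free; solving the linear system (I - Q) AP = goal directly gives the same result.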

    Tags:

    passing | research competition

    How we measure pass value creation: advancing the ball


    Posted by Ben Alamar 1. June 2011 17:22
    In previous posts I have looked at the effect of pressure on passing. The analysis in that post included the distance of the pass, but it did not account for the location of the pass. Additionally, that analysis focused on the probability that the pass would be completed, rather than the value that the pass created for the team. This time, I’ve taken a sample of over 130,000 passes from the Brazilian Serie A and examined the effect that each pass has on the odds that the team will score a goal, based on where on the field the pass originated and where it was received.

    In order to make the analysis at all tractable, I split the pitch into 28 distinct zones. The zones are detailed in the diagram to the right. The diagram is oriented so that the lower-numbered zones are the defensive zones (a team’s defensive penalty box consists of zones 2, 3, 4 & 5). Zones 19 and higher make up the attacking zones.

    With the StatDNA play-by-play data, I was able to look at the probability that a team will score on a given possession, given the location and a host of other variables. This estimated probability (or expected value/EV discussed in previous posts) changes with each play, so the change in the probability can be calculated with each pass. Incomplete passes and passes that reduce the probability of scoring create negative changes, while passes that increase the odds of scoring create positive changes. Separating the passes into their proper categories and averaging the change in scoring probability, we can see the average value of a pass between any two zones.

    The table below provides estimates of the average change in expected value from a completed pass by originating zone to receiving zone for the Serie A data. White zones represent a change in EV of less than 0.3%, yellow zones a change in EV from 0.3% to 2%, light green zones from 2% to 10%, and dark green zones greater than 10%.
    Note also that some boxes are white simply because very few passes occurred between those two zones. For example, a pass from zone 16 that finds its way to zone 25 (inside the attacking box) increases the probability that the team will score a goal by 11.2%, while a pass from zone 16 to zone 20 increases the probability that the team will score by 0.78%.

    PASS VALUE CREATION FROM ORIGINATING LOCATION TO RECEIVING LOCATION

    A couple of interesting things emerge. Passes from zone 23 or 28, which are generally crosses, have about half the increase in EV that passes from zones 20 and 21 into the box have (4% vs 8%). When we include incomplete passes (and crosses) in the EV analysis, we see that passes in from the wing have an average EV improvement that is much lower than the average EV of passes from zones 20 and 21 into the box. Since incomplete passes have negative value that counteracts the positive value from our completed passes, one would expect that passes from different bordering zones into the box would have similar average values when both incomplete and completed passes are included. This leads us to believe that Brazilian Serie A teams could be crossing the ball more than they should be and emphasizing play up the middle of the field less than they should be.

    Another thing that is interesting is that passing from the defensive 1/3 into the midfield increases EV very little (in fact we didn't even include these passes in the table above, which begins in the midfield); passing from the midfield to the attacking 1/3 increases value a moderate amount; and then of course passing from the attacking 1/3 and wings into the box increases EV substantially. We aggregate a player’s EV contribution to the team over all of his actions (including his complete and incomplete passes), and at first glance this may seem unfair to defenders, who seemingly receive no value for their passing in this chart.

    This, however, is not really the case.
    A defender receives value for two facets of passing. First, since he can receive a large negative value for an incomplete pass in the defensive 1/3, a high completion % will tend to aggregate small increments of value over time and avoid large negative EV contributions. Second, and perhaps more importantly, this chart only takes into account increases in EV due to movement of the ball between field zones, when in fact the EV is multi-dimensional. We will give the defender higher positive EV when he passes the ball to a more favorable location in terms of, for example, the level of defensive pressure on the recipient, so a defender who consistently passes to players who are more open will aggregate higher EV over time.

    On the opposite end of the spectrum, an attacking midfielder will have a much higher degree of variance in his EVs, with one very high EV completed pass into the box counteracting lots of negative EV incomplete passes. In this case we value someone who can, over time, aggregate a high EV in spurts. Of course, if too many high-value passes are attempted without success, this player’s value contribution will suffer.

    This breakdown of passes also allows for a clearer look at how teams can use their passing games to create more offense. Charts like the one below can be generated for different types of pressure as well as how deep into the defense the attacking player is at the time of the pass, to further flesh out the effect of passing. Additionally, these values can be summed up at the team and player level to examine which teams have the most effective passing games and where they create most of their value. At the player level, again the total value and where the value is created can be examined.
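The zone-to-zone averaging described in this post can be sketched as below. The record layout (origin zone, receiving zone, EV before, EV after) and the sample passes are hypothetical, and the EV changes are treated as absolute changes in scoring probability, which is an assumption about how the percentages in the table are expressed. The color thresholds follow the legend given for the table; the handling of values that fall exactly on a boundary is a guess.

```python
from collections import defaultdict


def average_ev_change(passes):
    """Average change in EV for each (origin zone, receiving zone) pair.

    `passes` holds (origin, destination, ev_before, ev_after) tuples.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for origin, destination, ev_before, ev_after in passes:
        sums[(origin, destination)] += ev_after - ev_before
        counts[(origin, destination)] += 1
    return {pair: sums[pair] / counts[pair] for pair in sums}


def color_bucket(ev_change):
    """Map an average EV change (as a fraction) to the table's color legend."""
    if ev_change < 0.003:   # less than 0.3%
        return "white"
    if ev_change < 0.02:    # 0.3% to 2%
        return "yellow"
    if ev_change <= 0.10:   # 2% to 10%
        return "light green"
    return "dark green"     # greater than 10%


# Hypothetical sample passes, chosen so the averages match the two examples
# quoted in the post (16 -> 25: 11.2%, 16 -> 20: roughly 0.8%).
SAMPLE_PASSES = [
    (16, 25, 0.020, 0.130),
    (16, 25, 0.030, 0.144),
    (16, 20, 0.020, 0.028),
]

if __name__ == "__main__":
    for pair, change in sorted(average_ev_change(SAMPLE_PASSES).items()):
        print(pair, round(change, 4), color_bucket(change))
```

Incomplete passes fit the same record layout: their `ev_after` is lower than `ev_before`, so they pull the average for a zone pair down.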

    Tags:

    goal creation | passing

    Why players, teams are undifferentiated on "passing skill"


    Posted by Jaeson Rosenfeld 4. May 2011 17:26
    The statistic of pass completion % is one that I have discussed in the past as having limited relevance - see the post here for example. The key issue is that pass completion % does not tell you anything about who won the game or who scored the most goals, because it's very situation-specific (at certain times of the game a team may be ceding possession and allowing a team to complete a large number of non-threatening passes, for example). One key factor that pass completion % does not take into account at all is pass difficulty. Whether a team is banging the ball around in the defensive backfield or making quick one-touch passes in traffic, pass completion % treats them all equally. About the best you might be able to do at the player level is to compare players at similar positions on this metric, but as we know, no two players (even in the same position) play exactly the same role.

    We decided therefore to try to partially address the problem by running a regression to determine pass difficulty and then adjusting passing skill based on the difficulty of passes attempted. We ran a regression on over 100,000 passes from the Brazil Serie A, and also looked at several subsamples (passes in the attacking 1/3, only passes on the ground in the attacking 1/3), and all roads lead to one conclusion: after adjusting for difficulty, pass completion % is nearly equal among all players and teams. Said another way, the skill in executing a pass is almost equal across all players and teams, as pass difficulty and pass completion % are almost perfectly correlated. Before summarizing or concluding any further, let's discuss the analysis in a bit more detail.
    We took completed pass as the dependent variable in a logistic regression that included the following independent variables: level of defensive pressure on the passer, pass distance, direction the passer is facing, whether the pass was one-timed, whether the pass was made with the head or foot, and whether the pass was hit on the ground or in the air. We also used the field zone of the next touch (whether the pass was complete or incomplete) as a proxy for the level of pressure on the recipient of the pass, because we know that defensive pressure tends to increase as you move up the field and towards the goal. We needed to do this because we can't really measure pressure on the intended recipient of an incomplete pass. All of the coefficients we tried were extremely significant and the regression had a very strong fit.

    Here is a summary of the most important coefficients and their impact on the likelihood of completing a pass:

    Pass Distance: --
    Pressure: --
    Pressure on recipient (proxied by field zone): --
    Forward pass: -
    Air pass: -
    Head pass: -
    One-timed pass: -

    The fit of this model is incredibly strong (we use the Hosmer-Lemeshow test to judge the fit of the logistic regression and the model is significant at the .000 level). Using the model, we can then compute two things. The expected completion percentage of a pass can be thought of as pass difficulty (or more appropriately inverse pass difficulty, because the closer to 1 it is, the easier the pass). We can also calculate a measure of passing skill, which is completed passes/expected completed passes, with 1 being neutral passing skill and figures above 1 being above-average passing skill.

    What we find on this front is interesting. First, if we take actual pass completion % and compare it to pass difficulty, we have a correlation of 0.94 across the entire sample. What this says is that pass difficulty almost completely determines a player's pass completion %.
    Stated another way, if you look at completed passes/expected completed passes, it's almost always near 1. So viewed this way, differentiated passing skill is non-existent at this level of play, at least in terms of executing a pass. The characteristics of the pass in terms of pressure, distance, etc., will in the long run determine the completion %.

    When starting this analysis, I felt pretty confident that we would find that when you adjusted pass completion % for difficulty, you would find interesting things - for example, that "passing skill" in the attacking 1/3 would correlate strongly with goals or assists. When I first saw that passing skill was non-differentiating when adjusted for difficulty, I was a bit surprised, but when I thought a bit more about it, it didn't seem that off-base. Here is why.

    The problem with general pass completion % is that it does not take into account difficulty. And now we have a new problem with the "passing skill" statistic: while it does adjust for pass difficulty, it does not adjust for a couple of other factors: (1) actions prior to striking the pass and (2) danger created by the pass.

    First, how much does a passer increase his pass completion % by the actions he takes prior to making the pass? For example, is Xavi an "excellent passer" because he can place a pass on a dime, or is it more his ability to find pockets of space where no defensive pressure exists to receive the ball in, and his miraculous ball control that allows him to continue to avoid pressure and hit higher-value passes for an equal level of difficulty? Many players put themselves in difficult passing situations because they dwell on the ball too long and, upon receiving the ball, are not able to reposition their bodies in a way that opens up the field. In order to look at this we need to understand better the situation a player receives the ball in, and whether he reduces or improves his relative ability to complete a pass (i.e., pass difficulty) with his actions between then and the time of the pass. Since we do keep detailed statistics on situations upon a player receiving the ball, this is something we plan to analyze.

    Another very important factor is the potential danger created by a pass. We have a statistic for this called Pval (pass value), which measures the % increase in a team's chance of scoring a goal with each pass. We believe Pval is where players' value in passing should be measured, because that's where they do create value differentially. A player creating more Pval is increasing value by creating situations that have a higher yield for equal pass difficulty - this could be by getting himself open by eluding defenders (with the dribble or off-ball movement) and also by using his vision to select the highest-value pass for a fixed difficulty. We also cannot dismiss the overall contribution of the team to each player's Pval, with consecutive passes and ball movements that help continuously create more Pval (by creating space, penetration into the defense, and upfield progress) being related to many players and not just the one or two responsible for the last two touches. We need to find a way to properly distribute credit, and it's no simple task. We will be having a first look at Pval in a blog post later this week.

    The next time you see a statistic on straight pass completion %, you'll have a new way to understand it. This number simply roughly reflects the average difficulty of the passes that team attempted (though in a small sample this becomes a little less binding). The difficulty in turn may be a reflection of a whole host of things having to do with the tactics each team was pursuing in the game (e.g. who and where to pressure, how direct to attack, etc.). Given this, it's very hard to judge pass completion % at face value as any indicator of team or player performance.
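The "passing skill" ratio defined in this post (completed passes / expected completed passes) can be sketched as follows. The expected completion probabilities would come from the logistic regression; here they are hypothetical constants, and the two players are invented to show how very different raw completion percentages can map to the same skill ratio.

```python
def completion_pct(passes):
    """Raw pass completion %: completed passes / attempted passes."""
    return sum(1 for _, completed in passes if completed) / len(passes)


def expected_completion_pct(passes):
    """Average model-predicted completion probability (inverse difficulty)."""
    return sum(prob for prob, _ in passes) / len(passes)


def passing_skill(passes):
    """Completed passes / expected completed passes; 1.0 is neutral skill."""
    completed = sum(1 for _, completed in passes if completed)
    expected = sum(prob for prob, _ in passes)
    return completed / expected


# Each pass is (expected completion probability, completed?). Hypothetical data:
# a defender making easy passes and an attacker making difficult ones.
EASY_PASSER = [(0.9, True)] * 9 + [(0.9, False)] * 1
HARD_PASSER = [(0.6, True)] * 6 + [(0.6, False)] * 4

if __name__ == "__main__":
    for name, passes in (("easy", EASY_PASSER), ("hard", HARD_PASSER)):
        print(name, completion_pct(passes), expected_completion_pct(passes),
              round(passing_skill(passes), 3))
```

In this toy case the raw completion percentages are 90% and 60%, yet both skill ratios come out at about 1, which is the pattern the post reports across the Serie A sample.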

    Tags:

    passing