
    The Technical Area: StatDNA's Blog

    We analyze the world's most advanced soccer statistics to better understand the game.

    Creating Wins - Research competition Guest Post 1


    Posted by Brian Mills (Guest Post), 7 September 2011 17:57

    The following guest post was written by Brian Mills, a Doctoral Candidate in Sports Management and Graduate Student in Statistics at the University of Michigan.

     

    Anyone who has watched or played soccer can appreciate the importance of scoring a goal. Scoring is a relatively rare event compared to other sports like lacrosse, football or even baseball. Understanding how each event in a game increases the probability of scoring a goal (or, inversely, of keeping the opponent from scoring) should therefore prove useful to anyone trying to increase the win expectancy of their team.

     

    Recently, Ben Alamar explained how StatDNA evaluates passing by players going from one field zone to another. Successfully passing the ball from outside the box to just in front of the goal, of course, increases the probability of a goal by much more than passing across the midfield line. Getting the ball nearer the opponent’s net is therefore an important part of understanding goal creation, as in the aforementioned blog post, and what the player does with the ball once it is there is also key to getting it in the net. As you can see below, events near the net are relatively rare compared to events near midfield (dark red). So not only are goals important, but so are the preceding events that allowed the goal to take place.

     

    What I will share today takes this a little further with a (very preliminary) model that evaluates the change in win probability for each event in the game, given where on the field that event took place.

     

     

     

     

     

    Using this idea, we can do a few interesting things. First, we can create Win Probability Graphs. Those are always fun to look at, but the real advantage is in player evaluation. For each event, the probability that one’s team will win goes either up or down. A completed pass is a positive for the team completing it. A save is a positive for the goalie’s team. From here, we can aggregate across every play a player is involved in and get a total (or per-event average) Win Probability Added (“WPA”) measure.

     

    Now, the idea of Win Probability Charts and WPA is not new. FanGraphs does them for baseball, and I have seen some of the former for soccer online. However, those for soccer have generally used only goals and the time dependence of being ahead or behind in the game. But the goal event itself isn’t the only thing to account for. Here, I model the probability of a Win, Loss or Draw using not only goals, but also other events and the position on the field at which they take place.

     

    On to the model. For evaluating win probability, I have been working with a vector generalized additive proportional odds model. This lets us respect the ordering of the three possible game outcomes and calculate the probability of each at a given time point: 1) Home Team Win, 2) Draw and 3) Away Team Win. Those familiar with smoothing techniques will note that with a GAM we can calculate not only the probability change for each event (thanks to the fantastic touch-by-touch EPL data from StatDNA) but also the change given the spatial proximity to midfield and the out-of-bounds lines. Using a smoother allows us to control for the two-dimensional (and non-monotonic) changes in probability given the event location and adjust accordingly. After all, a shot on goal from the midfield line likely does not have the same win probability influence as one taken inside the box.
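
    To make the ordered-outcome idea concrete, here is a minimal sketch of how a proportional odds model turns a single score into three probabilities. It is not the fitted model from this post: the function name, the cutpoints and the values of eta are hypothetical, and in the actual vector GAM eta would be a sum of smooth functions of things like time, score and field location.

```python
import math


def sigmoid(z: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def outcome_probabilities(eta: float, cut_home: float, cut_draw: float):
    """Return (P(home win), P(draw), P(away win)) for one game state.

    Outcomes are ordered 1) home win, 2) draw, 3) away win, and the
    proportional odds assumption is P(outcome <= j) = sigmoid(cut_j - eta).
    A larger eta therefore shifts probability toward an away win.
    eta, cut_home and cut_draw are illustrative inputs, not fitted values.
    """
    if cut_home >= cut_draw:
        raise ValueError("cutpoints must be strictly increasing")
    p_le_home = sigmoid(cut_home - eta)  # P(home win)
    p_le_draw = sigmoid(cut_draw - eta)  # P(home win or draw)
    return p_le_home, p_le_draw - p_le_home, 1.0 - p_le_draw


if __name__ == "__main__":
    # Made-up cutpoints and score, purely for illustration.
    home, draw, away = outcome_probabilities(eta=0.0, cut_home=-0.2, cut_draw=1.0)
    print(f"home={home:.3f} draw={draw:.3f} away={away:.3f}")
```

    Because the three probabilities are differences of one cumulative curve, they add up to 100% by construction, and anything that raises one team’s chances must lower the draw and/or the other team’s chances.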

     

    While I won’t go into the details of the modeling itself, this model requires that the probabilities of the three possible outcomes always add up to 100% at any given time point. If one team’s probability of a win increases, then the probability of the other team winning (and/or of a draw) must decrease. Below, I have a version of the Win Probability Charts for Arsenal vs. Everton and Chelsea vs. Sunderland. These are created by predicting the ordered Win-Draw-Loss probability for each event directly from the model, given the event taking place and the previous game state. There are a few logical things to notice here that help validate the model:

     

    1) The closer to the end of the game, the more that a lead-changing goal affects the probability of a win.

     

    2) The home team (RED) begins with a higher expected win probability than the away team (BLUE).

     

    3) A goal is, of course, the most valuable event in the game (more on this later).

     

    4) About 30% of games end in Draws, so the starting point of the Draw (Yellow) line makes sense. 

     

     

     

     


    I must note some issues with this preliminary version. First, I do not use any prior knowledge of each team’s ability. Arsenal likely has more of an advantage against Everton than the graph, which uses the sample average, implies, given their better record in each of the past few seasons. In general, I’m not totally satisfied with the starting point of the home and away win probabilities shown on the chart. Second, the model sometimes does not drift far enough toward 100% when a team is ahead near the end of the game. Take Arsenal vs. Everton, for example. With nearly 0 seconds left in the game, the probability of an Arsenal win should be essentially 100%. This is likely due to the somewhat small sample of games (fewer than 150) and a possible late goal in one of them being over-weighted. Both issues could be remedied with a larger data set or with Bayesian priors built from past games using only score advantage, team record and time remaining.

     

    For a more comprehensive model, other important variables would include Pass Distance and Player Positioning when receiving that pass. These require further specification, as only Pass Events have a Pass Distance recorded. Finally, goals early on in a blowout are worth more than those later, so if certain players score their goals in different situations, this could affect their WPA. Players are rewarded extra for scoring or stopping a goal at times when a goal would cause a large swing in win probability (“High Leverage Situations”), so it is important to keep this caveat in mind unless we expect the leverage to “even out”. Since teams try to get their top players the ball in these situations, there is likely some bias.

     

    Assuming all is well with the model and the data used to construct it, we can use it to estimate each player’s contribution to a win over a game or a season, as well as the average impact of each event type. To do this, I simply take the first difference in Home win, Draw and Away win probability between the current event and the previous event. This gives the change in win probability for the given team at each event. Depending on how one thinks Draws should be weighted, we can adjust as necessary. From here, it is easy to total, or average per event, for each player given which team he is on (Home or Away).
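
    The bookkeeping described above can be sketched in a few lines. This is only an illustration under stated assumptions: the Event record, the function name and the draw_weight parameter are hypothetical, the probabilities in the usage example are made up, and each event’s full change is credited to the player who performed it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    player: str
    team: str      # "home" or "away"
    p_home: float  # model P(home win) after this event
    p_draw: float  # model P(draw) after this event
    p_away: float  # model P(away win) after this event


def win_probability_added(events, start, draw_weight=0.0):
    """Return {player: (total WPA, per-event average WPA)}.

    start is (p_home, p_draw, p_away) before the first event.
    draw_weight is an assumed knob for how much a change in draw
    probability counts toward the acting player's team (0.0 ignores draws).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    previous = start
    for event in events:
        current = (event.p_home, event.p_draw, event.p_away)
        # First difference of each outcome probability.
        d_home, d_draw, d_away = (c - p for c, p in zip(current, previous))
        own_team_change = d_home if event.team == "home" else d_away
        totals[event.player] += own_team_change + draw_weight * d_draw
        counts[event.player] += 1
        previous = current
    return {name: (totals[name], totals[name] / counts[name]) for name in totals}


if __name__ == "__main__":
    # Made-up probabilities for three events in one game.
    game = [
        Event("Player A", "home", 0.47, 0.30, 0.23),
        Event("Player B", "away", 0.44, 0.30, 0.26),
        Event("Player A", "home", 0.80, 0.15, 0.05),
    ]
    for name, (total, average) in win_probability_added(game, (0.45, 0.30, 0.25)).items():
        print(f"{name}: total={total:+.3f} per-event={average:+.3f}")
```

    One possible choice for draw_weight is 1/3, mirroring the one league point a draw earns against three for a win, but that is a modeling decision rather than something the data dictates.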

     

    The results from my preliminary model indicate that Goalies have the largest total impact, a logical result given that each time they touch the ball it is in close proximity to the goal. Defenders are next on the list, followed by a mix of Midfielders and Strikers. However, this does not necessarily mean that goalies are more valuable than anyone else! One must be careful to compare Goalies only to other Goalies, and Strikers only to their true positional counterparts.

     

    On the first run (again, without much pass quality information included and only a single season’s worth of data) I find Ali Al Habsi, Ben Foster, Joe Hart, Petr Cech and Robert Green to be the top goalies. In limited action (about half the sample size of the guys mentioned above), the young Tim Krul actually outclasses the entire collection of goalies in the data set. With the little that I know about the EPL, these seem pretty reasonable and Krul looks like he could live up to the high praise he received after filling in this past year.

     

    As for defenders, the model finds the highly regarded Manchester United captain Nemanja Vidic ranked lower than one might expect, with John Terry up near the top. Strikers are led by a familiar bunch with the likes of Rodallega, Odemwingie, Tevez, van Persie, Berbatov and Drogba, to name a few; but the popular Wayne Rooney comes in between #25 and #30. While there is plenty of room for improvement, the rankings correlate relatively closely with the EA Sports Index found at the Barclays EPL website.

     

    With respect to importance of events, the model finds Goals to be the biggest game changers, with “Sub-Ins” as the smallest. This makes sense to me. Also keep in mind that the cross-tabulations below are not conditional on field location or game state, which is why we see such low importance of common events like passes (most passes are marginal and near the midfield line). Lastly, I do not indicate directional changes in probability, just the swing from one team to another in absolute value.  
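
    For readers who want to reproduce this kind of cross-tabulation, here is a rough sketch of averaging absolute swings by event type. The function name and the input layout are hypothetical, the numbers in the usage example are made up, and measuring the swing as the absolute change in home-win probability is an assumption rather than a statement of exactly how the table was built.

```python
from collections import defaultdict


def mean_abs_swing(rows):
    """Return {event_type: (sample size, mean absolute swing)}.

    rows is an iterable of (event_type, change_in_home_win_probability)
    pairs. The sign is dropped, so the direction of the swing is ignored.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for event_type, change in rows:
        sums[event_type] += abs(change)
        counts[event_type] += 1
    return {kind: (counts[kind], sums[kind] / counts[kind]) for kind in sums}


if __name__ == "__main__":
    # Made-up changes, not values from the data set.
    sample = [
        ("Goal", 0.40),
        ("Goal", -0.30),
        ("Pass Ground", 0.01),
        ("Pass Ground", -0.03),
    ]
    for kind, (n, swing) in mean_abs_swing(sample).items():
        print(f"{kind}: n={n} mean |swing|={swing:.2%}")
```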

     

    Obviously, the approach would be improved with proper treatment of pass quality and pass difficulty information—which StatDNA does keep track of—and there is still much to account for in my model. Of course, I’d love to hear some feedback on improving things. Overall, I think it’s a pretty good start and I enjoyed getting a chance to work with this data. Thank you to StatDNA for allowing me to share my thoughts here.

     

     

    Event Type                      Sample Size   Change in Win Prob.
    Goal                            367           37.25%
    Penalty                         33            36.67%
    Save tip                        29            12.15%
    Goalie deflection (non-save)    132           10.22%
    Red Card                        27            7.47%
    Yellow Card                     395           6.76%
    Goalie Throw                    1019          5.58%
    Aerial Challenge Missed         269           5.06%
    Dribble Sequence                1564          4.78%
    Goal Kick                       2410          4.69%
    Goalie Punt                     865           4.49%
    Corner                          1308          4.37%
    Free Kick                       3407          4.32%
    Shot Foot                       2949          4.00%
    Goalie Catch (non-save)         350           4.00%
    Goalie Punch (non-save)         186           3.95%
    Offside                         522           3.83%
    Foul                            2941          3.83%
    Lost Possession                 1037          3.64%
    Goalie Possession               1699          3.63%
    Tackle Won                      6163          3.44%
    Save deflection                 410           3.43%
    Block (non-goalie)              2776          3.39%
    Save catch                      354           3.34%
    Pass Air                        18017         3.32%
    Clearance                       6183          3.22%
    Shot Head                       562           2.86%
    Deflection                      3073          2.69%
    Throw in                        5760          2.66%
    Failed Control                  2513          2.65%
    Aerial Challenge Lost           6256          2.57%
    Head Clearance                  4773          2.57%
    Gain Possession                 67873         2.38%
    Cross                           5274          2.19%
    Pass Head                       11097         2.12%
    Pass Ground                     67312         1.89%
    Tackle Lost                     5084          1.80%
    Sub In                          632           0.41%

     

     


    Comments (8) -

    Brian Mills (United States), 9/7/2011 6:56:38 PM

    Whoops! I just realized there is a typo. Home Team is RED and Away Team is BLUE. My apologies.

    Again, thanks for the opportunity to post here.

    Admin (United States), 9/7/2011 6:58:48 PM

    typo corrected!

    Ford Bohrmann (United States), 9/8/2011 9:04:38 PM

    Very, very interesting. I've been working on a similar idea for just goals, which calculates the expected points added per goal based on the venue (home/away), resulting score differential, and minute. Here's a link if you are interested: soccerstatistically.blogspot.com/.../...istic.html

    Going into more than just goals is much better, though, because it allows you to calculate for more than just goal scorers. It would be ideal to quantify every play based on how much it adds to the win probability. It seems you've done a lot to get pretty close to that. Well done, and I'm interested to see where else you go from here.

    Brian Mills (United States), 9/9/2011 2:26:32 PM

    Hi Ford,

    Thanks for the kind words about the post.  I did my best to do a blog literature search because I know there are others interested in this type of analysis.  I apologize that I did not run across yours and cite it here.  I found your posts very interesting as well.  Thank you for linking them here.  I will be sure to have it on my list of places to visit each morning.

    -Brian

    DSMok1 (United States), 9/10/2011 11:52:24 AM

    I just wanted to comment on what a wonderful data set this is, and also what a good job Brian is doing with it!  Beautiful.

    mark (United Kingdom), 10/18/2011 8:54:58 AM

    Fascinating subject
    How have you dealt with the scenario where a trailing team is pressing late in a game? They've got lots of possession, so individuals are making passes etc. and accruing personal WP, yet the team they are a part of is seeing its WP ebb away as the clock runs out.

    I take it the event type/change in WP table quotes average values. A red card in the first half hour of a stalemated, evenly matched game is worth more than a goal. I'm also surprised that penalties rate almost on par with goals... probably sample size issues.

    Are you debiting the keepers for goals conceded? Green and Foster were both relegated and Al Habsi came within 10 minutes of going down as well.

    Agree about the need to use team specific data rather than sample average.

    Good read!
    I've posted dozens of wp type graphs on my blog at

    http://thepowerofgoals.blogspot.com/

    Brian Mills (United States), 10/21/2011 2:57:25 PM

    Hey Mark,

    To your question about averages: yes, they are entire sample averages.  There are, of course, issues with presenting these by field location and time left in the game, so I just provided those as an indicator of model validity.  If for whatever reason goals were at the bottom, then I'd be worried about serious issues.  I definitely think improvements can be made with respect to the time dependence of WP shifts.

    I will have to double check the goals conceded question.  I am sure I included this part (at least I hope so!!!), but will double check the data to be sure it is included.

    As to the point about being late in the game: lots of passes around mid-field should be valued extremely low in the model.  If players are passing back and forth easily, then this should not add significant value to their WPA.  This is the value of being able to control for field location; however there are ways to improve upon this from the way I ran the model.

    Lastly, I imagine the penalty shots issue is one of sample size as you say.  What may have happened here is that these happen more often in close games (i.e. the defender did everything he could to stop a sure goal when the ball is in the box).  In that case, if only 25% of those shots go in (pure guess, you would know better than I), but swing WP heavily, it could affect the results you see above.

    Thanks for the comments!


