"We had so many chances in the game, so many opportunities to get a shot on target. I was starting to wonder if it was going to be one of those days." -- Manchester United's Michael Carrick, after his side's 2-0 victory away to QPR in December
A common footballing frustration. So many chances, not much to show for it. But not all chances are created equal. At StatDNA, we've developed models to quantify just how good a chance really is. Last year we introduced our Goal Creation Framework, a decomposition of how goals are created, which includes pieces for Shot Quality and Finishing Quality. In this post we'll focus on the finishing portion. In particular, we'll take a look at shots that manage to hit the target. Carrick was right to emphasize the importance of getting a shot on target, as that in itself is no mean feat. Only about a third of non-headed shots from open play manage to hit the target.
But what about those that do find their way on net? We would like to quantify the likelihood that a shot on target goes in. We asked ourselves what characteristics influence great finishing. Then we started measuring them in great detail. Now we can give you a glimpse of our results.
For this analysis we selected non-headed shots on target from the first few weeks of the 2011-2012 Barclays Premier League season, excluding penalty shots and shots directly from free kicks. The resulting sample is 508 shots from open play. Of these, 134 were goals, resulting in a strike rate of 26.4%.
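As a quick check on the arithmetic, the strike rate follows directly from the two counts quoted above. The variable names here are ours, chosen for illustration.

```python
# Counts quoted in the text for the open-play, non-headed sample.
shots_on_target = 508
goals = 134

# Strike rate: share of shots on target that ended up as goals.
strike_rate = goals / shots_on_target

print(f"Strike rate: {strike_rate:.1%}")  # prints "Strike rate: 26.4%"
```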
The simplest thing one could do is say every shot on target has a 26% chance of going in, but that does not seem very reasonable. Instead, the probability of scoring will change based on a number of factors. For example, the farther the shooter is from goal, the lower the odds of scoring. We built a model that estimates scoring probability using a number of features we collect.
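To make the idea concrete, here is a minimal sketch of how a single feature could feed a scoring probability. It assumes a logistic form, distance measured in metres, and coefficients we made up for illustration. None of this is the actual StatDNA model, which uses more features and whose functional form is not described here.

```python
import math

# Hypothetical coefficients, chosen only for illustration. They are NOT
# fitted to StatDNA data, and the real model uses many more features.
INTERCEPT = 1.0
DISTANCE_COEF = -0.15  # negative sign: longer shots are less likely to score


def scoring_probability(distance_m: float) -> float:
    """Logistic estimate of P(goal | shot on target) from distance alone."""
    log_odds = INTERCEPT + DISTANCE_COEF * distance_m
    return 1.0 / (1.0 + math.exp(-log_odds))


# The estimated probability falls as the shooter moves away from goal.
for d in (6, 12, 18, 24):
    print(f"{d:>2} m: {scoring_probability(d):.2f}")
```

The point of the sketch is only the shape of the relationship: a negative coefficient on distance pushes the estimate down as distance grows, instead of assigning the same 26% to every shot.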
We might expect a good chunk of shots to have a very low chance of scoring and to give the goalkeeper little work, along with a smaller group of "unstoppable" shots, the kind that make Ian Darke exclaim, "There was nothing the 'keeper could do about that one!" The figure below ranks the shots in our sample by estimated scoring probability from our model. The black points at the top and bottom represent the actual outcome (goals on top, saves on bottom).

The first thing you will notice on the chart is that nearly half the finishes have a probability of being a goal very near to zero, meaning these shots were almost certain to be stopped. In actuality, the three black dots at the top of the chart tell you that three goals resulted from these first 250 shots. So our model fit is quite good on these shots (as it is more broadly).
At the top of the chart are the blue points, which are shots that are virtually unstoppable, with a scoring probability of 95% or greater. In fact, you'll notice at the bottom of the chart, where the black points represent saves, that the last black point is around shot number 450. That means that of the roughly 50 shots rated as best by our finishing models, every single one resulted in a goal.
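The comparison the chart makes by eye, predicted probability against actual outcome, can be sketched as a small banding check. The 95% cutoff comes from the text. The 5% cutoff for "very near to zero", the band names, and the toy shots are our assumptions, not StatDNA data.

```python
def summarize_bands(shots, low=0.05, high=0.95):
    """Group (probability, scored) pairs into three bands and compare
    expected goals (sum of probabilities) with actual goals per band."""
    bands = {"near-certain save": [], "contested": [], "near-certain goal": []}
    for prob, scored in sorted(shots):  # ranked by estimated probability
        if prob < low:
            key = "near-certain save"
        elif prob >= high:
            key = "near-certain goal"
        else:
            key = "contested"
        bands[key].append((prob, scored))

    summary = {}
    for name, items in bands.items():
        summary[name] = {
            "shots": len(items),
            "expected_goals": sum(p for p, _ in items),
            "actual_goals": sum(1 for _, s in items if s),
        }
    return summary


# Toy data, invented for illustration: (estimated probability, was it a goal?)
toy_shots = [
    (0.01, False), (0.02, False), (0.03, True),
    (0.40, False), (0.55, True), (0.60, True),
    (0.96, True), (0.99, True),
]

for band, stats in summarize_bands(toy_shots).items():
    print(band, stats)
```

If a model fits well, expected and actual goals should be close within each band, which is what the three goals from the first 250 shots and the clean sweep among the top-rated shots suggest.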
The green points are perhaps the most interesting, as these allow us to get some insight into goalkeeping. Say you want to judge a goalkeeper on his shot-saving ability (only one piece of the larger goalkeeper puzzle; stay tuned for more work there). There is actually only a narrow range of shots on which to do it. Save percentage, a common yardstick, is based on all shots on target. But our analysis shows that 60% of shots on target either can be saved by any keeper or have no chance whatsoever of being saved. If we look at how a goalkeeper performs on 50/50 shots, the green range, we can start to see some differentiation, where a great shot-stopper will show his value relative to an average one. Given the small percentage of shots that really matter for keepers, it's completely plausible that the traditional save percentage of an inferior keeper would exceed that of an excellent keeper, simply because the inferior keeper faced a higher proportion of completely savable shots.
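A small worked example shows how this reversal can happen. The two keepers and all shot counts below are invented for illustration. Only the three-way split of shots (savable by anyone, contested, unstoppable) follows the text.

```python
def save_rates(faced_by_band):
    """Return (overall save percentage, save rate on contested shots).

    faced_by_band maps a band name to a (shots faced, shots saved) pair.
    """
    total_faced = sum(faced for faced, _ in faced_by_band.values())
    total_saved = sum(saved for _, saved in faced_by_band.values())
    contested_faced, contested_saved = faced_by_band["contested"]
    return total_saved / total_faced, contested_saved / contested_faced


# Hypothetical keepers: same number of shots faced, very different mix.
excellent_keeper = {
    "savable by anyone": (20, 20),
    "contested": (40, 24),   # saves 60% of the 50/50 shots
    "unstoppable": (10, 0),
}
inferior_keeper = {
    "savable by anyone": (50, 50),
    "contested": (15, 6),    # saves 40% of the 50/50 shots
    "unstoppable": (5, 0),
}

for name, keeper in (("excellent", excellent_keeper), ("inferior", inferior_keeper)):
    overall, contested = save_rates(keeper)
    print(f"{name}: save pct {overall:.1%}, contested save rate {contested:.1%}")
```

In this toy setup the inferior keeper posts the higher overall save percentage purely because of shot mix, while the contested-shot rate ranks the two keepers the other way round.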
One might also interpret this analysis to say that goalkeepers may be undifferentiated in their ability to save shots, but there are three important counterarguments:
1. These are all Premier League goalkeepers and thus could be fairly similar in ability. We may find that goalkeepers from lower leagues have a higher percentage of shots in the 50/50 zone.
2. We only consider actual goalkeeper positioning at the time of the shot, not what his optimal position would have been. Therefore, a shot that might have been "unstoppable" given the goalkeeper's actual position could have been stoppable had his positioning been optimal. Our model will not consider that.
3. Goalkeepers may also influence the rate of shots that are on target via their positioning. A 'keeper who does well to close down angles may cause the opponent to put more shots off target. However, this ability will show up neither in save percentage nor in our finishing measure above, as both only consider on-target shots.
A complete picture of goalkeeper performance would include not only his ability to make saves, given the finishing quality, but also an analysis of his positioning vs. optimal, his control over rebounds, his ability to deal with crosses, and his distribution.
And to return to the question of great finishing, the simplest indicator of this seems to be scoring a goal. The occasions in which a keeper truly robs or gifts a goal are fewer than one might imagine.