Expected Goals (xG) is a relatively new measure in football. In simple terms, xG estimates the probability that a given shot will result in a goal. This video by Dan Altman of North Yard Analytics describes the simplicity of xG. Bobby Gardner of AnalyticsFC sums up xG in 80 words below.
The number of variables that can be analysed has been described as limitless in posts by bloggers such as 11tegen11, Paul Riley, Bertin and Pleuler. Pleuler writes in his article that ‘theoretically anything that affects a shot’s probability of resulting in a goal belongs in this model’. The most common factors used in an xG model are the distance and angle of the shot, but as you can imagine, there are differences even within these. For instance, some people look only at where the shot was taken from, while others also consider how the ball was delivered to that location before the shot was taken (as in the posts by 11tegen11 and Pleuler above). Other factors may include through-balls, free-kicks, corner kicks, whether the shot was a header or a normal shot, the time of the game and so on. I suspect the number of factors available for analysis in a model like this could reach 50 or more. That said, I presume (this has not been tested yet) that only five or six variables, or even fewer, would be needed to produce a statistically accurate measure of xG. This is something I’m hoping to work on in the near future.
There have also been varying approaches to which shot types to include or exclude when calculating xG. For example, the post from Paul Riley’s blog (above) looked only at Shots on Target (SoT). I have a slight issue with using only SoT numbers, as I think this does not give a true reflection of a team’s or player’s abilities. Nevertheless, here is a link to Riley’s SoT xG model.
How easy is any certain shooting chance?
It depends on a number of factors (which I will discuss below) but essentially, using historical shooting data, one can assign each shot what is called an “expected goal” value. This is done by grouping similar types of shots together and seeing how often this type of shot was converted in the past.
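To make the grouping idea concrete, here is a minimal sketch in Python. It is not my actual Excel workbook: the shot records, the choice of zone and body part as the grouping key, and the function name `build_xg_table` are all made up for illustration. Each group's xG is simply goals divided by shots for that group.

```python
from collections import defaultdict

# Hypothetical historical shots: (zone, body_part, scored).
# The values are invented purely to illustrate the grouping step.
shots = [
    (1, "foot", True),
    (1, "foot", False),
    (1, "foot", False),
    (1, "foot", True),
    (1, "head", False),
    (1, "head", True),
    (5, "foot", False),
    (5, "foot", False),
    (5, "foot", False),
    (5, "foot", True),
]


def build_xg_table(shots):
    """Group similar shots and return the historical conversion rate per group."""
    counts = defaultdict(lambda: [0, 0])  # group key -> [goals, shots]
    for zone, body_part, scored in shots:
        entry = counts[(zone, body_part)]
        entry[0] += int(scored)  # count the goal if the shot was converted
        entry[1] += 1            # count every shot in the group
    return {key: goals / total for key, (goals, total) in counts.items()}


xg_table = build_xg_table(shots)

# A new footed shot from zone 5 would be assigned that group's conversion rate.
print(xg_table[(5, "foot")])
```

With only a handful of shots per group the rates are very noisy, which is why a full season of data per league matters.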
What variables are included in my xG model?
The diagram on the left will be used for my model. Other inputs include the following: 1) Shot location (Zone), 2) Type of shot (goal, attempt saved, off target & blocked), 3) Type of event (open play, set-piece, free-kick, fast-break & corner), 4) Body part (footed shot & header). As mentioned above, it would be interesting to see whether any of the following variables has a statistically significant influence on xG: date of the match, fixture, home team, away team, time of the shot, and whether the shot came in the first or second half.
** I have excluded penalties from the data. Including penalties in the model would skew the end result. This is why you will often see the phrase non-penalty expected goals, or NPxG for short, used in the analytics community. **
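As a rough illustration of why penalties are stripped out, the sketch below sums a player's xG with and without penalties. The shot records, field names and xG values are hypothetical (the 0.76 for a penalty is only an illustrative figure, not a number from my model), and `non_penalty_xg` is a helper name invented for this example.

```python
# Hypothetical shot records for one player; all values are illustrative.
shots = [
    {"situation": "open play", "xg": 0.10},
    {"situation": "penalty", "xg": 0.76},  # assumed penalty value, for illustration
    {"situation": "corner", "xg": 0.05},
    {"situation": "open play", "xg": 0.30},
]


def total_xg(shots):
    """Sum xG over every shot, penalties included."""
    return sum(shot["xg"] for shot in shots)


def non_penalty_xg(shots):
    """Sum xG over all shots except penalties (NPxG)."""
    return sum(shot["xg"] for shot in shots if shot["situation"] != "penalty")


print(round(total_xg(shots), 2))        # xG including the penalty
print(round(non_penalty_xg(shots), 2))  # NPxG
```

In this made-up sample a single penalty accounts for more than half of the total, which is the kind of distortion NPxG is meant to avoid.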
What about the data?
Presently, I use data from the Premier League and MLS. Over the course of a season, each league sees roughly 10,000 shots or more. From August 2016, I will gather and analyse Bundesliga data as well.
Does the “expected goal” value change over time?
It does, but only minimally. We can adjust this figure at any point in the season to reflect changes in the league, team and player.
Where can I find some examples of your xG works?
I mostly use Excel (to calculate xG) and Tableau Public (to graph xG).
Anything else to know about xG?
I am not the only blogger working with this type of metric. If you google xG goals you will find many people, each explaining their own model. Michael Caley (the godfather of xG) has done by far the most research, published on the site Cartilage Free Captain. Other posts which also look at xG can be found here and here. Elsewhere, Martin Eastwood is another great blogger to follow. He wrote a simple article about adding the y-axis to the x-axis coordinates (on a football pitch) and claims this allowed him to calculate xG more accurately: he found that his r-squared value increased from 0.86 to 0.95.
Finally, let’s discuss this last post in a little more detail. A blogger named Michael Bertin claimed in one of his posts that ‘if all you know is the distance to the goal from where the shot was taken, you can make a decent xG model’. Now, I am very sceptical of this statement. For example, let’s use the diagram of the pitch above. If one shot was taken in Zone 5 and the other in Zone 7, Bertin's claim implies that they have the same probability of going in. That may be the case if only distance (up the pitch) is measured but, as you can imagine, the angle changes this probability quite significantly.
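The point about angle can be sketched with a little geometry. The Python below is only an illustration: the coordinate system (origin at the centre of the goal line, x along the goal line, y up the pitch, both in metres), the two shot positions and the function names are my own assumptions and are not tied to the zones in the diagram. The only fixed number is the goal width of 7.32 m. It compares two shots taken from the same distance up the pitch and measures how much of the goal each shooter can see.

```python
import math

GOAL_WIDTH = 7.32  # metres between the posts


def shot_distance(x, y):
    """Straight-line distance from the shot to the centre of the goal."""
    return math.hypot(x, y)


def shot_angle(x, y):
    """Angle (radians) between the lines from the shot to each post.

    Uses the cross and dot products of the two shot-to-post vectors,
    which simplify to the expressions below for posts at (+/-3.66, 0).
    """
    half = GOAL_WIDTH / 2
    cross = GOAL_WIDTH * abs(y)
    dot = x * x + y * y - half * half
    return math.atan2(cross, dot)


# Two hypothetical shots, both 10 m up the pitch from the goal line.
central = (0.0, 10.0)  # straight in front of goal
wide = (15.0, 10.0)    # same depth, but 15 m out to the side

for name, (x, y) in (("central", central), ("wide", wide)):
    angle_deg = math.degrees(shot_angle(x, y))
    print(f"{name}: distance {shot_distance(x, y):.1f} m, angle {angle_deg:.1f} degrees")
```

Both shots are the same distance up the pitch, yet the wide shooter sees a much narrower slice of the goal, which is why I would expect a model built on distance alone to overrate shots from wide positions.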