@brandymon

brandymon@alien.top · 10 months ago

So xG (expected goals) models football as a random process. That lets us estimate the chance of a goal being scored for every shot taken. That probability is calculated from things like where the shot was taken from and what part of the body it was taken with.

To get the xG for a team or player, we simply add up the probabilities of all their individual goal-scoring opportunities, because on average, if football were a random process, that’d be how many goals they’d score. This gives us a way of measuring the quality of the goal-scoring chances a given team or player is presented with.
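
To make the adding-up step concrete, here's a minimal sketch in Python. The shot probabilities are made-up numbers for illustration, not output from any real xG model, and the names `shot_probabilities` and `expected_goals` are just ones I've picked for the example.

```python
# Hypothetical per-shot goal probabilities for one team in one match.
# These values are invented for illustration, not taken from a real model.
shot_probabilities = [0.76, 0.08, 0.03, 0.31, 0.12]


def expected_goals(probabilities):
    """Return total xG: the sum of the per-shot goal probabilities."""
    return sum(probabilities)


team_xg = expected_goals(shot_probabilities)
print(f"Team xG: {team_xg:.2f}")
```

With those five made-up shots the total comes to about 1.30, i.e. you'd expect a bit over one goal on average from chances like these, even though only one of them was a really good chance.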

Of course, real football isn’t a random process, and there are players or teams that can consistently perform better than the model predicts (e.g. Kane, Son, Haaland), or worse (Chelsea). However, these models do fit on average, and they can enrich our tactical understanding of the game if applied correctly.

The maths nerds amongst you may ask how we calculate that probability. This is typically done with a logistic regression on loads of data from competitive matches. The output of this is a function that relates the log-odds of a goal being scored to a linear combination of some input variables, which we can then validate against another data set to make sure the model fits. Those input variables change from model to model, and how many variables you can usefully fit will depend on how large your training and validation data sets are, but the ones I mentioned earlier are a solid starting point. Extended models can incorporate things like the position of the keeper and defenders, or where the shot was placed (xGOT).
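
For a rough idea of what that fitted function looks like once you have it, here's a sketch in Python. The coefficients below are invented purely for illustration (they are not fitted to any data and don't come from a real model), and I've assumed just two inputs, shot distance in metres and whether it was a header, as stand-ins for "where the shot was taken from" and "what part of the body".

```python
import math

# Illustrative coefficients only: NOT fitted to real data.
# In practice these would come out of a logistic regression on match data.
INTERCEPT = -0.5
COEF_DISTANCE_M = -0.12  # assumed: longer distance lowers the log-odds
COEF_HEADER = -0.9       # assumed: headers lower the log-odds


def log_odds(distance_m, is_header):
    """Linear combination of the inputs, giving the log-odds of a goal."""
    header_flag = 1 if is_header else 0
    return INTERCEPT + COEF_DISTANCE_M * distance_m + COEF_HEADER * header_flag


def shot_xg(distance_m, is_header):
    """Convert log-odds to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds(distance_m, is_header)))


# Compare a close shot, a close header, and a long-range shot.
for distance_m, is_header in [(6, False), (6, True), (20, False)]:
    kind = "header" if is_header else "foot"
    print(f"{distance_m:>2} m, {kind}: xG = {shot_xg(distance_m, is_header):.3f}")
```

The point is only the shape of the thing: a straight-line formula on the log-odds scale, squashed into a probability between 0 and 1. A real model would get its coefficients from the regression and be checked against held-out data, as described above.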

For anyone who’s interested in getting really nerdy about this, here’s a freely available course on football analytics, though you’ll need some statistical and coding knowledge to follow along - https://soccermatics.readthedocs.io/en/latest/