Wooly Jumpers For Goalposts: February 2015

It has been a while since I have written anything. This is partly due to life getting that little bit busier, both at work and at home, but also because, after one too many spillages, the keyboard on my laptop stopped working. I had this piece 95% completed before the keyboard gave up the ghost on me, so I didn't quite get it done, and in this business your data goes out of date very quickly! It was meant to be posted soon after I presented at the OptaPro Forum last February, and is a follow-up to the last piece I wrote, which you can read here.

Recently I posted the slides from my presentation at the Opta Forum, and in this piece I will look at one of the metrics that I introduced in the presentation, the Big Chance Ratio (“BCR”). If you don’t know already, Big Chances are one of Opta's few subjective stats, described as “A situation where a player should reasonably be expected to score usually in a one-on-one scenario or from very close range.” Big Chances have only been measured by Opta for 4 full seasons now, which gives us 80 observations (20 teams in each of 4 seasons) to check the relationship with points. However, as this season is not yet complete and three teams are relegated each year, we only have 51 observations (17 surviving teams in each of 3 pairs of consecutive seasons) to test the year-on-year relationship. I wrote in more detail here about how many Big Chances a team gets on average over the season, and here about the rate at which they are converted, and there has been little change: the average team over the past 4 seasons has taken 535 shots, of which about 13%, or an average of 68, are Big Chances.

As I did in my last post, I am going to be using the Total Shot Ratio (“TSR”), which measures the proportion of shots that a team takes compared to its opposition as the baseline to compare the different metrics. Although we have many more observations for TSR, I will use the same period to compare the differences between the metrics so that I am comparing like for like.

Below are the two charts showing TSR’s relationship to points and TSR year-on-year. The R² values are 0.65 and 0.70 respectively, and it’s against these that I’ll be comparing the new metrics.
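For anyone wanting to reproduce these comparisons, here is a minimal sketch of the R² calculation, assuming the figures quoted come from a straight-line fit (in which case R² is simply the squared correlation). The function name and the sample values are made up for illustration and are not real Opta data.

```python
def r_squared(x, y):
    """R² of a simple straight-line fit of y on x (the squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical values for illustration only, NOT real team data.
tsr = [0.38, 0.45, 0.50, 0.55, 0.64]
points = [33, 42, 50, 61, 79]
print(round(r_squared(tsr, points), 2))
```

The same function works for the year-on-year check: pass a metric in one season as x and the same teams' values in the following season as y.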

Moving on to the BCR, the graph below is the one I used in the presentation to show the relationship between the BCR and points, although with last season’s data also included. I don’t think there is anything groundbreaking in looking at the BCR, as it is essentially the same method used for TSR, applied to Big Chances. I have seen the BCR used by others to compare teams, although as far as I am aware, no one has written about it before to show just how meaningful it can be.

The R² was 0.75 over the 4 full seasons that Big Chances have been recorded by Opta. Just like TSR, the average team has a BCR of 0.5, but you can see from the graph that the range in BCRs is larger, from the 0.3 achieved by Reading two seasons ago up to the 0.77 for Manchester City also from two seasons ago.

So, why do Big Chances, with an average of only 68 per team each season, have such a strong relationship with points won? Well, it may partly be a case of correlation rather than causation. Teams that are winning tend to be more conservative and sit back, restricting the opposition to more difficult shots, whilst also being able to hit teams on the break to create better chances, or be more patient and wait for easier opportunities to come along. As I showed in my presentation, this comes through when looking at BCRs by game state: teams that are winning by 1 goal have an average BCR of 0.53, and this increases with each additional goal of the lead. Needless to say, teams that are in the lead more tend to win more points. However, as shown by Mark Taylor (here), the ability to create better chances can also be an important factor in who wins the game, even if the expected goals for each team in a game are the same.

But what about repeatability? Is there ‘skill’ in a team’s ability to both create and restrict Big Chances? Well, the graph below shows there is a positive relationship between the BCR in one year and the following year, but this is not as strong as for TSR, with the R² for BCR at 0.60 as opposed to 0.70 for TSR. However, one of the strengths of TSR is that it is built on a large number of shots, and by looking at Big Chances only we have significantly reduced the number of shots in the sample. When you consider this, the repeatability is actually quite high.
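The year-on-year test only uses teams that appear in two consecutive seasons, so relegated and promoted teams drop out. Here is a rough sketch of how those pairs could be built. The function name and the data layout (one dictionary of team to metric value per season, in chronological order) are my own assumptions for illustration.

```python
def year_on_year_pairs(metric_by_season):
    """Pair each team's value in one season with its value in the next season.

    metric_by_season: list of {team: value} dicts, oldest season first.
    Teams missing from the following season (e.g. relegated) are skipped.
    """
    pairs = []
    for this_season, next_season in zip(metric_by_season, metric_by_season[1:]):
        for team, value in this_season.items():
            if team in next_season:
                pairs.append((value, next_season[team]))
    return pairs


# Hypothetical three-team example, NOT real data: "C" is replaced by "D".
seasons = [
    {"A": 0.60, "B": 0.50, "C": 0.40},
    {"A": 0.62, "B": 0.48, "D": 0.41},
]
print(year_on_year_pairs(seasons))
```

With 4 full seasons of 20 teams and 17 survivors each year, this gives 3 x 17 = 51 pairs.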

There are still a lot of shots left over however, so how much information is there in shots which aren’t Big Chances? I'll refer to these as Normal Chances, and to give a bit of extra detail, whilst Big Chances are converted at a rate of about 38%, Normal Chances are converted at a rate of slightly over 5%. From the graphs below, we can see that the relationship between the Normal Chance Ratio ("NCR") and points is not as strong as for BCR, with an R² of 0.55, whilst the year-on-year R² of 0.63 is slightly higher than for BCR. As you would expect with Normal Chances making up 87% of all shots, the range is similar to what we see for TSR, going from 0.36 up to 0.67. The average NCR, as with both TSR and BCR, is also 0.5.
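As a quick sanity check, the averages quoted so far (535 shots, 68 Big Chances, conversion rates of about 38% and slightly over 5%) imply that goals split roughly evenly between the two chance types. The sketch below runs that arithmetic. The function name is my own, and 0.05 is used as a floor for the Normal Chance conversion rate, so the true Big Chance share will be a little lower than the figure this produces.

```python
def goal_split(shots, big_chances, big_conv, normal_conv):
    """Expected goals from Big and Normal Chances, plus the Big Chance share of goals."""
    normal_chances = shots - big_chances
    big_goals = big_chances * big_conv
    normal_goals = normal_chances * normal_conv
    share_from_big = big_goals / (big_goals + normal_goals)
    return big_goals, normal_goals, share_from_big


# Season averages per team as quoted in the post; 0.05 understates "slightly over 5%".
big_goals, normal_goals, share = goal_split(535, 68, 0.38, 0.05)
print(round(big_goals, 1), round(normal_goals, 1), round(share, 2))
```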

As about 50% of goals are scored from Big Chances, with of course the other 50% from Normal Chances, I thought it would make sense to see what happens if we add the team’s BCRs and NCRs together. As they both have an average of 0.5 across all teams, the combined metric will have an average of 1 so it will also be nice and easy to tell which teams are above or below average. Of course this metric needs a confusing name and acronym, and as it is two ratios based on the quality of chances added together, I’ve called it Chance Quality Ratios Added Together (“CQR+”).

We can see from the graph that the relationship between CQR+ and points is very strong, with an R² of 0.78. I think the strong improvement over TSR can be explained by thinking of TSR as a weighted average of the BCR and NCR, with each weighted by its share of shots. By separating them out and adding them back together, we have given each an equal weighting, which is in line with their average contributions to goals. As Normal Chances make up the vast majority of a team’s shots, their TSR and NCR will always be relatively close. If a team is more efficient at creating and restricting Big Chances than they are at shots in general, then their BCR will be higher than their TSR, whilst their NCR will be lower. However, the change in BCR will be larger in absolute terms, and the higher conversion rate associated with Big Chances should in general translate into more points won.
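To make the weighted-average point concrete, here is the algebra written out. The symbols are my own notation: B_f and B_a are Big Chances for and against, N_f and N_a are Normal Chances for and against, and S = B_f + B_a + N_f + N_a is the total number of shots in the team's matches.

```latex
\begin{aligned}
\mathrm{BCR} &= \frac{B_f}{B_f + B_a}, \qquad
\mathrm{NCR} = \frac{N_f}{N_f + N_a}, \qquad
w = \frac{B_f + B_a}{S} \\[4pt]
\mathrm{TSR} &= \frac{B_f + N_f}{S}
  = \frac{B_f + B_a}{S}\cdot\frac{B_f}{B_f + B_a}
  + \frac{N_f + N_a}{S}\cdot\frac{N_f}{N_f + N_a} \\[4pt]
 &= w\,\mathrm{BCR} + (1 - w)\,\mathrm{NCR} \\[4pt]
\mathrm{CQR+} &= \mathrm{BCR} + \mathrm{NCR}
  = 2\left(\tfrac{1}{2}\,\mathrm{BCR} + \tfrac{1}{2}\,\mathrm{NCR}\right)
\end{aligned}
```

On the league averages quoted earlier, w is roughly 0.13, so TSR gives the BCR about 13% of the weight, whereas CQR+ gives it half.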

How about the repeatability? Adding the BCR and NCR together also has a positive effect on repeatability, and the R² of 0.75 is actually higher than the 0.70 for TSR over the same period.

To summarise the differences between the four metrics that I have covered, the table below shows the R² for each one. As we know, TSR is a good predictor of points and is repeatable. BCR has a stronger relationship to points than TSR, but is not as stable year-on-year. By adding the team's BCR and NCR together, however, we have a metric with both a higher explanatory power and a greater predictive power.

Metric    R² with points    R² year-on-year
TSR       0.65              0.70
BCR       0.75              0.60
NCR       0.55              0.63
CQR+      0.78              0.75

I thought it would be interesting to check how teams are performing by these metrics this season so far. The table below shows each team's TSR, NCR, BCR and CQR+, with the ranking in the league for each metric, and the table has been sorted by CQR+.

If we look at TSR compared to league position we can see that, as we might expect, it is performing relatively well, with the majority of TSR rankings within 3 places of the league position. We can also see how the NCR is never more than 2 decimal points different from the TSR, although even these small changes do shuffle the rankings a little. It's when we start to look at the BCRs that we start to see the real differences. Chelsea have the highest BCR at 0.71, on the back of being the meanest defence at conceding Big Chances and creating the 2nd most (behind Arsenal), which compares to their TSR of 0.61. Moving in the other direction we have Liverpool, who have been good at dominating the shot count in their matches and have the 4th highest TSR of 0.59. However, they are not as efficient when it comes to Big Chances and have a BCR of 0.51, ranking them 8th.

In terms of the CQR+ metric, and on the back of their high BCRs, Chelsea and Arsenal are the leaders of the pack. Man City have been the most dominant team in terms of TSR this season, but like Liverpool they are not efficient when it comes to Big Chances so rank 3rd by CQR+. Then come Southampton, Liverpool, Manchester United and a bit of a gap to Tottenham, meaning that the top 7 by CQR+ make up the top 7 teams in the league. Down at the other end, 5 of the bottom 6 teams in the league make up the 5 worst CQR+ teams, so it does seem to be working on first sight, and overall, all but 3 of the teams’ CQR+ ranks are within 3 places of where they are in the league.

That does mean there are 3 outliers however. The first two are West Ham and Swansea, who are outperforming their CQR+, where they rank 14th and 15th respectively. For both teams, particularly West Ham, it may well be the case that their numbers are being affected by game states. West Ham have so far spent the 5th highest amount of time winning in the league this season, which is likely to be putting some downward pressure on their shot numbers, and whilst Swansea haven’t spent as much time winning, they did start the season very strongly. Going in the other direction, by far the biggest underperformers compared to their CQR+ are QPR, currently sitting outside of the relegation positions on goal difference, yet ranked 12th in TSR and 10th in CQR+ thanks to having the 7th best BCR in the league. Have they been a little unfortunate, or is it the consequence of playing in very open matches when you are not actually very good? I’m afraid I haven’t seen enough of them to know.

I’m not too sure how this metric stacks up against some of the others out there, particularly the expected goals model, although I did use it to enter a prediction in Simon Gleave's Premier League prediction analysis. James Grayson kindly updates this throughout the season to show how everyone is doing, and it is performing relatively well (by points at least, although not by position).

However, compared to some of the models out there, it has the benefit of being very simple to calculate. All you need is the total shots taken and faced by each team, and the total number of Big Chances they have taken and faced. Unfortunately, Big Chances aren’t as readily available as most other stats. You can get them from FantasyFootballScout, where you have to pay a subscription, and I have also recently been directed to the AllThingsFPL website, which also has them, although the two sites do show slight differences in the Big Chances for each team here and there. I do know that Big Chances get reviewed in the week following the game, which may account for the differences, but I don’t know which site is the most up to date (I have been using the FantasyFootballScout numbers).
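To show just how little is needed, here is a minimal sketch of the calculation from those four totals. The function name and the example numbers are made up for illustration (they are not any real team's figures), and Normal Chances are taken to be all shots that are not Big Chances, as defined earlier.

```python
def chance_metrics(shots_for, shots_against, big_for, big_against):
    """Return TSR, BCR, NCR and CQR+ from a team's season totals."""
    # Normal Chances are simply the shots that were not Big Chances.
    normal_for = shots_for - big_for
    normal_against = shots_against - big_against

    tsr = shots_for / (shots_for + shots_against)
    bcr = big_for / (big_for + big_against)
    ncr = normal_for / (normal_for + normal_against)
    return {"TSR": tsr, "BCR": bcr, "NCR": ncr, "CQR+": bcr + ncr}


# Hypothetical team, NOT real data: out-shoots opponents and wins the Big Chance count.
example = chance_metrics(shots_for=600, shots_against=470, big_for=85, big_against=55)
for name, value in example.items():
    print(name, round(value, 2))
```

A perfectly average team (the same totals for and against) comes out at 0.5 for each ratio and 1.0 for CQR+, which is what makes above and below average easy to read off.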

I hope that I have shown here, as well as in my previous work, just how useful the Big Chance stat can be and how we can use it to make some simple metrics. Due to not having a precise definition of what a Big Chance actually is, the lack of detailed information on all the Big Chances, and the stat's subjective nature, some in the ‘fanalyst’ community have their doubts about Big Chances. I agree that there may be cases where a shot is recorded as a Big Chance when possibly it should not be, and that is always going to be the case when subjectivity is added. However, I believe that these will be in the minority, and as we see consistency each season in the number of Big Chances and their conversion rates, they are not having a big impact. I think that we should embrace subjective stats as they can add more context to our analysis, and the benefits can outweigh the concerns.

Friday, 20 February 2015

Creating some new metrics using Opta's Big Chance stat - Part 2