Wooly Jumpers For Goalposts: 2015

Friday 3 April 2015

Liverpool: Formation change to form change

Back when I was still wet behind the ears when it came to all this football analytics malarkey (and perhaps I still am…), at about this stage of Brendan Rodgers’ first season in charge I had a look at how some of Liverpool’s attacking metrics had changed over the season. After the season had finished I did the same thing from a defensive point of view.

Overall, it was a season that started off pretty insipidly, with poor results on the back of sterile possession, very little creativity, poor shot conversion and an error-strewn defence. Then pretty suddenly there was a change, and Liverpool very quickly became one of the best attacking teams in the league and even the defence started to show some solidity. Sound familiar?

I’m sure you all know that Liverpool switched to a 3421 formation in the first of their games against Manchester United this season, although you may not remember that it was Liverpool’s 16th game of the season. Prior to that, Liverpool had a string of poor performances and disappointing results. Whilst Liverpool lost 3 goals to nil against United, and needed a very late equaliser against Arsenal in their next game, the performances were significantly better, and Liverpool went on a 13-game unbeaten run, which included 10 wins, finally losing in their last match, again against Manchester United.

Since those early posts I’ve built on the metrics I used, and have combined attacking and defensive metrics together. As this has been a season split into two very different halves for Liverpool, I thought it might be interesting to do a similar analysis with the new metrics to see just how much their underlying performance has changed. I’m not the first to do this: Dan Kennett has written a piece over on the Tomkins Times, but I’ll be doing mine in a different way, so it will hopefully still be interesting.

As in those early posts, I will look at the metrics on a cumulative basis, and will again show the 6-game moving average so that we can see more clearly how Liverpool’s form has changed over the season. I’ve collected data for four seasons now, and so to add some context, I’ll also show what the average has been for the Champions, the 4th placed teams, and the 18th placed teams.
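For anyone wanting to reproduce this kind of chart, here is a minimal Python sketch of the cumulative and 6-game views of a ratio metric. It assumes that the 6-game figure pools the counts over the window rather than averaging six single-game ratios, which is a guess at the method. The per-game numbers and the function names are made up for illustration.

```python
from itertools import accumulate


def ratio(made, conceded):
    """Share of all events that belong to the team; None if there were none."""
    total = made + conceded
    return made / total if total else None


def cumulative_ratio(made, conceded):
    """Season-to-date ratio after each game."""
    return [ratio(f, a) for f, a in zip(accumulate(made), accumulate(conceded))]


def rolling_ratio(made, conceded, window=6):
    """Ratio over the most recent `window` games, starting at game `window`."""
    values = []
    for end in range(window, len(made) + 1):
        f = sum(made[end - window:end])
        a = sum(conceded[end - window:end])
        values.append(ratio(f, a))
    return values


# Hypothetical per-game shot counts, not real match data.
shots_for = [15, 12, 18, 10, 14, 16, 9, 11]
shots_against = [10, 12, 9, 13, 8, 10, 14, 12]

cumulative = cumulative_ratio(shots_for, shots_against)
rolling = rolling_ratio(shots_for, shots_against)

print([round(v, 3) for v in cumulative])
print([round(v, 3) for v in rolling])
```

Pooling the counts means a game with very few shots cannot swing the 6-game figure as much as it would if each game counted equally.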

I’ll start by looking at good old TSR. For those that don’t know, a team’s TSR is the proportion of all the shots in its games, its own and its opponents’ combined, that the team itself has taken. It correlates very well with points won and goal difference.

From the graph we can see that in terms of TSR, Liverpool’s season didn’t actually start off too badly: after nine games Liverpool’s TSR was at 0.63, the kind of level that we would expect of a team fighting to get into the Champions League places at the very least. However, from that point Liverpool’s 6-game average TSR took a nosedive, and a run of games that included losses to Newcastle, Chelsea and Crystal Palace, as well as a scoreless home draw against Sunderland, saw Liverpool’s TSR form go south of 0.5 and approach the level we might expect to see from a team fighting against relegation. Two games later, after the matches against Manchester United and Arsenal, Liverpool’s TSR form was back up to where we would expect it to be, and it has more or less remained there since.

As you can see from the graph above, there is very little difference between the domination of shots on average for the team that finishes 4th and the team that ends up as champions. The thing that separates them is the quality of chances that the champions create and their defensive strength, so next I’m going to look at Liverpool’s CEDE Score over the season so far.

CEDE Score is a metric I put together to measure a team’s efficiency at both creating good quality chances and restricting their opposition from creating good quality chances. To calculate it, I add a team’s Creative Efficiency to their Defensive Efficiency, a bit like PDO for those of you who know what that is. Now, to calculate those two metrics, I use Opta’s Big Chance stat, which is a chance that one would expect a player to have a good likelihood of scoring from, usually because that player does not face any defensive pressure, such as in a 1-on-1 with the keeper or a free header. Creative Efficiency is the percentage of a team’s shots which are Big Chances, whilst Defensive Efficiency is the percentage of shots against which are not Big Chances. I’ve explored CEDE Scores in a bit more detail here, but it’s probably useful to know that the average CEDE Score for a team is 1.0.
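The definition above can be sketched in a few lines of Python. The function and argument names are illustrative and the inputs are hypothetical season totals, not real data. Both efficiencies are written as fractions rather than percentages, so that an average team comes out at 1.0.

```python
def cede_score(shots_for, big_for, shots_against, big_against):
    """Creative Efficiency plus Defensive Efficiency, both as fractions."""
    # Share of the team's own shots that are Big Chances.
    creative_efficiency = big_for / shots_for
    # Share of the shots faced that are NOT Big Chances.
    defensive_efficiency = 1 - big_against / shots_against
    return creative_efficiency + defensive_efficiency


# Hypothetical team whose shots for and against have the same Big Chance share.
average_team = cede_score(shots_for=500, big_for=60,
                          shots_against=500, big_against=60)

# Hypothetical team that creates a higher share and concedes a lower share.
efficient_team = cede_score(shots_for=560, big_for=84,
                            shots_against=400, big_against=40)

print(round(average_team, 3))
print(round(efficient_team, 3))
```

When a team's shots for and against contain the same share of Big Chances, the two terms cancel to 1.0, which is why the league average sits there.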

So what do we see? Whilst Liverpool’s season from a shot domination perspective has actually been ok apart from that short term blip, their efficiency at creating and restricting Big Chances at the beginning of the season was nothing short of awful. After 12 games Liverpool’s CEDE score was below 0.9. This compares to the average for the 18th placed team of 0.97, and the lowest over a whole season in my sample was 0.92. After game 12, their 6-game CEDE Score form started increasing, and reached a peak of 1.11 at game 23 before it started dropping back down towards the average. Despite the improvement, over the season as a whole so far Liverpool’s CEDE Score is still only at 0.97.

Although the CEDE Score is good for looking at how efficient teams are when it comes to Big Chances, it is missing a key ingredient, which is the volume of Big Chances. In my last piece I showed that the Big Chance Ratio (“BCR”) is pretty good at explaining what has happened as it has a higher correlation with points scored than TSR. The BCR is calculated in the same way as TSR but with Big Chances only. As we can see from the graph below, although Liverpool enjoyed decent shot domination in the games at the beginning of the season, it was their CEDE Score which was the driving force behind their BCR, which was equally poor. For the first 10 or so games the BCR was hovering around the 0.4 mark, essentially relegation fighting territory, but when the shots started to dry up the BCR dropped with them down to around the 0.3 mark and a 6-game average low of 0.26. That was only 6 games, and I’m sure that there have been plenty of occasions when teams have put together a worse run than that over 6 games, but for comparison the worst BCR over a whole season was 0.3. Then from the Man Utd game we see the rapid increase, and 7 games later the 6-game form BCR had shot up to 0.8 and peaked at 0.88, where over 6 games Liverpool created 13 Big Chances and gave up only 2.

The final metric I’m going to look at is one I also introduced in greater detail in my last piece, and one I call CQR+ (this is an acronym for Chance Quality Ratios Added Together). In the Premier League about half of all goals come from Big Chances, with the other half coming from what I term Normal Chances. A team’s CQR+ is their BCR and their Normal Chance Ratio (“NCR”) added together. CQR+ has only a slightly stronger relationship to points than the BCR, but it is a much more repeatable metric, which means that it is a better indication of a team’s actual strength.

So what is the story of Liverpool’s season? For the first 9 or so games, Liverpool’s level of play was essentially about average: they had strong shot domination but weak efficiency rates. Over the next 6 games, Liverpool’s form dropped to the level that one would expect of a team at the bottom of the league, on the back of falling shot numbers and still weak efficiency. Then Liverpool switched to 3421, and 6 games later Liverpool’s form was what you would expect to see from a team trying to win the title: their shot domination had increased back to what you might expect and, more importantly, their efficiency had also improved. Over the last few games Liverpool’s form has dropped; however, over this run of games they’ve played Manchester City, Manchester United, Spurs, and Southampton, so a cooling down of the numbers is perhaps not surprising.

Following the loss to rivals Manchester United, and with only eight games remaining, Liverpool now only have an outside chance of qualifying for the Champions League, and after the United game and the Swansea game before that, some are saying that Liverpool’s 3421 has been found out. This may be so, or perhaps they came up against two teams with the players and the discipline to be able to counter it. Either way, Rodgers has shown that he can get this team performing to a very high level, and that he will change things if needs be. If he can get Liverpool's performance levels back to where they were just a few games ago, it could take the battle for 4th to the wire. However, Liverpool fans may well end up looking back at the first half of the season and thinking about what might have been.

Friday 20 February 2015

Creating some new metrics using Opta's Big Chance stat - Part 2

It has been a while since I have written anything. This is partly due to life just getting that little bit busier, both at work and at home, but also because, after one too many spillages, the keyboard on my laptop stopped working. This is a piece that I actually had 95% completed prior to the keyboard giving up the ghost on me, so I didn't quite get it done, and in this business your data becomes out of date very quickly! This was meant to be posted soon after I presented at the OptaPro Forum last February, and is a follow up to the last piece I wrote that you can read here.

Recently I posted the slides from my presentation at the Opta Forum, and in this piece I will look at one of the metrics that I introduced in the presentation, the Big Chance Ratio (“BCR”). If you don’t know already, Big Chances are one of Opta's few subjective stats, described as “A situation where a player should reasonably be expected to score usually in a one-on-one scenario or from very close range.” Big Chances have only been measured by Opta for 4 full seasons now, and this gives us 80 observations to check the relationship with points, but due to this season not yet being completed and teams getting relegated, we only have 51 observations to test the year-on-year relationship. I wrote in more detail here about how many Big Chances a team gets on average over the season and the rate at which they are converted, and there has been little change, with the average team over the past 4 seasons taking 535 shots, of which about 13%, or an average of 68, are Big Chances.

As I did in my last post, I am going to be using the Total Shot Ratio (“TSR”), which measures the proportion of shots that a team takes compared to its opposition, as the baseline against which to compare the different metrics. Although we have many more observations for TSR, I will use the same period to compare the differences between the metrics so that I am comparing like for like.

Below are the two charts showing TSR’s relationship to points and TSR year-on-year. The R² values are 0.65 and 0.70 respectively, and it’s against these that I’ll be comparing the new metrics.

Moving on to the BCR, the graph below is the one I used in the presentation to show the relationship between the BCR and points, although with last season’s data also included. I don’t think there is anything groundbreaking in looking at the BCR, as it is essentially using the same method used for TSR, but applying it to Big Chances. I have seen the BCR used by others to compare teams, although as far as I am aware, no one has written about it before to show just how meaningful it can be.

The R² was 0.75 over the 4 full seasons that Big Chances have been recorded by Opta. Just like TSR, the average team has a BCR of 0.5, but you can see from the graph that the range in BCRs is larger, from the 0.3 achieved by Reading two seasons ago up to the 0.77 for Manchester City also from two seasons ago.

So, why do Big Chances, with an average of only 68 per team each season, have such a strong relationship with points won? Well, it is partly a case of correlation rather than causation. Teams that are winning tend to be more conservative and sit back, restricting the opposition to more difficult shots, whilst also being able to hit teams on the break to create better chances or be more patient and wait for easier opportunities to come along. As I showed in my presentation, this comes through when looking at BCRs by game state: teams that are winning by 1 goal have an average BCR of 0.53, and this increases with each additional goal of the lead, and needless to say, teams that are in the lead more tend to win more points. However, as shown by Mark Taylor (here), the ability to create better chances can also be an important factor in who wins the game, even if the expected goals for each team in a game are the same.

But what about repeatability, is there ‘skill’ in a team’s ability to be able to both create and restrict Big Chances? Well, the graph below shows there is a positive relationship between the BCR in one year and the following year, but this is not as strong as for TSR, with the R² for BCR at 0.60 as opposed to 0.70 for TSR. However, as one of the strengths of TSR is that there are a large number of shots, we should remember that by looking at Big Chances only, we have significantly reduced the number of observations, and when you consider this, the repeatability is actually quite high.
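As a rough illustration of how a year-on-year R² like this can be put together, the Python sketch below pairs each team's value in one season with its value in the next, dropping teams that are not in both seasons, and then squares the Pearson correlation. The team names, the values and the helper names are all hypothetical, and treating R² as the squared correlation of a simple straight-line fit is an assumption about the method rather than something stated in the post.

```python
def year_on_year_pairs(seasons):
    """Pair each team's value in one season with its value in the next.

    `seasons` is a list of {team: value} dicts in chronological order.
    Teams missing from the following season (e.g. relegated) are skipped.
    """
    this_year, next_year = [], []
    for current, following in zip(seasons, seasons[1:]):
        for team, value in current.items():
            if team in following:
                this_year.append(value)
                next_year.append(following[team])
    return this_year, next_year


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Hypothetical BCR values for two consecutive seasons.
seasons = [
    {"Team A": 0.60, "Team B": 0.50, "Team C": 0.40, "Team D": 0.35},
    {"Team A": 0.58, "Team B": 0.52, "Team C": 0.45, "Team E": 0.38},
]

xs, ys = year_on_year_pairs(seasons)
print(len(xs), "paired observations")
print(round(r_squared(xs, ys), 3))
```

Team D drops out of the pairing because it has no following season, which is the same reason the year-on-year sample is smaller than the points sample.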

There are still a lot of shots left over however, so how much information is there in shots which aren’t Big Chances? I'll refer to these as Normal Chances, and to give a bit of extra detail, whilst Big Chances are converted at a rate of about 38%, Normal Chances are converted at a rate of slightly over 5%. Well, from the graphs below, we can see that the relationship between the Normal Chance Ratio ("NCR") and points is not as strong as for BCR, whilst the year-on-year correlation is slightly higher, with R² values of 0.55 and 0.63 respectively. As you would expect with Normal Chances making up 87% of all shots, the range is similar to what we see for TSR, going from 0.36 up to 0.67. The average NCR, as with both TSR and BCR, is also 0.5.

As about 50% of goals are scored from Big Chances, with of course the other 50% from Normal Chances, I thought it would make sense to see what happens if we add the team’s BCRs and NCRs together. As they both have an average of 0.5 across all teams, the combined metric will have an average of 1 so it will also be nice and easy to tell which teams are above or below average. Of course this metric needs a confusing name and acronym, and as it is two ratios based on the quality of chances added together, I’ve called it Chance Quality Ratios Added Together (“CQR+”).
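The "about 50%" figure can be sanity-checked from the averages quoted earlier in this piece. The short Python sketch below does the arithmetic. Note that the Normal Chance conversion rate is only given as "slightly over 5%", so the 0.05 used here is a rounded assumption.

```python
# Average figures per team per season, as quoted in the post.
shots = 535
big_chances = 68
big_conversion = 0.38      # "about 38%"
normal_conversion = 0.05   # "slightly over 5%", rounded down here (assumption)

normal_chances = shots - big_chances
big_goals = big_chances * big_conversion
normal_goals = normal_chances * normal_conversion
big_share = big_goals / (big_goals + normal_goals)

print(f"Normal Chances per team: {normal_chances}")
print(f"Goals from Big Chances: {big_goals:.1f}")
print(f"Goals from Normal Chances: {normal_goals:.1f}")
print(f"Share of goals from Big Chances: {big_share:.0%}")
```

With these rounded inputs the share comes out a little over half, and a Normal Chance conversion rate slightly above 5% would pull it closer to an even split.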

We can see from the graph that the relationship between CQR+ and points is very strong, and has an R² of 0.78. The reason for the strong improvement over TSR I think can be explained by thinking of TSR as a weighted average of the BCR and NCR, and by separating them out and adding them back together, we have given each an equal weighting, which is in line with their average contributions to goals. As Normal Chances make up the vast majority of a team’s shots, their TSR and NCR will always be relatively close. If a team is more efficient at creating and restricting Big Chances than they are at shots in general, then their BCR will be higher than their TSR, whilst their NCR will be lower. However, the change in BCR will be larger in absolute terms, and the higher conversion rate associated with Big Chances should in general translate into more points won.
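The weighted-average point can be checked numerically. In the Python sketch below, the weights are the shares of all shots in a team's games (for and against) that are Big Chances and Normal Chances respectively. The counts are hypothetical and the variable names are only for illustration.

```python
def ratio(made, conceded):
    """Share of all events in a team's games that the team itself produced."""
    return made / (made + conceded)


# Hypothetical season totals for one team.
big_for, big_against = 80, 50
normal_for, normal_against = 480, 420

tsr = ratio(big_for + normal_for, big_against + normal_against)
bcr = ratio(big_for, big_against)
ncr = ratio(normal_for, normal_against)

# Weight of Big Chances = their share of every shot in the team's games.
all_shots = big_for + big_against + normal_for + normal_against
big_weight = (big_for + big_against) / all_shots
weighted_average = big_weight * bcr + (1 - big_weight) * ncr

cqr_plus = bcr + ncr  # equal weighting of the two ratios

print(f"TSR {tsr:.3f}  BCR {bcr:.3f}  NCR {ncr:.3f}")
print(f"Weighted average of BCR and NCR: {weighted_average:.3f}")
print(f"Big Chance weight inside TSR: {big_weight:.3f}")
print(f"CQR+ {cqr_plus:.3f}")
```

In this made-up example the Big Chance weight inside TSR is well under a fifth, so a strong BCR barely moves TSR but shows up in full in CQR+.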

How about the repeatability? The addition of BCR and NCR together also has a positive effect on repeatability, and the R² of 0.75 is actually higher than for TSR, which is 0.70 over the same period.

To summarise the differences between the four metrics that I have covered, the table below shows the R² for each one. As we know, TSR is a good predictor of points and is repeatable. BCR has a stronger relationship to points than TSR but is not as stable year-on-year, whereas by adding the team's BCR and NCR together we have a metric with both a higher explanatory power and a greater predictive power.

I thought it would be interesting to check how teams are performing by these metrics this season so far. The table below shows each team’s TSR, NCR, BCR and CQR+, with the ranking in the league for each metric, and the table has been sorted by CQR+.

If we look at TSR compared to league position we can see that, as we might expect, it is performing relatively well, with the majority of TSR rankings within 3 places of the league position. We can also see how the NCR is never more than 2 decimal points different from the TSR, although even these small changes do shuffle the rankings a little. It’s when we start to look at the BCRs that we start to see the real differences. Chelsea have the highest BCR at 0.71, on the back of being the meanest defence at conceding Big Chances and creating the 2nd most (behind Arsenal), which compares to their TSR of 0.61. Moving in the other direction we have Liverpool, who have been good at dominating the shot count in their matches and have the 4th highest TSR of 0.59; however, they are not as efficient when it comes to Big Chances and have a BCR of 0.51, ranking them 8th.

In terms of the CQR+ metric, and on the back of their high BCRs, Chelsea and Arsenal are the leaders of the pack. Man City have been the most dominant team in terms of TSR this season, but like Liverpool they are not efficient when it comes to Big Chances so rank 3rd by CQR+. Then come Southampton, Liverpool, Manchester United and a bit of a gap to Tottenham, meaning that the top 7 by CQR+ make up the top 7 teams in the league. Down at the other end, 5 of the bottom 6 teams in the league make up the 5 worst CQR+ teams, so it does seem to be working on first sight, and overall, all but 3 of the teams’ CQR+ ranks are within 3 places of where they are in the league.

That does mean there are 3 outliers however. The first two are West Ham and Swansea, who are outperforming their CQR+, where they rank 14th and 15th respectively. For both teams, particularly West Ham, it may well be the case that their numbers are being affected by game states. West Ham have so far spent the 5th highest amount of time winning in the league this season, which is likely to be putting some downward pressure on their shot numbers, and whilst Swansea haven’t spent as much time winning, they did start the season very strongly. Going in the other direction, by far the biggest underperformers compared to their CQR+ are QPR, currently sitting outside of the relegation positions by goal difference, yet ranked 12th in TSR, and 10th in CQR+ thanks to having the 7th best BCR in the league. Have they been a little unfortunate, or is it the consequence of playing in very open matches when you are not actually very good? I’m afraid I haven’t seen enough of them to know.

I’m not too sure how this metric stacks up against some of the others out there, particularly the expected goals model, although I did use it to enter a prediction in Simon Gleave's Premier League prediction analysis, which is kindly updated by James Grayson throughout the season to show how everyone is doing, and where it is performing relatively well (by points at least, although not by position).

However, compared to some of the models out there, it has the benefit of being very simple to calculate. All you need is the total shots taken and faced by each team, and the total number of Big Chances they have taken and faced. Unfortunately the Big Chance stat isn’t as readily available as most other stats. You can get it from FantasyFootballScout, where you have to pay a subscription, and I have also recently been directed to the AllThingsFPL website which also has it, although having said that, the two sites do show slight differences in the Big Chances for each team here and there. I do know that Big Chances do get reviewed in the week following the game, which may account for the differences, but I don’t know which site is the most up to date (I have been using the FantasyFootballScout numbers).
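To show how little is needed, here is a small Python sketch that goes from those four counts per team to TSR, NCR, BCR and CQR+, and then sorts by CQR+. The teams and numbers are invented, and the class and function names are purely illustrative rather than taken from any existing tool.

```python
from dataclasses import dataclass


@dataclass
class TeamCounts:
    name: str
    shots_for: int       # total shots taken, including Big Chances
    shots_against: int   # total shots faced, including Big Chances
    big_for: int         # Big Chances taken
    big_against: int     # Big Chances faced


def ratio(made, conceded):
    return made / (made + conceded)


def metrics(team):
    # Normal Chances are simply the shots that are not Big Chances.
    normal_for = team.shots_for - team.big_for
    normal_against = team.shots_against - team.big_against
    bcr = ratio(team.big_for, team.big_against)
    ncr = ratio(normal_for, normal_against)
    return {
        "team": team.name,
        "TSR": ratio(team.shots_for, team.shots_against),
        "NCR": ncr,
        "BCR": bcr,
        "CQR+": bcr + ncr,
    }


# Hypothetical season-to-date totals.
teams = [
    TeamCounts("Team C", 280, 420, 25, 55),
    TeamCounts("Team A", 400, 300, 60, 30),
    TeamCounts("Team B", 350, 350, 40, 45),
]

table = sorted((metrics(t) for t in teams),
               key=lambda row: row["CQR+"], reverse=True)

for row in table:
    print(f'{row["team"]}: TSR {row["TSR"]:.2f}, NCR {row["NCR"]:.2f}, '
          f'BCR {row["BCR"]:.2f}, CQR+ {row["CQR+"]:.2f}')
```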

I hope that I have shown here, as well as in my previous work, just how useful the Big Chance stat can be and how we can use it to make some simple metrics. Due to not having a precise definition of what a Big Chance actually is, the lack of detailed information on all the Big Chances, as well as its subjective nature, some in the ‘fanalyst’ community have their doubts about Big Chances. Whilst I agree that there may be cases where a shot is recorded as a Big Chance when possibly it should not be, that is always going to be the case when subjectivity is added. However, I believe that these will be in the minority, and as we see consistency each season with the number of Big Chances and their conversion rates, they are not having a big impact. I think that we should embrace subjective stats as they can add more context to our analysis, and the benefits can outweigh the concerns.