Friday 3 April 2015

Liverpool: Formation change to form change

Back when I was still wet behind the ears when it came to all this football analytics malarkey (and perhaps I still am…), at about this stage of Brendan Rodgers’ first season in charge I had a look at how some of Liverpool’s attacking metrics had changed over the season. After the season had finished I did the same thing from a defensive point of view.

Overall, it was a season that started off pretty insipidly, with poor results on the back of sterile possession, very little creativity, poor shot conversion and an error-strewn defence. Then pretty suddenly there was a change, and Liverpool very quickly became one of the best attacking teams in the league and even the defence started to show some solidity. Sound familiar?

I’m sure you all know that Liverpool switched to a 3421 formation in the first of their games against Manchester United this season, although you may not remember that it was Liverpool’s 16th game of the season. Prior to that, Liverpool had had a string of poor performances and disappointing results. Whilst Liverpool lost 3-0 to United, and needed a very late equaliser against Arsenal in their next game, the performances were significantly better, and Liverpool went on a 13-game unbeaten run that included 10 wins, before finally losing their last match, again against Manchester United.

Since those early posts I’ve built on the metrics I used, and have combined attacking and defensive metrics together, so as this has been a season split into two very different halves for Liverpool, I thought it might be interesting to do a similar analysis with the new metrics to see just how much their underlying performance has changed. I’m not the first to do this, Dan Kennett has written a piece over on the Tomkins Times, but I’ll be doing mine in a different way, so will hopefully still be interesting.

Like those early posts, I will look at the metrics on a cumulative basis, and will again show the 6-game moving average so that we can see more clearly how Liverpool’s form has changed over the season. I’ve collected data for four seasons now, and so to add some context, I’ll also show what the average has been for the Champions, the 4th placed teams, and the 18th placed teams.

I’ll start by looking at good old TSR. For those that don’t know, a team’s TSR is the proportion of all the shots in its games that the team itself has taken, and it correlates very well with points won and goal difference.
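For anyone who wants to play along at home, the calculation is trivial; here’s a minimal sketch in Python (the function name and shot counts are my own, purely for illustration):

```python
# A minimal sketch of TSR: the proportion of all shots in a team's
# matches that the team itself has taken. Names and numbers are
# illustrative, not from any real dataset.
def tsr(shots_for, shots_against):
    return shots_for / (shots_for + shots_against)

# A perfectly average team takes half the shots in its games:
print(tsr(500, 500))  # 0.5
```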

From the graph we can see that in terms of TSR, Liverpool’s season didn’t actually start off too badly; after nine games Liverpool’s TSR was at 0.63, the kind of level that we would expect of a team fighting to get into the Champions League places at the very least. However, from that point Liverpool’s 6-game average TSR took a nosedive, and a run of games that included losses to Newcastle, Chelsea and Crystal Palace, as well as a scoreless home draw against Sunderland, saw Liverpool’s TSR form go south of 0.5, approaching the level we might expect to see from a team fighting against relegation. Two games later, after the matches against Manchester United and Arsenal, Liverpool’s TSR form was back up to where we would expect it to be, and it has more or less remained there since.

As you can see from the graph above, there is very little difference between the domination of shots on average for the team that finishes 4th and the team that ends up as champions. The thing that separates them is the quality of chances that the champions create and their defensive strength, so next I’m going to look at Liverpool’s CEDE Score over the season so far.

CEDE Score is a metric I put together to measure a team’s efficiency at both creating good quality chances and restricting their opposition from creating good quality chances. To calculate it, I add a team’s Creative Efficiency to their Defensive Efficiency, a bit like PDO for those of you who know what that is. To calculate those two metrics, I use Opta’s Big Chance stat, which records a chance that one would expect a player to have a good likelihood of scoring from, usually because that player does not face any defensive pressure, such as in a 1-on-1 with the keeper or a free header. Creative Efficiency is the percentage of a team’s shots which are Big Chances, whilst Defensive Efficiency is the percentage of shots against which are not Big Chances. I’ve explored CEDE Scores in a bit more detail here, but it’s probably useful to know that the average CEDE Score for a team is 1.0.
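To make the construction concrete, here’s a hedged sketch of the calculation in Python. The helper names are my own, and the example counts are simply the league averages quoted elsewhere on this blog (roughly 13% of an average team’s 535 shots being Big Chances):

```python
# Creative Efficiency: share of a team's own shots that are Big Chances.
def creative_efficiency(big_chances_for, shots_for):
    return big_chances_for / shots_for

# Defensive Efficiency: share of shots conceded that are NOT Big Chances.
def defensive_efficiency(big_chances_against, shots_against):
    return (shots_against - big_chances_against) / shots_against

# CEDE Score: the two efficiencies added together.
def cede_score(bc_for, shots_for, bc_against, shots_against):
    return (creative_efficiency(bc_for, shots_for)
            + defensive_efficiency(bc_against, shots_against))

# A team with league-average numbers on both sides scores exactly 1.0:
print(cede_score(68, 535, 68, 535))  # 1.0
```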

So what do we see? Whilst Liverpool’s season from a shot domination perspective has actually been ok apart from that short term blip, their efficiency at creating and restricting Big Chances at the beginning of the season was nothing short of awful. After 12 games Liverpool’s CEDE score was below 0.9. This compares to the average for the 18th placed team of 0.97, and the lowest over a whole season in my sample was 0.92. After game 12, their 6-game CEDE Score form started increasing, and reached a peak of 1.11 at game 23 before it started dropping back down towards the average. Despite the improvement, over the season as a whole so far Liverpool’s CEDE Score is still only at 0.97.

Although the CEDE Score is good for looking at how efficient teams are when it comes to Big Chances, it is missing a key ingredient: the volume of Big Chances. In my last piece I showed that the Big Chance Ratio (“BCR”) is pretty good at explaining what has happened, as it has a higher correlation with points scored than TSR. The BCR is calculated in the same way as TSR but with Big Chances only. As we can see from the graph below, although Liverpool enjoyed decent shot domination in the games at the beginning of the season, it was their CEDE Score which was the driving force behind their BCR, which was equally poor. For the first 10 or so games the BCR was hovering around the 0.4 mark, essentially relegation fighting territory, but when the shots started to dry up the BCR dropped with them, down to around the 0.3 mark and a 6-game average low of 0.26. Although it was only 6 games, and I’m sure that there have been plenty of occasions when teams have put together a worse run than that, the worst BCR over a whole season was 0.3. Then from the Man Utd game we see the rapid increase, and 7 games later the 6-game form BCR had shot up to 0.8, peaking at 0.88, where over 6 games Liverpool created 13 Big Chances and gave up only 2.

The final metric I’m going to look at is one I also introduced in greater detail in my last piece, and one I call CQR+ (this is an acronym for Chance Quality Ratios Added Together). In the Premier League about half of all goals come from Big Chances, with the other half coming from what I term Normal Chances. A team’s CQR+ is their BCR and their Normal Chance Ratio (“NCR”) added together. CQR+ has a slightly stronger relationship to points than the BCR, however it is a much more repeatable metric, which means that it is a better indication of a team’s actual strength.
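Putting the pieces together, here is a sketch of how BCR, NCR and CQR+ relate (the chance counts below are hypothetical, chosen only to illustrate an above-average team):

```python
# A TSR-style ratio, applied to whichever chance type you feed it.
def ratio(count_for, count_against):
    return count_for / (count_for + count_against)

# CQR+: the Big Chance Ratio and Normal Chance Ratio added together,
# giving a metric with a league average of 1.0.
def cqr_plus(bc_for, bc_against, nc_for, nc_against):
    return ratio(bc_for, bc_against) + ratio(nc_for, nc_against)

# Hypothetical team: 80 Big Chances for vs 50 against, and
# 450 Normal Chances for vs 420 against.
print(round(ratio(80, 50), 2))               # BCR  ~ 0.62
print(round(ratio(450, 420), 2))             # NCR  ~ 0.52
print(round(cqr_plus(80, 50, 450, 420), 2))  # CQR+ ~ 1.13
```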

So what is the story of Liverpool’s season? For the first 9 or so games, Liverpool’s level of play was essentially about average: they had strong shot domination but weak efficiency rates. Over the next 6 games, Liverpool’s form dropped to the level that one would expect of a team at the bottom of the league, on the back of falling shot numbers and still weak efficiency. Then Liverpool switched to 3421, and 6 games later Liverpool’s form was what you would expect to see from a team trying to win the title; their shot domination had increased back to what you might expect and, more importantly, their efficiency also improved. Over the last few games Liverpool’s form has dropped, however over this run of games they’ve played Manchester City, Manchester United, Spurs, and Southampton, so a cooling down of the numbers is perhaps not surprising.

Following the loss to rivals Manchester United, and with only eight games remaining, Liverpool now only have an outside chance of qualifying for the Champions League, and after the United game and the Swansea game before that, some are saying that Liverpool’s 3421 has been found out. This may be so, or perhaps they came up against two teams with the players and the discipline to be able to counter it. Either way, Rodgers has shown that he can get this team performing to a very high level, and that he will change things if needs be. If he can get Liverpool’s performance levels back to where they were just a few games ago, it could take the battle for 4th to the wire, however Liverpool fans may well end up looking back at the first half of the season and thinking what might have been.

Friday 20 February 2015

Creating some new metrics using Opta's Big Chance stat - Part 2

It has been a while since I have written anything. This is partly due to life just getting that little bit busier, both at work and at home, but also because, after one too many spillages, the keyboard on my laptop stopped working. This is a piece that I actually had 95% completed prior to the keyboard giving up the ghost on me, so I didn’t quite get it done, and in this business your data becomes out of date very quickly! This was meant to be posted soon after I presented at the OptaPro Forum last February, and is a follow up to the last piece I wrote, which you can read here.

Recently I posted the slides from my presentation at the Opta Forum, and in this piece I will look at one of the metrics that I introduced in the presentation, the Big Chance Ratio (“BCR”). If you don’t know already, Big Chances are one of Opta’s few subjective stats, described as “A situation where a player should reasonably be expected to score usually in a one-on-one scenario or from very close range.” Big Chances have only been measured by Opta for 4 full seasons now, and this gives us 80 observations to check the relationship with points, but due to this season not yet being completed and teams getting relegated, we only have 51 observations to test the year-on-year relationship. I wrote in more detail here about how many Big Chances a team gets on average over a season and the rate at which they are converted, and there has been little change, with the average team over the past 4 seasons taking 535 shots, of which about 13%, or an average of 68, are Big Chances.

As I did in my last post, I am going to be using the Total Shot Ratio (“TSR”), which measures the proportion of shots that a team takes compared to its opposition as the baseline to compare the different metrics. Although we have many more observations for TSR, I will use the same period to compare the differences between the metrics so that I am comparing like for like.

Below are the two charts showing graphically TSR’s relationship to points and TSR year-on-year, the R2 for each are 0.65 and 0.70 respectively, and it’s against these that I’ll be comparing the new metrics.
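For anyone wanting to reproduce these R2 figures themselves, they are just the square of the Pearson correlation between the two series. A self-contained sketch in plain Python (the data below is made up, purely for illustration):

```python
# R2 as the squared Pearson correlation, in plain Python.
def r_squared(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov ** 2 / (var_x * var_y)

# Made-up, nearly linear TSR-vs-points data gives an R2 close to 1:
tsr_values = [0.65, 0.58, 0.52, 0.47, 0.41]
points = [86, 72, 55, 47, 36]
print(round(r_squared(tsr_values, points), 2))  # 0.99
```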

Moving on to the BCR, the graph below is the one I used in the presentation to show the relationship between the BCR and points, although with last season’s data also included. I don’t think there is anything ground breaking in looking at the BCR, as it is essentially using the same method used for TSR, but applying it to Big Chances.  I have seen the BCR used by others to compare teams, although as far as I am aware, no one has written about it before to show just how meaningful it can be.

The R2 was 0.75 over the 4 full seasons that Big Chances have been recorded by Opta. Just like TSR, the average team has a BCR of 0.5, but you can see from the graph that the range in BCRs is larger, from the 0.3 achieved by Reading two seasons ago up to the 0.77 for Manchester City also from two seasons ago.

So, why do Big Chances, with an average of only 68 per team each season, have such a strong relationship with points won? Well, it is partly a case of correlation rather than causation. Teams that are winning tend to be more conservative and sit back, restricting the opposition to more difficult shots, whilst also being able to hit teams on the break to create better chances, or be more patient and wait for easier opportunities to come along. As I showed in my presentation, this comes through when looking at BCRs by game state: teams that are winning by 1 goal on average have a BCR of 0.53, and this increases with each goal the lead grows by, and needless to say teams that are in the lead more tend to win more points. However, as shown by Mark Taylor (here), the ability to create better chances can also be an important factor in who wins the game, even if the expected goals for each team in a game are the same.

But what about repeatability: is there ‘skill’ in a team’s ability to both create and restrict Big Chances? Well, the graph below shows there is a positive relationship between the BCR in one year and the following year, but this is not as strong as for TSR, with the R2 for BCR at 0.60 as opposed to 0.70 for TSR. However, as one of the strengths of TSR is that there are a large number of shots, we should remember that by looking at Big Chances only, we have significantly reduced the number of observations, and when you consider this, the repeatability is actually quite high.

There are still a lot of shots left over however, so how much information is there in shots which aren’t Big Chances? I’ll refer to these as Normal Chances, and to give a bit of extra detail, whilst Big Chances are converted at a rate of about 38%, Normal Chances are converted at a rate of slightly over 5%. From the graphs below, we can see that the relationship between the Normal Chance Ratio (“NCR”) and points is not as strong as for BCR, whilst the year-on-year correlation is slightly higher, with an R2 of 0.55 and 0.63 respectively. As you would expect with Normal Chances making up 87% of all shots, the range is similar to what we see for TSR, going from 0.36 up to 0.67. The average NCR, as with both TSR and BCR, is also 0.5.

As about 50% of goals are scored from Big Chances, with of course the other 50% from Normal Chances, I thought it would make sense to see what happens if we add the team’s BCRs and NCRs together. As they both have an average of 0.5 across all teams, the combined metric will have an average of 1 so it will also be nice and easy to tell which teams are above or below average. Of course this metric needs a confusing name and acronym, and as it is two ratios based on the quality of chances added together, I’ve called it Chance Quality Ratios Added Together (“CQR+”).

We can see from the graph that the relationship between CQR+ and points is very strong, with an R2 of 0.78. The reason for the strong improvement over TSR, I think, can be explained by thinking of TSR as a weighted average of the BCR and NCR; by separating them out and adding them back together, we have given each an equal weighting, which is in line with their average contributions to goals. As Normal Chances make up the vast majority of a team’s shots, their TSR and NCR will always be relatively close. If a team is more efficient at creating and restricting Big Chances than they are at shots in general, then their BCR will be higher than their TSR, whilst their NCR will be lower; however the change in BCR will be larger in absolute terms, and the higher conversion rate associated with Big Chances should in general translate into more points won.
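That ‘weighted average’ claim is worth making precise: TSR works out as exactly w × BCR + (1 − w) × NCR, where w is the share of all shots in a team’s matches (for and against combined) that are Big Chances. A quick check with hypothetical counts:

```python
# Hypothetical chance counts for one team's season.
bc_for, bc_against = 80, 50      # Big Chances for / against
nc_for, nc_against = 450, 420    # Normal Chances for / against

shots_for = bc_for + nc_for
shots_against = bc_against + nc_against
total_shots = shots_for + shots_against

tsr = shots_for / total_shots
bcr = bc_for / (bc_for + bc_against)
ncr = nc_for / (nc_for + nc_against)
w = (bc_for + bc_against) / total_shots  # Big Chance share of ALL shots

# TSR decomposes exactly into the Big-Chance-weighted average:
assert abs(tsr - (w * bcr + (1 - w) * ncr)) < 1e-12
print(w)  # here only 0.13 of all shots are Big Chances, so NCR dominates
```

Because w is small, TSR leans heavily on the NCR; CQR+ instead gives the two ratios equal weight, which is why it behaves differently from TSR.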

How about the repeatability? Adding BCR and NCR together also has a positive effect here, and the R2 of 0.75 is actually higher than TSR’s 0.70 over the same period.

To summarise the differences between the four metrics that I have covered, the table below shows the R2 for each one. As we know, TSR is a good predictor of points and is repeatable, and BCR has a stronger relationship to points than TSR but is not as stable year-on-year; by adding the team’s BCR and NCR together, however, we have a metric with both a higher explanatory power and a greater predictive power.

I thought it would be interesting to check how teams are performing by these metrics this season so far. The table below shows each team’s TSR, NCR, BCR and CQR+, with the ranking in the league for each metric, and the table has been sorted by CQR+.

If we look at TSR compared to league position we can see that, as we might expect, it is performing relatively well, with the majority of TSR rankings within 3 places of the league position. We can also see how the NCR is never more than 0.02 away from the TSR, although even these small changes do shuffle the rankings a little. It’s when we start to look at the BCRs that we start to see the real differences. Chelsea have the highest BCR at 0.71, on the back of being the meanest defence at conceding Big Chances and creating the 2nd most (behind Arsenal), which compares to their TSR of 0.61. Moving in the other direction we have Liverpool, who have been good at dominating the shot count in their matches and have the 4th highest TSR of 0.59, however they are not as efficient when it comes to Big Chances and have a BCR of 0.51, ranking them 8th.

In terms of the CQR+ metric, and on the back of their high BCRs, Chelsea and Arsenal are the leaders of the pack. Man City have been the most dominant team in terms of TSR this season, but like Liverpool they are not as efficient when it comes to Big Chances, so rank 3rd by CQR+. Then come Southampton, Liverpool, Manchester United, and after a bit of a gap, Tottenham, meaning that the top 7 by CQR+ make up the top 7 teams in the league. Down at the other end, 5 of the bottom 6 teams in the league make up the 5 worst CQR+ teams, so it does seem to be working on first sight, and overall, all but 3 of the teams’ CQR+ ranks are within 3 places of where they are in the league.

That does mean there are 3 outliers however. The first two are West Ham and Swansea, who are outperforming their CQR+, by which they rank 14th and 15th respectively. For both teams, particularly West Ham, it may well be the case that their numbers are being affected by game states. West Ham have so far spent the 5th highest amount of time winning in the league this season, which is likely to be putting some downward pressure on their shot numbers, and whilst Swansea haven’t spent as much time winning, they did start the season very strongly. Going in the other direction, by far the biggest underperformers compared to their CQR+ are QPR, currently sitting outside of the relegation positions only on goal difference, yet ranked 12th in TSR, and 10th in CQR+ thanks to having the 7th best BCR in the league. Have they been a little unfortunate, or is it the consequence of playing in very open matches when you are not actually very good? I’m afraid I haven’t seen enough of them to know.

I’m not too sure how this metric stacks up against some of the others out there, particularly the expected goals models, although I did use it to enter a prediction in Simon Gleave’s Premier League prediction analysis, which is kindly updated by James Grayson throughout the season to show how everyone is doing, and where it is performing relatively well (by points at least, although not by position).

However, compared to some of the models out there, it has the benefit of being very simple to calculate. All you need is the total shots taken and faced by each team, and the total number of Big Chances they have taken and faced. Unfortunately the Big Chance stat isn’t as readily available as most others; you can get it from FantasyFootballScout, where you have to pay a subscription, and I have also recently been directed to the AllThingsFPL website which also has them, although having said that, the two sites do show slight differences in the Big Chances for each team here and there. I do know that Big Chances get reviewed in the week following the game, which may account for the differences, but I don’t know which site is the most up to date (I have been using the FantasyFootballScout numbers).

I hope that I have shown here, as well as in my previous work, just how useful the Big Chance stat can be and how we can use it to make some simple metrics. Due to not having a precise definition of what a Big Chance actually is, the lack of detailed information on all the Big Chances, as well as its subjective nature, some in the ‘fanalyst’ community have their doubts about Big Chances. Whilst I agree that there may be cases where a shot is recorded as a Big Chance when possibly it should not be, that is always going to be the case when subjectivity is added. However, I believe that these will be in the minority, and as we see consistency each season in the number of Big Chances and their conversion rates, they are not having a big impact. I think that we should embrace subjective stats as they can add more context to our analysis, and the benefits can outweigh the concerns.

Friday 7 November 2014

The slides from my OptaPro Conference presentation

Opta have just announced that they are now taking submissions for The OptaPro Analytics Forum 2015 (more details here).

Earlier this year I was lucky enough to have been picked to present at their first conference, so I thought it would be a good time to post the slides.

If anyone out there is thinking of submitting an idea I can highly recommend it (although as someone who doesn't really do that type of thing, it was bloody stressful!). I wasn't going to bother last year as I thought it would never get picked, but decided on the deadline for submissions that I had nothing to lose by giving it a go and I would never get the data to do something like that otherwise.

So, if you've got an idea, get it in!

Friday 3 January 2014

Creating some new metrics using Opta's Clear Cut Chance stat

Having read this from Richard Whitall (here), I thought I’d see if I could come up with any metrics using Opta’s Clear Cut Chance (CCC) that both explains what has been happening in season and that might also be used to predict what may happen going forwards. This would build on the work I have done already looking at how efficient teams are at attacking (here) and defending (here).

I’ll start by adding a team’s Creative Efficiency (CE) and its Defensive Efficiency (DE) together, which I’ll call the CEDE Score. As a recap, a team’s CE is calculated by dividing their CCCs by their total shots; the higher the number, the more efficient they are at creating CCCs when they do take a shot. Conversely, a team’s DE is calculated by dividing their Normal Chances conceded (ie non-CCCs) by their total shots conceded.

The average CE over the previous 3 seasons was 13%, whilst the average DE was 87%; add the two together (in their decimal form) and you get a CEDE Score of 1.0 for the average team in the league. We now have a metric to measure the overall efficiency of a team in terms of creating and restricting CCCs. A team with a CEDE Score of over 1.0 is more efficient than the average team, and vice versa.

As I said at the beginning, I would like to create some metrics that both explain what has happened in matches and have high predictive value. To test their explanatory value I will calculate their relationship with goal difference (GD), and to test their predictive value I will calculate the correlation between the year 1 values achieved by each team and their year 2 values.

Below is the graph of the CEDE Score to GD over the past 3 years. Whilst it would be good to have more, CCC data has only been available for the last 3 full seasons. We can see that there is a positive relationship, a team with the average CEDE Score of 1.0 would be expected to have the same GD as that of the average team, which is zero, whilst a team with a CEDE Score 5% above average at 1.05 would be expected to have a positive GD of about 17 goals. However there is some variance in the data and the R2 is equal to 0.35, which is not bad, but I would like to do better.
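The worked numbers above imply a linear relationship with a slope of about 340 goals per unit of CEDE Score (a 0.05 rise mapping to roughly +17 goals). Here’s a hedged sketch of using it as a rough predictor; note the slope is only what those figures imply, not a properly fitted value:

```python
# Rough goal-difference prediction from a CEDE Score, using the slope
# implied by the figures above (0.05 of CEDE ~ 17 goals). Illustrative
# only; a real model would fit this slope by regression on the data.
def predicted_gd(cede_score, slope=340.0):
    return slope * (cede_score - 1.0)

print(predicted_gd(1.00))  # 0.0  -> the average team has a GD of zero
print(predicted_gd(1.05))  # 17.0 -> ~+17 goals, as in the text
```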

As for the repeatability of the CEDE Score, with only 2 seasons where we can measure a team’s performance against the previous season, and as we can only look at non-relegated teams, we have even less data, with only 34 observations. As you can see from the graph below, the relationship is somewhat weaker, and the correlation coefficient is 0.42.

To see how this compares to other footy ‘fancy stats’, if you haven’t already, I highly recommend you have a look at James Grayson’s blog; he has done a huge amount of work on the repeatability of various football stats. The reason for the low score could well be due to the relatively small number of CCCs created each season. The average team only takes and concedes 70 CCCs in a season, so small differences year on year can have a big impact.

One of the main issues with the CEDE Score is that, as an efficiency score, it is only a proportion of total shots and does not take into account how many shots a team takes or faces. For example, Stoke under Pulis were a team that had fewer shots attempted in their matches than average; they attempted to take their shots from as close to goal as possible, whilst restricting the opposition to shooting from distance. Over the last 3 seasons their CEDE Score has been above average, and in the 2012/13 season they actually had the 4th best CEDE Score, however this did not translate into league position.

If you are reading this then there is a good chance that you know about the Total Shot Ratio (TSR), but in case you don’t, TSR measures the proportion of the shots in a team’s matches that the team itself has taken. TSR is essentially the go-to metric in football if you want to compare teams: it has been shown to be meaningful and repeatable, and is simple to understand and calculate. Although you can find a lot more detailed info and data about TSR on James’ blog, I thought it would be useful to see the same graphs over the same time period as the one I’m working with, for ease of comparison. Over the 3 seasons, TSR to GD has an R2 of 0.66, and the year-on-year TSR correlation coefficient is 0.86.

One of the main issues with regards to TSR is that it treats all shots equally and does not take into account chance quality. So, what we already have is a very good metric that does not take into account chance quality, and what I’ve introduced is an ok metric that focuses on the efficiency of a team at creating good chances for and restricting good chances against. Let’s see what we get when we combine the two…

Handily, as the CEDE Score has an average of 1.0, we can use it as an efficiency multiplier with TSR; I’ll call it CEDE’d TSR (TSRCEDE). The effect should be to normalise TSR so that there are fewer outliers. We can instantly see from the graph of TSRCEDE to GD that the observations are packed closer to the trend line than in the TSR graph. The R2 for TSRCEDE is in fact 0.79, which I think is quite a good increase.
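The calculation itself is as simple as it sounds; here’s a sketch using the Man Utd example numbers quoted in this post (their ~1.13 CEDE Score is implied by the TSR and TSRCEDE figures, not taken from the raw data):

```python
# CEDE'd TSR: TSR scaled by the CEDE Score, which averages 1.0 across
# the league and so acts as an efficiency multiplier.
def tsr_cede(tsr, cede_score):
    return tsr * cede_score

# Man Utd 2012/13: a middling TSR of 0.53 but a strongly above-average
# CEDE Score (~1.13, implied) lifts them to ~0.60.
print(round(tsr_cede(0.53, 1.13), 2))  # 0.6
```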

To give some examples of the effect this has had: last season Tottenham were the top ranked TSR team with a score of 0.65, but their CEDE’d TSR dropped to 0.60, which ranked 4th. It has often been mentioned that Man Utd broke many a model last season, mainly because their TSR of 0.53 ranked them only 7th, however once CEDE’d it increased to 0.60, slightly higher than Tottenham’s, to rank 3rd. Ok, this still does not quite predict them having the stellar season that they did, but it is a big move in the right direction.

In terms of the year-on-year correlation, the graph looks pretty similar to that of TSR, and its correlation coefficient is 0.87, which is slightly stronger than TSR’s year-on-year correlation over the same period.

So, how are things looking this season then? Hopefully the table below is pretty self explanatory; it’s sorted by TSRCEDE and also lists each team’s CE, DE, CEDE Score and TSR, and shows the rank of the CEDE Score and TSR as well as TSRCEDE.

The top ranked team in terms of CEDE Score is table-topping Arsenal, which corresponds to what Colin Trainor has found in the first of his mid-year reviews (here), with recent table toppers Liverpool in 2nd. However, surprisingly, the next 6 teams are made up of what we would consider weaker sides: West Brom in 3rd, followed by Stoke, who have a good CEDE Score again, then Sunderland, Hull, Cardiff and Fulham. The reason for this, in the main I believe, is that these teams in effect know their limitations and attempt to pack their defences to restrict the opposition to difficult shots; I’ll be interested to see if Colin’s reviews corroborate these numbers when he comes to looking at these teams. Man City only come in 14th by this measure and actually have a CEDE Score of less than 1.0 due to their poor DE.

I have to say that this is unusual, in general it is the better teams that have the better CEDE Score. What has happened is that 4 of the biggest shooting teams have seen their CE fall significantly. As these 4 teams, Man Utd, Man City, Chelsea and Everton, also changed their manager in the summer, we’ll have to wait and see if this is a short term effect of having a new manager bed in, or in fact due to different tactics employed by those managers. The overall effect is that the average CE for all teams this season has fallen to under 11%.

If we look at the TSRCEDE numbers compared to TSR, we see they are, as you would expect, very similar, however the changes are enough to shuffle the pack a bit. Some of the notable changes are Chelsea becoming the top ranked team from 3rd in terms of normal TSR, Arsenal going from a Utd-like 8th ranked TSR team to 5th by TSRCEDE, and Everton doing the opposite, going from 5th to 8th. At the bottom of the table we’ve got 5 teams grouped together on a TSRCEDE of 0.43, with Cardiff and then Fulham adrift at the bottom.

I’ll leave it there for now, but hopefully I’ll have a bit of time to be able to look at a few other metrics I’ve put together using CCCs.

Sources: 2013-14 season from, all earlier data from

Monday 30 December 2013

RACE to Goals Model – The Defence

Prior to the season starting, I introduced the RACE to Goals Model, which you can find here, and I suggest you have a read of that before you continue with this one if you want to have a full description of the different metrics and how they are calculated.
Essentially, I am looking at the same metrics, but this time flipped to a defensive point of view, so the rate of shots conceded, the Defensive Efficiency, and the conversion of chances conceded by each type.
I will describe Defensive Efficiency here though, as it’s calculated slightly differently. Whereas Creative Efficiency attempts to show how good a team is at creating good chances, measured as the proportion of Clear Cut Chances to Total Shots, Defensive Efficiency attempts to show how good a team is at limiting the number of good chances the opposition has, and is measured as the proportion of Normal Chances conceded to Total Shots conceded (%NC). So the higher the number, the lower the percentage of Clear Cut Chances conceded, and the more efficient the defence is.
The benchmark numbers are essentially the same, the slight difference being own goals, and those ‘shots’ by players on the defending team that lead to own goals are also included.
The table below shows how well the teams performed last season against the 4 metrics.

The team that conceded the fewest shots was Tottenham, with only 370 over the entire season, so a touch under 10 shots a game. At the other end of the scale were Reading, who conceded 706 shots, the worst by over 60 shots.
Like with the original article, I feel the raw numbers in the table are a little hard to read, so again I’ll add context and measure each metric as the percentage difference from the benchmark team. From a defensive point of view, having shots and conversion rates below the benchmark is good, but this is not the case for Defensive Efficiency, so in the table anything highlighted in red is ‘bad’.

As with Creative Efficiency, Manchester United also had the best Defensive Efficiency, limiting their opponents to only 8.2% of their shots coming from CCCs, with Manchester City being the only other team to have a Defensive Efficiency of over 90%, seeing them perform 6% and 4% better than average respectively. The team with the worst Defensive Efficiency was Newcastle, who allowed over 18.5% of all chances against to be CCCs; however the 2nd worst team, perhaps surprisingly considering how few shots they conceded, was Tottenham, allowing almost 18%, and possibly showing the risk of playing with a high defensive line.
Looking at the conversion rates it becomes clear why Wigan struggled last season. They had by far the worst rate of CCCs conceded; in fact at 52.7%, they are the only team over the 3 years of data to concede more than half of the CCCs they faced. They were also the 2nd worst at stopping Normal Chances from being converted. Reading actually had the best rate in the league when it came to stopping CCCs, but unfortunately for them, when you allow the opposition to create over 100 CCCs in total, you will still concede a lot of goals.
No team outperformed or underperformed the benchmark on all 4 metrics. Only 4 teams, the two Manchester clubs, Chelsea and Swansea, outperformed on 3 of the benchmarks. Liverpool join Utd, City and Chelsea as the only teams who conceded fewer shots than the benchmark whilst also having a higher than average Defensive Efficiency. Despite conceding the fewest shots, we can see why 7 teams conceded fewer goals than Tottenham, following their underperformance in the 3 other metrics.

Converting the metrics into Expected Goals, we see how badly Wigan performed. Whilst they would have been expected to concede just under 54 goals from the shots that the opposition had, which was only the 10th lowest, they actually conceded 73 (+19.1 goals more than expected). The other big underperformers were Southampton (+9.7 goals), Newcastle (+9.0 goals) and Aston Villa (+7.9 goals). The biggest overperformers were Everton (-9.5 goals), Sunderland (-8.7 goals), Stoke (-6.4 goals) and Arsenal (-6.0 goals).
In my next posts I will combine some of the attacking and defending metrics together to analyse teams’ performances in some new ways, and see how the teams have performed so far this season.
This was originally posted on EPLIndex

Defending Liverpool's Defence

With the season about to start, I thought I would follow up a piece I did earlier in the year looking at how Liverpool’s form changed over the season; whilst that looked at attacking form, this one looks at Liverpool’s defensive form. Again I will look at Liverpool’s performance compared to how the league performed on average, how the top 4 performed, and also compared to Liverpool in the 2011-12 season, as well as showing short term form via the 6-game moving average. One thing to note is that due to there being fewer observations, for example Liverpool conceded far fewer shots, goals etc., the graphs show more extreme changes compared to the attacking versions of these graphs.
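As a reminder of how those 6-game form lines are built, here is a minimal sketch of a trailing moving average (the per-game shot figures are invented purely for illustration):

```python
def moving_average(values, window=6):
    """Trailing moving average; None until a full window of games exists."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out

# Invented per-game shots-conceded figures, including a heavy opening game
shots_conceded = [18, 9, 11, 8, 12, 10, 7, 9]
print(moving_average(shots_conceded))
```

Note how the 18-shot opener drops out of the window after game 7, which is why a single extreme match can skew the early form line and then vanish from it.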
I’ll start by looking at shots conceded per game. Apart from the 18 shots conceded in the first game of the season against West Brom skewing the averages, Liverpool performed more or less in line with the Top 4 teams throughout the season.

In terms of the accuracy of opposing teams’ shots, despite the slow start that Liverpool had, and perhaps surprisingly, they actually allowed significantly fewer shots to hit the target compared to the Top 4 teams and the rest of the league over the first half of the season, whilst over the 2nd half of the season a greater percentage of opponents’ shots were hitting the target.

Moving on to Opponent Shots Conversion and Shots on Target Conversion, we can see how poorly Liverpool defended and Pepe Reina performed in the opening 5 or 6 games of last season. Basically, Liverpool defended and kept goal more or less like a lower league team when going up against a Premier League side in a cup, but this quickly regressed to the mean, and they performed like a Top 4 team from game 7 onwards (in the moving-average plot, this shows up from match 12). Those first 6 games had such an impact though that the end of season conversion rates were still only in line with the league as a whole.

How do Liverpool, or more pertinently Pepe Reina and Brad Jones, do at keeping out Clear Cut Chances (CCC)? So what is a CCC? It is one of Opta’s few subjective stats, and can broadly be described as a chance where the attacker is probably central to goal with only the keeper to beat. So a keeper would hope to either save it, or perhaps attempt to put the attacker off sufficiently that they miss. As I mentioned in the original piece, the conversion rate for CCCs is much more variable than the other conversion rates; this is because in some games there will be few or even no CCCs, which means that both very high and very low single game conversion rates are far more likely, and we see this clearly in Liverpool’s form plot (note that the reason you can’t see the league average plot is that it was the same as the Top 4’s). Again, Liverpool started poorly, but were better than the Top 4 teams from match 7 onwards, apart from a large peak at match 17 where all the CCCs that Liverpool faced were scored, giving a 100% conversion rate. More specifically, it was in fact a 4 match period, with the goals coming from Tottenham, West Ham and Aston Villa.

With that in mind, it is interesting to then see the rate at which Liverpool were giving up CCCs per game through the season. Again we see Liverpool started off poorly, giving away on average 1.5 CCCs per game over the first 10 games, but by match 17, where we saw the 100% conversion rate, the 6-game form had fallen to 0.7 per game. So, it was only 4 out of 4 CCCs conceded in 6 games. As the average conversion rate for all CCCs is around 38%, the chance of all 4 being scored is only around 2%, even less likely than tossing a coin 4 times and getting 4 heads, so I don’t think we should put it down to poor goalkeeping. You’ll notice there is a sharp rise in CCCs conceded from about match 20, but this coincided with an increase in CCCs created by Liverpool, and can perhaps be put down to increased attacking leaving the defence more open (Note: Liverpool’s average in 2011-12 was the same as the Top 4’s last season).
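To put a number on that intuition, here is a quick back-of-the-envelope check, assuming each CCC is an independent chance converted at the league-average rate:

```python
p_ccc = 0.38          # approximate league-average CCC conversion rate
p_all_four = p_ccc ** 4     # probability all 4 CCCs are scored
p_four_heads = 0.5 ** 4     # probability of 4 heads from 4 coin tosses

print(f"4 CCCs from 4 scored: {p_all_four:.1%}")     # ~2.1%
print(f"4 heads from 4 tosses: {p_four_heads:.2%}")  # 6.25%
```

So a perfect 4-from-4 run is actually rarer than 4 heads in a row, but over such a tiny sample it is still well within what chance alone can produce.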

Finally I’m going to look at Errors per Game. It should be noted that these are ‘on the ball’ errors, so they do not include errors like failing to mark the run of an opponent from a cross (something that had many Liverpool fans pulling their hair out). Again we see the effect of Liverpool’s poor start to the season, and it took Liverpool longer to recover here than on the other metrics, but by the end of the season on the ball errors had almost become non-existent. Over the season as a whole, only Arsenal and Newcastle made more errors than Liverpool’s 36; however if you split the season in half, over the first 19 games Liverpool made 28 errors, whilst over the last 19 games it was only 8. As an on the ball error will often leave the rest of the defence wrong footed, these types of errors tend to have a high conversion rate, and Liverpool conceded 10 goals from the 36 errors they made. If Liverpool can continue to keep the error rate at the level of the 2nd half of the season, then there would be a lot less hair pulled out by the fans this coming season.

Perhaps it was the tough start Liverpool had, perhaps it was getting used to Brendan Rodgers’ system, or perhaps they were just unlucky (probably a combination of all 3), but clearly Liverpool started last season really badly. If they can perform defensively as well as they did over the last 30 or so games, they could well turn a few of those losses and score draws into wins, and have a good crack at finishing in the top 4.

I posted this originally on EPLIndex

Wednesday 14 August 2013

RACE to Goals Model: League Predictions

At the end of my introduction to the RACE to Goals Model, which you can read here, I mentioned that I would like to look at how teams performed from a defensive point of view, and to check how reliable the metrics are year on year.

Whilst I have done those things, I haven't had the time to sit down and write about them. But I have been able to create a model (well, I created a few slightly different versions and picked what seemed the best) to predict this season's league table, so I will at least get that posted prior to the season starting so that I don't let bias from early results get in the way.

The model is a variation of the Shot Dominance model, as coined by @mixedknuts (here), which is itself a variation of the Total Shot Ratio (TSR) model that has been looked at at length by @JamesWGrayson (such as this), and a good summary of TSR by @TheM_L_G can be read here. What differentiates the RACE to Goals Model is that it includes the quality of chance that those metrics are missing, based on the metrics from my earlier piece, and I hope to go into more detail in later posts.

For the promoted teams I did not have the data available to do the same analysis, so I simply did a regression of goals scored and conceded in the Championship since 2000 for promoted teams against points scored the following season in the Premier League.
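A minimal sketch of that kind of regression, in pure Python and with invented figures (I’ve used a single predictor, Championship goal difference, rather than whatever exact specification the model uses):

```python
def fit_simple_ols(xs, ys):
    """Least-squares fit of y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

# Invented history: Championship goal difference of promoted sides vs
# their Premier League points the following season.
champ_gd = [43, 31, 14, 49, 25]
prem_pts = [47, 40, 36, 52, 41]
a, b = fit_simple_ols(champ_gd, prem_pts)

# Predict points for a hypothetical side promoted with a +40 goal difference
print(round(a + b * 40, 1))
```

With so few historical promoted-team observations, the fit is necessarily noisy, which is one reason promoted teams are the hardest part of any pre-season table prediction.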

The model predicts that Man City are the clear favourites for the title, and that there will be another very close battle for 4th place, whilst down at the bottom, Fulham, Newcastle, Southampton and Norwich could well be in trouble. Of course the model does not take into account any managerial changes or player transfers, which could change team strengths significantly. Personally, I'd expect Newcastle and Southampton to do a little better and be replaced by Sunderland down towards the bottom, whilst common sense would say that Chelsea should challenge at the top, but other than that, I think the predictions are reasonable.

Anyway, here's the predicted table.