Friday 3 January 2014

Creating some new metrics using Opta's Clear Cut Chance stat

Having read this from Richard Whitall (here), I thought I’d see if I could come up with any metrics using Opta’s Clear Cut Chance (CCC) stat that both explain what has been happening in season and might also be used to predict what may happen going forwards. This would build on the work I have already done looking at how efficient teams are at attacking (here) and defending (here).

I’ll start by adding a team’s Creative Efficiency (CE) and its Defensive Efficiency (DE) together, which I’ll call the CEDE Score. As a recap, a team’s CE is calculated by dividing the team’s CCCs for by their total shots. The higher the number, the more efficient they are at creating CCCs when they do take a shot. Conversely, a team’s DE is calculated by dividing their Normal Chances conceded (i.e. non-CCCs) by their total shots conceded.

The average CE over the previous 3 seasons was 13%, whilst the average DE was 87%. Add the two together (in their decimal form) and you get a CEDE Score of 1.0 for the average team in the league. We now have a metric to measure the overall efficiency of a team in terms of creating and restricting CCCs. A team with a CEDE Score of over 1.0 is more efficient than the average team, and vice versa.
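As a rough illustration of the arithmetic, here is a minimal Python sketch of the three calculations. The function names and the example shot counts are made up for illustration; they are not Opta figures.

```python
def creative_efficiency(ccc_for, shots_for):
    """CE: share of a team's own shots that are Clear Cut Chances."""
    return ccc_for / shots_for


def defensive_efficiency(ccc_against, shots_against):
    """DE: share of shots conceded that are Normal Chances (non-CCCs)."""
    return (shots_against - ccc_against) / shots_against


def cede_score(ccc_for, shots_for, ccc_against, shots_against):
    """CEDE Score: CE plus DE, both in decimal form."""
    return (creative_efficiency(ccc_for, shots_for)
            + defensive_efficiency(ccc_against, shots_against))


# Hypothetical team (invented numbers): 70 CCCs from 500 shots,
# 60 CCCs conceded from 500 shots conceded.
example = cede_score(70, 500, 60, 500)
print(round(example, 2))  # CE of 0.14 plus DE of 0.88
```

A team that matches the league averages (CE of 13%, DE of 87%) comes out at 1.0, so anything above that is better than average on this measure.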

As I said at the beginning, I would like to create some metrics that both explain what has happened in matches and have high predictive value. To test their explanatory value I will calculate their relationship with goal difference (GD), and to test their predictive value I will calculate the correlation between the year 1 values achieved by each team and their year 2 values.
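For anyone wanting to run the same two tests on their own numbers, here is a minimal Python sketch. It assumes the R2 is that of a straight-line fit, in which case it is simply the square of the Pearson correlation coefficient. The function names are my own for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, 2+ values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_squared(xs, ys):
    """R2 of a straight-line fit of ys on xs (the square of Pearson's r)."""
    return pearson_r(xs, ys) ** 2
```

For the explanatory test, xs would be each team-season's metric and ys its GD. For the repeatability test, xs would be each team's year 1 value and ys the same team's year 2 value.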

Below is the graph of the CEDE Score to GD over the past 3 years. Whilst it would be good to have more, CCC data has only been available for the last 3 full seasons. We can see that there is a positive relationship: a team with the average CEDE Score of 1.0 would be expected to have the same GD as the average team, which is zero, whilst a team with a CEDE Score 5% above average, at 1.05, would be expected to have a positive GD of about 17 goals. However, there is some variance in the data and the R2 is equal to 0.35, which is not bad, but I would like to do better.
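To make the trend line a little more concrete, the two points quoted above imply a slope of roughly 340 goals per 1.0 of CEDE Score. The Python sketch below just draws the straight line through those two points. It is a back-of-the-envelope reading of the rounded figures in the text, not the fitted regression itself, and the names are my own.

```python
AVERAGE_CEDE = 1.0

# Two points read off the text: (1.00, 0 goals) and (1.05, about 17 goals).
SLOPE = (17 - 0) / (1.05 - AVERAGE_CEDE)  # goals per 1.0 of CEDE Score


def expected_gd(cede, slope=SLOPE):
    """Rough expected goal difference for a given CEDE Score."""
    return slope * (cede - AVERAGE_CEDE)
```

By the same line, a team 5% below average at 0.95 would be expected to have a GD of about minus 17, though with an R2 of 0.35 any individual team can sit a long way from it.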

As for the repeatability of the CEDE Score, there are only 2 seasons where we can measure a team’s performance against the previous season, and as we can only look at non-relegated teams, we have even less data, with only 34 observations. As you can see from the graph below, the relationship is somewhat weaker, and the correlation coefficient is 0.42.

To see how this compares to other footy ‘fancy stats’, if you haven’t already, I highly recommend you have a look at the blog of James Grayson, who has done a huge amount of work on the repeatability of various football stats. The reason for the low score could well be the relatively small number of CCCs created each season. The average team only takes and concedes 70 CCCs in a season, so small differences year on year can have a big impact.

One of the main issues with the CEDE Score is that, as an efficiency score, it is only a proportion of total shots and does not take into account how many shots a team takes or faces. For example, Stoke under Pulis were a team that had fewer shots attempted in their matches than average. They attempted to take their shots from as close to goal as possible, whilst reducing the opposition to shooting from distance. Over the last 3 seasons their CEDE Score has been above average, and in the 2012/13 season they actually had the 4th best CEDE Score; however, this did not translate into league position.

If you are reading this then there is a good chance that you know about the Total Shot Ratio (TSR), but in case you don’t, TSR measures the proportion of shots that a team has taken. TSR is essentially the go-to metric in football if you want to compare teams. It has been shown to be meaningful and repeatable, and it is simple to understand and calculate. Although you can find a lot more detailed info and data about TSR on James’ blog, I thought it would be useful to see the same graphs over the same time period as I’m working with, for ease of comparison. Over the 3 seasons, TSR to GD has an R2 of 0.66, and the year on year TSR correlation coefficient is 0.86.
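In case the calculation is useful, here is a minimal Python sketch of TSR, taking "the proportion of shots" to mean a team’s shots for divided by all shots in its matches (for plus against). The function name and the example numbers are made up for illustration.

```python
def total_shot_ratio(shots_for, shots_against):
    """TSR: a team's share of all the shots taken in its matches."""
    total = shots_for + shots_against
    if total == 0:
        raise ValueError("no shots recorded")
    return shots_for / total


# Hypothetical team (invented numbers): 550 shots for, 450 against.
print(total_shot_ratio(550, 450))
```

A team that takes exactly as many shots as it concedes has a TSR of 0.5, so anything above that means the team is outshooting its opponents.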

One of the main issues with TSR is that it treats all shots equally and does not take into account chance quality. So, what we already have is a very good metric that does not take into account chance quality, and what I’ve introduced is an ok metric that focuses on the efficiency of a team at creating good chances for and restricting good chances against. Let’s see what we get when we combine the two…

Handily, as the CEDE Score has an average of 1.0, we can use it as an efficiency multiplier with TSR. I’ll call it CEDE’d TSR (TSRCEDE). The effect should be to normalise TSR so that there are fewer outliers. We can instantly see from the graph of TSRCEDE to GD that the observations are packed closer to the trend line than in the TSR graph. The R2 for TSRCEDE is in fact 0.79, which I think is quite a good increase on TSR’s 0.66.
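The multiplier itself is a one-liner. Here is a minimal Python sketch, with a function name and example values that are invented for illustration rather than taken from the data.

```python
def tsr_cede(tsr, cede):
    """CEDE'd TSR: TSR scaled by the CEDE Score (league average is 1.0)."""
    return tsr * cede


# Two hypothetical teams with the same TSR of 0.55 (invented numbers):
efficient = tsr_cede(0.55, 1.05)    # CEDE Score 5% above average
inefficient = tsr_cede(0.55, 0.95)  # CEDE Score 5% below average
print(round(efficient, 4), round(inefficient, 4))
```

A team with an average CEDE Score of 1.0 keeps its TSR unchanged, whilst teams with the same TSR are pulled apart according to how efficient they are at creating and restricting CCCs.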

To give some examples of the effect this has had, last season Tottenham were the top ranked TSR team with a score of 0.65, but their CEDE’d TSR dropped down to 0.60, which ranked 4th. It has often been mentioned that Man Utd broke many a model last season, mainly because their TSR of 0.53 ranked them only 7th. However, once CEDE’d it increased to 0.60, slightly higher than Tottenham’s, to rank 3rd. Ok, this still does not quite predict them having the stellar season that they did, but it is a big move in the right direction.
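Working backwards from those figures gives a feel for the size of the adjustment. The Python sketch below divides the CEDE’d TSR by the plain TSR to get the implied CEDE Score. Bear in mind it uses the rounded values quoted above, so the results are approximate, and the function name is my own.

```python
def implied_cede(tsr, cede_tsr):
    """CEDE Score implied by a team's TSR and its CEDE'd TSR."""
    return cede_tsr / tsr


# Rounded 2012/13 figures quoted in the text.
tottenham = implied_cede(0.65, 0.60)
man_utd = implied_cede(0.53, 0.60)
print(round(tottenham, 2), round(man_utd, 2))
```

On these rounded numbers Tottenham come out at roughly 0.92 and Man Utd at roughly 1.13, which is the gap in efficiency that closes the gap in TSR between the two.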

In terms of the year on year correlation, the graph looks pretty similar to that of TSR, and its correlation coefficient is 0.87, which is slightly stronger than TSR’s year on year correlation of 0.86 from the same period.

So, how are things looking this season then? Hopefully the table below is pretty self-explanatory. It’s sorted by TSRCEDE and also lists each team’s CE, DE, CEDE Score and TSR, and shows the rank of the CEDE Score and TSR as well as TSRCEDE.

The top ranked team in terms of CEDE Score is table topping Arsenal, which corresponds to what Colin Trainor has found in the first of his mid-year reviews (here), with recent table toppers Liverpool in 2nd. However, surprisingly, the next 6 teams are made up of what we would consider weaker sides: West Brom in 3rd, followed by Stoke, who have a good CEDE Score again, then Sunderland, Hull, Cardiff and Fulham. The main reason for this, I believe, is that these teams in effect know their limitations and attempt to pack their defences to reduce the opposition to difficult shots. I’ll be interested to see if Colin’s reviews corroborate these numbers when he comes to looking at these teams. Man City only come in 14th by this measure and actually have a CEDE Score of less than 1.0 due to their poor DE.

I have to say that this is unusual; in general it is the better teams that have the better CEDE Score. What has happened is that 4 of the biggest shooting teams have seen their CE fall significantly. As these 4 teams, Man Utd, Man City, Chelsea and Everton, also changed their manager in the summer, we’ll have to wait and see if this is a short term effect of having a new manager bed in, or is in fact due to the different tactics employed by those managers. The overall effect is that the average CE for all teams this season has fallen to under 11%.

If we look at the TSRCEDE numbers compared to TSR, we see they are, as you would expect, very similar; however, the changes are enough to shuffle the pack a bit. Some of the notable changes are Chelsea becoming the top ranked team, up from 3rd in terms of normal TSR, Arsenal going from a Utd-like 8th ranked TSR team to 5th by TSRCEDE, and Everton doing the opposite and going from 5th to 8th. At the bottom of the table we’ve got 5 teams grouped together on a TSRCEDE of 0.43, with Cardiff and then Fulham adrift at the bottom.

I’ll leave it there for now, but hopefully I’ll have a bit of time to be able to share a few other metrics I’ve put together using CCCs.

Sources: 2013-14 season from, all earlier data from
