Opta have just announced that they are now taking submissions for The OptaPro Analytics Forum 2015 (more details here).
Earlier this year I was lucky enough to have been picked to present at their first conference, so I thought it would be a good time to post the slides.
If anyone out there is thinking of submitting an idea I can highly recommend it (although as someone who doesn't really do that type of thing, it was bloody stressful!). I wasn't going to bother last year as I thought my idea would never get picked, but on the deadline for submissions I decided I had nothing to lose by giving it a go, and I would never get the data to do something like that otherwise.
So, if you've got an idea, get it in!
Friday, 7 November 2014
Friday, 3 January 2014
Creating some new metrics using Opta's Clear Cut Chance stat
Having read this piece from Richard Whitall (here), I thought I'd see if I could come up with any metrics using Opta's Clear Cut Chance (CCC) stat that both explain what has been happening in season and might also be used to predict what may happen going forwards. This would build on the work I have done already looking at how efficient teams are at attacking (here) and defending (here).
I'll start by adding a team's Creative Efficiency (CE) and its Defensive Efficiency (DE) together, which I'll call the CEDE Score. As a recap, a team's CE is calculated by dividing the CCCs they create by their total shots; the higher the number, the more efficient they are at creating CCCs when they do take a shot. Conversely, a team's DE is calculated by dividing the Normal Chances they concede (ie non-CCCs) by their total shots conceded.
The average CE over the previous 3 seasons was 13%, whilst the average DE was 87%; add the two together (in their decimal form) and you get a CEDE Score of 1.0 for the average team in the league. We now have a metric to measure the overall efficiency of a team in terms of creating and restricting CCCs. A team with a CEDE Score of over 1.0 is more efficient than the average team, and vice versa.
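To make the calculation concrete, here's a minimal sketch of the CEDE Score using made-up season totals for an illustrative team (the shot and chance counts are assumptions, not real Opta data):

```python
def cede_score(ccc_for, shots_for, normal_conceded, shots_against):
    """Creative Efficiency + Defensive Efficiency."""
    ce = ccc_for / shots_for              # share of own shots that are CCCs
    de = normal_conceded / shots_against  # share of shots faced that are NOT CCCs
    return ce + de

# Hypothetical team: 70 CCCs from 500 shots taken, and 450 Normal
# Chances conceded from 500 shots faced.
score = cede_score(70, 500, 450, 500)
print(round(score, 2))  # 0.14 + 0.90 = 1.04, slightly above the 1.0 average
```

A perfectly average team (CE 0.13, DE 0.87) comes out at exactly 1.0, which is what makes the score usable as a multiplier later on.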
As I said at the beginning, I would like to create some metrics that both explain what has happened in matches and that have high predictive value. To test their explanatory value I will calculate their relationship with goal difference (GD), and to test their predictive value I will calculate the correlation between the year 1 values achieved by each team and their year 2 values.
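The two tests can be sketched as follows; all the numbers here are invented for illustration (the real values come from the Opta data):

```python
import numpy as np

# Explanatory test: R-squared of the metric against goal difference.
metric = np.array([0.95, 0.98, 1.00, 1.03, 1.07])  # hypothetical CEDE Scores
gd = np.array([-20, -8, 1, 9, 22])                 # hypothetical goal differences
r = np.corrcoef(metric, gd)[0, 1]
print(f"R-squared vs GD: {r ** 2:.2f}")

# Predictive test: correlation of each team's year 1 value with its year 2 value.
year1 = np.array([0.95, 0.98, 1.00, 1.03, 1.07])
year2 = np.array([0.97, 0.96, 1.02, 1.01, 1.05])
print(f"year-on-year r: {np.corrcoef(year1, year2)[0, 1]:.2f}")
```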
Below is the graph of CEDE Score against GD over the past 3 years. Whilst it would be good to have more, CCC data has only been available for the last 3 full seasons. We can see that there is a positive relationship: a team with the average CEDE Score of 1.0 would be expected to have the same GD as the average team, which is zero, whilst a team with a CEDE Score 5% above average at 1.05 would be expected to have a positive GD of about 17 goals. However there is some variance in the data and the R² is 0.35, which is not bad, but I would like to do better.
As for the repeatability of the CEDE Score, with only 2 seasons where we can measure a team's performance against the previous season, and as we can only look at non-relegated teams, we have even less data, with only 34 observations. As you can see from the graph below, the relationship is somewhat weaker, and the correlation coefficient is 0.42.
To see how this compares to other footy 'fancy stats', if you haven't already, I highly recommend you have a look at James Grayson's blog; he has done a huge amount of work on the repeatability of various football stats. The reason for the low score could well be the relatively small number of CCCs created each season. The average team only takes and concedes 70 CCCs in a season, so small differences year on year can have a big impact.
One of the main issues with the CEDE Score is that, as an efficiency score, it is only a proportion of total shots and does not take into account how many shots a team takes or faces. For example, Stoke under Pulis were a team that had fewer shots attempted in their matches than average: they attempted to take their shots from as close to goal as possible, whilst restricting the opposition to shots from distance. Over the last 3 seasons their CEDE Score has been above average, and in the 2012/13 season they actually had the 4th best CEDE Score, however this did not translate into league position.
If you are reading this then there is a good chance that you know about the Total Shot Ratio (TSR), but in case you don't, TSR measures the proportion of all the shots in a team's matches that the team itself has taken. TSR is essentially the go-to metric in football if you want to compare teams: it has been shown to be meaningful and repeatable, and is simple to understand and calculate. Although you can find a lot more detailed info and data about TSR on James' blog, I thought it would be useful to see the same graphs over the same time period as what I'm working with, for ease of comparison. Over the 3 seasons, TSR to GD has an R² of 0.66, and the year on year TSR correlation coefficient is 0.86.
One of the main issues with TSR is that it treats all shots equally and does not take into account chance quality. So, what we already have is a very good metric that does not take into account chance quality, and what I've introduced is an OK metric that focuses on the efficiency of a team at creating good chances for themselves and restricting good chances against. Let's see what we get when we combine the two…
Handily, as the CEDE Score has an average of 1.0, we can use it as an efficiency multiplier with TSR; I'll call it CEDE'd TSR (TSRCEDE). The effect should be to normalise TSR so that there are fewer outliers. We can instantly see from the graph of TSRCEDE to GD that the observations are packed closer to the trend line than in the TSR graph. The R² for TSRCEDE is in fact 0.79, which I think is quite a good increase.
To give some examples of the effect this has had: last season Tottenham were the top ranked TSR team with a score of 0.65, but their CEDE'd TSR dropped to 0.60, which ranked 4th. It has often been mentioned that Man Utd broke many a model last season, mainly because their TSR of 0.53 ranked them only 7th; however once CEDE'd it increased to 0.60, slightly higher than Tottenham's, to rank 3rd. OK, this still does not quite predict them having the stellar season that they did, but it is a big move in the right direction.
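The adjustment itself is just a multiplication. As a sketch of the arithmetic: the CEDE Scores of 0.92 and 1.13 below are illustrative values I've picked so the products land near the ranges discussed above, not the teams' actual figures:

```python
def tsr_cede(tsr_value, cede_score):
    """CEDE'd TSR: scale a team's TSR by its CEDE Score
    (which averages 1.0 across the league)."""
    return tsr_value * cede_score

# A high-volume but below-average-efficiency shooter...
print(f"{tsr_cede(0.65, 0.92):.2f}")  # 0.60
# ...versus a lower-volume but highly efficient one.
print(f"{tsr_cede(0.53, 1.13):.2f}")  # 0.60
```

Because the league-average CEDE Score is 1.0, the adjustment reshuffles teams relative to each other without shifting the overall scale of TSR.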
In terms of the year on year correlation, the graph looks pretty similar to that of TSR, and its correlation coefficient is 0.87, which is slightly stronger than TSR's year on year correlation from the same period.
So, how are things looking this season then? Hopefully the table below is pretty self explanatory: it's sorted by TSRCEDE and also lists each team's CE, DE, CEDE Score and TSR, and shows the rank of the CEDE Score and TSR as well as TSRCEDE.
The top ranked team in terms of CEDE Score is table topping Arsenal, which corresponds to what Colin Trainor has found in the first of his mid-year reviews (here), with recent table toppers Liverpool in 2nd. However, surprisingly, the next 6 teams are made up of what we would consider weaker sides: West Brom in 3rd, followed by Stoke, who have a good CEDE Score again, then Sunderland, Hull, Cardiff and Fulham. The main reason for this, I believe, is that these teams in effect know their limitations and attempt to pack their defences to restrict the opposition to difficult shots; I'll be interested to see if Colin's reviews corroborate these numbers when he comes to looking at these teams. Man City only come in 14th by this measure and actually have a CEDE Score of less than 1.0, due to their poor DE.
I have to say that this is unusual; in general it is the better teams that have the better CEDE Scores. What has happened is that 4 of the biggest shooting teams have seen their CE fall significantly. As these 4 teams, Man Utd, Man City, Chelsea and Everton, also changed their manager in the summer, we'll have to wait and see if this is a short term effect of having a new manager bed in, or in fact down to different tactics employed by those managers. The overall effect is that the average CE for all teams this season has fallen to under 11%.
If we look at the TSRCEDE numbers compared to TSR, we see they are, as you would expect, very similar; however the changes are enough to shuffle the pack a bit. Some of the notable changes are Chelsea becoming the top ranked team, up from 3rd in terms of normal TSR; Arsenal going from a Utd-like 8th ranked TSR team to 5th by TSRCEDE; and Everton doing the opposite, going from 5th to 8th. At the bottom of the table we've got 5 teams grouped together on a TSRCEDE of 0.43, with Cardiff and then Fulham adrift at the bottom.
Sources: 2013-14 season from http://www.fantasyfootballscout.co.uk/, all earlier data from http://eplindex.com/