Solving Liverpool's goalscoring woes | In-depth Stats Analysis

First published on ‘The Tomkins Times’ in April this year and recently made free for non-subscribers. Edited by Paul Tomkins. With the transfer window about to swing open, they thought it was worth bringing back to light.

This season, for me at least, will be remembered as one where Liverpool missed opportunities, learned valuable lessons and posed a number of questions that’ll need answering (ideally sooner rather than later), such as:

  • Why have we squandered so many clear-cut chances to score?
  • Do the things that ‘matter’ in football translate into measurable data points?
  • Is the Liverpool model for valuing players fundamentally flawed?
  • Who could we really buy this summer to help solve our goal-scoring problems?
  • Can this management team deliver a side that will challenge for the title?

Given the spiralling negativity of recent weeks, I wanted to make a positive contribution. So I opted to tackle the one question that lends itself to objective analysis using publicly available data:

  • Who could we really buy this summer to help solve our goal-scoring problems?

All of the data visualisations provided in this article are interactive. Feel free to click on them to highlight, filter and manipulate the results or view the underlying data. A larger screen is better (apologies to those using mobile devices or netbooks), and the visualisations aren’t supported by the IE6 browser.

(The best goalscorer in the world. Picture is to highlight that, not to suggest he's Liverpool's solution...)

Getting the basic data together

“It is a capital mistake to theorise before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.” Sir Arthur Conan Doyle, The Adventures of Sherlock Holmes

I decided the most logical place to start would be to understand the best-performing goal-scorers in Europe at the moment.

This list was sourced from Euro Rivals as of 26th March 2012. With this base, I compiled my analysis dataset manually using Google and Wikipedia as of 6th April (I need to thank my beloved for helping with this…3,360 ‘cells’ of data gathered and checked by hand is a lot). The base data includes a count of the league games played and the league goals scored by each player in the various countries in which they’ve played throughout their careers (for those who like the detail, an annex at the end of the article walks through the structure of the data file and the formulae used).

Setting a scope in this way obviously has its issues. For example, players who were terrific up until this season might be wrongfully excluded. Similarly, players that have been a ‘flash in the pan’ this year may be wrongfully included. Given that our focus is on Liverpool, I decided to add Luis Suarez, Andy Carroll and Fernando Torres – whose sale was instrumental in the signing of the former Newcastle no.9 – into the data for comparison purposes. Though not comprehensive, I think the list is an acceptable guide given that the objective is to find goal-scorers (after all, Peter Crouch makes the list with just eight goals).

Another factor to consider is the comparative value of goals scored in the Premier League with those scored in other leagues around the world. I’ve not completely cracked this (but I’ve got as close as I can with the data available to me):

  • Goals are weighted based on the current rating of the top league in a given country;
  • Players who have scored a majority of their goals in a country’s lower divisions will be over-rated;
  • Goal weightings are derived primarily from the Uefa coefficient, with Fifa rankings used for non-European countries.

All goals are equal, but some goals are more equal than others

Feedback from Beez (@BassTunedToRed) on an earlier version prompted me to go a little deeper and establish a model for valuing the goals scored in different countries. By weighting the goals scored, it should be possible to derive which players are genuine Premier League prospects and which aren’t.

Let’s take Luis Suarez as an example:

[table id=73 /]

The visualisation below shows the complete weightings scale for all countries where the listed players have scored goals. For European countries, the weighting is derived from the Uefa coefficient value. For countries outside of Europe, the weighting is derived using the Fifa rankings for that country in relation to the leading country (Spain).

Click Link: Converting overseas goals into a Premier League currency
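As a rough sketch of how the conversion works, each country's league goals are multiplied by that country's weighting and the results are summed. The weightings and the career record below are made-up placeholders for illustration, not the values from the visualisation or any real player's figures, and the function name is hypothetical.

```python
from typing import Dict

# Hypothetical weightings for illustration only; the real values come from
# the weightings visualisation and are not reproduced here.
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "England": 1.0,
    "Netherlands": 0.8,
    "Uruguay": 0.6,
}


def weighted_league_goals(goals_by_country: Dict[str, int],
                          weights: Dict[str, float]) -> float:
    """Sum each country's league goals multiplied by that country's weighting."""
    return sum(goals * weights[country]
               for country, goals in goals_by_country.items())


# Made-up career record, not any real player's figures.
example_career = {"Uruguay": 10, "Netherlands": 50, "England": 10}
print(weighted_league_goals(example_career, EXAMPLE_WEIGHTS))
```

In this invented example, 70 raw league goals shrink to 56 in ‘Premier League currency’ because most of them were scored in lower-weighted leagues.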

Understanding the basics

Now that we’re comparing ‘like with like’ in terms of goals, we can start to build some theories from the facts on offer:

  • What’s the prime age range for a top European goal-scorer?
  • How many league games is a top European goal-scorer capable of enduring?
  • How many goals can you expect from a top European goal-scorer?

The visualisation below shows that experience matters when it comes to scoring goals. Almost 40% of the top goal-scorers in Europe so far this year are between the ages of 26 and 29. It would appear that Andy Carroll and Luis Suarez, aged 23 and 25 respectively, are some way short of their ‘peak’ years.

Click Link: Basic Goalscorer Analysis

The visualisation also gives an indicator, irrespective of age, of how many games a top European goal-scorer can actually manage in his career. This will prove to be very useful later in the analysis when it comes to predicting how many Premier League goals a player is likely to score before they ultimately need to be sold. Being generous, the data suggests a working capacity of 400 league games.

How many goals can you expect from a top European goalscorer?

Guided by our data-driven assumption that a top European goal-scorer has 400 top-level league games in them, is it possible to estimate how many Premier League goals they have in them as well? To explore the idea, I used the visualisation below to derive the following formula:

Premier League Goals = 0.000187321 * League Games^2 + 0.24872 * League Games - 0.180377

Click here: Relationship between goals scored and games played?
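For anyone who wants to try the formula, here is a minimal sketch that evaluates it at a few career lengths. The coefficients are the ones quoted above; the function name is my own and purely illustrative.

```python
def expected_premier_league_goals(league_games: float) -> float:
    """Evaluate the fitted quadratic relating league games to weighted goals."""
    return (0.000187321 * league_games ** 2
            + 0.24872 * league_games
            - 0.180377)


# Print the model's cumulative goal estimate at several career lengths.
for games in (100, 200, 300, 400):
    print(games, round(expected_premier_league_goals(games), 1))
```

At the assumed working capacity of 400 league games, the formula comes out at roughly 129 Premier League goals.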

Now, I’m not saying that this model is the perfect tool for evaluating top European goal-scorers – not by any stretch of the imagination. I’m also not saying that the elite clubs in Europe can comfortably base their recruitment strategies on basic data scraped together using Wikipedia and Google. But, perhaps a simple model like this has its place when it comes to evaluating markets that are less ‘data rich’ (and therefore poorly charted), or can act as a useful guide early on in the talent search?

Which players really stand out?

The good news, for me at least, is that the most stand-out desirable goal-scorer in this analysis is the one that everyone would think of naturally: Lionel Messi. Messi sits out there alone in the top-right quadrant of this visualisation. Not only is he hugely effective as a goal-scorer, but he’s also played a respectable number of games for a player of his age. Cristiano Ronaldo, by contrast, has played over 100 more league games (though he is three years older). In theory at least, Messi has two more years before his peak begins!

Click here: Europe’s top scorers in 2011/12 (up to point sample was taken)

By focusing in on the top-right quadrant of this visualisation, it is possible to see more clearly who the ‘elite’ performers really are:

Click here: Top Quartile of Goalscorers in Europe 2011/12

Great record, but is Huntelaar too old?

If you were Damien Comolli’s replacement, who would you be buying and why?

As Liverpool fans we would like to think, even given our recent on-field issues, that we can attract top European talent (though maybe not the very top European talent right now). But which players would be on such a short-list at the moment?

Using this first visualisation I’ve highlighted a set of players that:

  • Have an above-average ratio of theoretical Premier League goals per game;
  • Have played more than 100 league games in their career so far; and
  • Are under 26 (allowing time for depreciation and sale at a high value)

Click here: Potential Targets?
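The three criteria above amount to a simple filter over the dataset. The sketch below shows one way it might look; the `Player` class, the `shortlist` function and the sample players are all hypothetical, and ‘above-average’ is assumed to mean above the mean ratio of every player in the list.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Player:
    name: str
    age: int
    league_games: int
    weighted_league_goals: float

    @property
    def goals_per_game(self) -> float:
        """Theoretical Premier League goals per league game."""
        if self.league_games == 0:
            return 0.0
        return self.weighted_league_goals / self.league_games


def shortlist(players: List[Player]) -> List[Player]:
    """Keep players with an above-average ratio, over 100 games, aged under 26."""
    average_ratio = mean(p.goals_per_game for p in players)
    return [p for p in players
            if p.goals_per_game > average_ratio
            and p.league_games > 100
            and p.age < 26]


# Invented players for illustration; none of these figures are real.
sample = [
    Player("Player A", 24, 150, 90.0),   # passes all three tests
    Player("Player B", 29, 300, 180.0),  # too old
    Player("Player C", 22, 80, 50.0),    # too few league games
    Player("Player D", 23, 120, 24.0),   # below-average ratio
]
print([p.name for p in shortlist(sample)])
```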

What’s interesting here, particularly when you sort the various columns in the table, is how few goals Andy Carroll has actually scored compared to his peers across Europe (looking on the bright side, at least he’s on the chart!).

[Paul Tomkins Edit: Carroll only really started coming good for Liverpool after this data was gathered, although it is a look at full career record. My piece from late last year shows how target men tend to peak later as goalscorers.]

Perhaps it’s more appropriate to look at the youth prospects around Europe rather than the more established elite performers? The visualisation below simply illustrates players aged 23 or under:

Click here: Potential Targets Under 23?

What was Liverpool’s strategy in summer 2011?

If Liverpool’s strategy really was to burst back into the top four this year, where did Comolli/Dalglish think the goals were going to come from? Beyond the five months in the lead-up to the purchase, there is little data to suggest that Andy Carroll was going to be an effective Premier League goal-scorer any time soon (that’s not to say he won’t be in a few years’ time).

Lacina Traore – who, unbelievably, towers over even Peter Crouch – offers a similar profile to Carroll in terms of the theoretical number of goals they both have left to score, goals to games ratio, goals scored and league games played. The only difference between them as goal-scoring prospects, based on this analysis, is the transfer fees that their respective clubs were prepared to pay for them. Traore was bought by Kuban Krasnodar in the Russian Premier League for approximately £5m in February 2011.

From the data, it looks to me like the management team expected Luis Suarez to be grabbing the majority of our goals this season (195 league games to play, 83 Premier League goals to score and a strike rate of 0.5 goals per game makes him, in theory, a top European goal-scorer). Perhaps the departure of Fernando Torres really did catch the new management team off guard?

Strangely, the Carroll deal makes less and less sense to me the more I think about it. If we needed a ‘quick-fix’ solution until our low-cost youth strategy started yielding a return, we could have looked at players like Mario Gomez, Papiss Cisse or Lukas Podolski? Wait a second … for £35m we could have had a good go at buying at least two of them! Sorry Andy…I know it’s not your fault.

Personally, I’d feel much more positive if we were cycling through cheap unknown players (filtering out the rough diamonds) rather than going in big. It’s seemingly not good for the player, the fans or the club to recruit in this way (unless the club is filled with players of a similar value and/or has cash to burn).

Conclusions

I genuinely feel like this work I’ve produced is a job half done. I’d ideally have liked to come up with an indicative market value for these players as a way of giving context to what we have in our squad today and what’s available on the market. Hopefully the Lacina Traore example gave a taste of what could be possible here. The best I can do is offer value ‘bandings’ for the players listed:

  • High potential, high ability
  • High potential, unproven ability
  • Medium potential, unproven ability
  • Medium potential, high ability
  • Low potential, high ability
  • Low potential, low ability

Click here: Full Table of Strikers

[EDIT: I’ve had a bit of time since originally producing this work, so I’ve investigated ‘value for money’ in more detail. Check this link in response to this recent piece by Beez. NB: Comments can only be viewed by Subscribers of TTT.

If you take the ‘Weighted Goals Remaining’ number for any player in this analysis, you can multiply it by a price per goal to get an indicative guide price. The multiplier values are £100k, £200k, £300k and £400k:

  • Under £100k – excellent value for money
  • £100k to £200k – good value for money
  • £200k to £300k – questionable value for money
  • £300k to £400k – poor value for money, territory of the mega-rich
  • Over £400k – extreme spending
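A minimal sketch of how the bands might be applied: multiply ‘Weighted Goals Remaining’ by each multiplier to get guide prices, or divide a transfer fee by ‘Weighted Goals Remaining’ to see which band the deal falls into. The function names, the treatment of band boundaries (upper limits counted as inclusive) and the example figures are my assumptions for illustration, not part of the original analysis.

```python
from typing import Dict

MULTIPLIERS = (100_000, 200_000, 300_000, 400_000)  # pounds per weighted goal


def guide_prices(weighted_goals_remaining: float) -> Dict[int, float]:
    """Indicative guide price at each multiplier."""
    return {m: weighted_goals_remaining * m for m in MULTIPLIERS}


def value_band(fee: float, weighted_goals_remaining: float) -> str:
    """Classify a fee by its cost per weighted goal remaining."""
    if weighted_goals_remaining <= 0:
        raise ValueError("player has no weighted goals remaining to value")
    cost_per_goal = fee / weighted_goals_remaining
    if cost_per_goal < 100_000:
        return "excellent value for money"
    if cost_per_goal <= 200_000:  # assumption: upper bounds are inclusive
        return "good value for money"
    if cost_per_goal <= 300_000:
        return "questionable value for money"
    if cost_per_goal <= 400_000:
        return "poor value for money"
    return "extreme spending"


# Invented example: 80 weighted goals remaining, bought for 12 million pounds.
print(guide_prices(80))
print(value_band(12_000_000, 80))
```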

Arsenal, Newcastle and Everton have all managed to buy a predictable return of goals for a fraction of the money spent by Liverpool. In addition, these clubs are also managing to deploy those resources more effectively on the field. In effect, they’re beating us twice.]

The price of a player is ultimately dictated by supply and demand. Big clubs with big pockets will pay big prices. Charlie Adam’s journey to Liverpool is a good example. In the space of a couple of years the player went from being worth £500k to over £7m. Buying players and getting value for money, when you’re a big club like Liverpool, has to be hard. Perhaps even more so for us given that our history and status are at odds with our current financial fire-power.

However, I need to fall back on one of my favourite quotes from Michael Lewis (author of ‘Moneyball’) to wrap things up:

“There can’t be any corporation in America, any industry, where the employees are as scrutinised as professional baseball players. They do what they do in front of millions of people, many of whom just assume they are experts in valuing baseball players. They have statistics, sometimes the wrong statistics, but nevertheless statistics, attached to every move they make on the baseball field. They’ve been doing basically the same thing for 100 years and more or less the same thing for 150 years. If that employee can be so badly mis-valued that you can build a juggernaut out of the rejects of the profession, the defective parts, then who can’t be mis-valued?”

The management team of Rafa Benitez was data-driven. Its track record in the Premier League era is second only to that of Arsene Wenger at Arsenal (thoroughly documented in Paul, Gary and Graeme’s book ‘Pay As You Play’). From what I understand as an outsider, that management team was able to achieve the results it did whilst being repeatedly undermined, let down or just ignored by a commercial leadership team that was hell-bent on steering the club into oblivion.

Now, and I appreciate only a year has passed, it would appear that the structure is in place for a similar data-driven approach to surpass anything that came before it. If what Michael Lewis says about Moneyball is true, and value really is everywhere if you’re smart enough to see it, then surely our club should have been ideally placed to take advantage? It seems that, based on results, that hasn’t happened, and Comolli lost his job; ultimately, too much money was paid for too little return.

Annex
The structure of the data file and the formulae used:
Player ID: a unique identifying number for each player
Player: the player’s full name
Liverpool Link: has the player ever been registered with Liverpool FC
League Games: count of league games played
League Goals: count of league goals scored
Weighted League Goals: total of league goals multiplied by weighting factor
Date of Birth: the player’s date of birth
Age: player’s age derived from date of birth and data date
Age Range: bandings 3 years wide starting at ‘18 to 21’ up to ‘over 33’
Depreciation years: arbitrary selection, 28 – player’s age
Height: height in metres
Position: i.e. Striker, Forward, Attacker
League Games/Age: League games divided by player’s age
Games/Age score: cumulative normal distribution of ‘League Games/Age’
Weighted League Goals/Games: ‘Weighted League Goals’ divided by ‘League Games’
Weighted League Goals/Games Group: bandings in intervals of 0.1 from 0.7 to 0.0
Weighted Goals/Games Score: cumulative normal distribution of ‘Weighted League Goals/Games’
Games Remaining: 400 – League Games
Weighted Goals Remaining:
Derive two values using the function defined earlier: ‘a’, the theoretical ‘Weighted League Goals’ to score based on a shelf-life of 400 games, and ‘b’, the theoretical ‘Weighted League Goals’ scored based on the number of games played. Subtract ‘b’ from ‘a’ to create ‘x’. Create ‘y’ by multiplying ‘Games Remaining’ by ‘Weighted League Goals/Games’. Create an average of ‘x’ and ‘y’ and replace any negative values with zero.
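The ‘Weighted Goals Remaining’ steps can be sketched as follows, taking ‘a’ as the formula's value at 400 games and ‘b’ as its value at the games already played. The function names are hypothetical, and I have assumed that the zero floor applies to the final average rather than to ‘x’ and ‘y’ separately.

```python
SHELF_LIFE_GAMES = 400  # working capacity assumed in the main article


def theoretical_goals(league_games: float) -> float:
    """The quadratic from the main article relating games to weighted goals."""
    return (0.000187321 * league_games ** 2
            + 0.24872 * league_games
            - 0.180377)


def weighted_goals_remaining(league_games: int,
                             weighted_goals_per_game: float) -> float:
    a = theoretical_goals(SHELF_LIFE_GAMES)  # goals over a full 400 games
    b = theoretical_goals(league_games)      # goals over games played so far
    x = a - b                                # model-based estimate
    games_remaining = SHELF_LIFE_GAMES - league_games
    y = games_remaining * weighted_goals_per_game  # strike-rate estimate
    # Assumption: negative results are floored at zero after averaging.
    return max(0.0, (x + y) / 2)


# 205 games played (195 remaining) at 0.5 weighted goals per game.
print(round(weighted_goals_remaining(205, 0.5), 1))
```

With 195 games remaining and a ratio of 0.5, the rounded figures quoted for Luis Suarez in the main article, this sketch gives about 84, in the same region as the 83 Premier League goals quoted there; the small gap is likely down to rounding of the strike rate.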
Defining the weightings for different European leagues:
Extract all European coefficient data. Use the cumulative normal distribution to derive a value for each coefficient value ranging from 0 to 1.
Defining the weightings for non-European leagues:
Divide the Fifa rankings score for any given country by the Fifa rankings score for the world-leading nation (Spain), so that the leading nation has a weighting of 1.
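Both weighting rules can be sketched in a few lines. For Europe, a normal distribution is fitted to the coefficient values and each coefficient is converted to its cumulative probability, which falls between 0 and 1. For other countries, I have read the rule as the country's Fifa score divided by the leader's score, which keeps weightings at or below 1. The use of the sample standard deviation, the function names and all of the numbers below are assumptions for illustration only.

```python
from statistics import NormalDist
from typing import Dict


def european_weights(coefficients: Dict[str, float]) -> Dict[str, float]:
    """Map each coefficient to its cumulative normal probability (0 to 1)."""
    # Assumption: the distribution is fitted with the sample standard deviation.
    dist = NormalDist.from_samples(list(coefficients.values()))
    return {country: dist.cdf(value)
            for country, value in coefficients.items()}


def non_european_weight(country_score: float, leader_score: float) -> float:
    """Fifa ranking score relative to the world-leading nation."""
    return country_score / leader_score


# Made-up coefficient values, not real Uefa data.
example_coefficients = {"League A": 80.0, "League B": 50.0, "League C": 20.0}
weights = european_weights(example_coefficients)
print({country: round(w, 3) for country, w in weights.items()})

# Made-up Fifa scores: a country with half the leader's score.
print(non_european_weight(800.0, 1600.0))
```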