In the first part of my Soccer Analytics series, I talked about general statistics of my dataset of the 2013 season. As a reminder, I scraped all results from 177 domestic leagues and intracontinental cups worldwide. This entry deals with the question "Who was the best team in 2013?" according to network analysis, and why the result is (most likely) wrong.
Visualizing the Soccer Network
In the case of soccer, the nodes (or vertices) are the teams, and there is a link between two teams if they played against each other. We can put further information on a link by giving it a direction (if A loses against B, the link goes from A to B) and, additionally, a weight, for example the goal difference of the game.
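A minimal sketch of how such a network could be stored, assuming each result is a tuple of home team, away team, home goals and away goals. The sample results, the tuple layout and the name `build_network` are illustrative assumptions, not the scraped dataset or the code used for this post.

```python
from collections import defaultdict

# Hypothetical sample results: (home team, away team, home goals, away goals).
matches = [
    ("A", "B", 0, 2),
    ("B", "C", 1, 3),
    ("C", "A", 1, 1),
]

def build_network(matches):
    """Return (opponents, links).

    opponents: team -> set of teams it played against (undirected view).
    links: (loser, winner) -> summed goal difference (directed, weighted view).
    """
    opponents = defaultdict(set)
    links = defaultdict(int)
    for home, away, home_goals, away_goals in matches:
        opponents[home].add(away)
        opponents[away].add(home)
        if home_goals > away_goals:
            links[(away, home)] += home_goals - away_goals
        elif away_goals > home_goals:
            links[(home, away)] += away_goals - home_goals
        # A draw adds no directed link in this sketch.
    return dict(opponents), dict(links)

opponents, links = build_network(matches)
```

The `opponents` mapping corresponds to the "who played against whom?" network, while `links` carries the direction and the goal-difference weight.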
For the visualization, we only look at the network "who played against whom?". Using visone and my skills in drawing stuff, I created two different layouts of the network: one where I tried to show all the team names without any overlap (good luck finding your team) and one without labels but with a nicer layout. To get the full viewing experience, you had better look at them in full resolution!
A Global Ranking via PageRank
In order to get a meaningful ranking of the clubs, we have to modify the network from the previous section a bit. Here, we do not just want to know who played against whom, but also who won against whom. So the first thing we do is orient the links such that they point from the loser to the winner. Why from the loser to the winner, you ask? Well, we just assume that something flows, let's call it prestige, from the loser to the winner at the end of a game. For the moment, we suppose that "1 prestige" flows from the loser to the winner, and if a game ends in a draw, 0.5 prestige flows in both directions.
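The prestige rule above can be sketched as follows. This is only an illustration: the match tuples (home team, away team, home goals, away goals), the sample results and the function name `prestige_links` are assumptions, and prestige from repeated meetings is simply summed.

```python
from collections import defaultdict

def prestige_links(matches):
    """Map (source, target) -> total prestige flowing from source to target."""
    flow = defaultdict(float)
    for home, away, home_goals, away_goals in matches:
        if home_goals > away_goals:
            flow[(away, home)] += 1.0   # away team lost: 1 prestige to the winner
        elif home_goals < away_goals:
            flow[(home, away)] += 1.0   # home team lost: 1 prestige to the winner
        else:
            flow[(home, away)] += 0.5   # draw: 0.5 prestige in both directions
            flow[(away, home)] += 0.5
    return dict(flow)

# Hypothetical results: B beats A, then A and B draw, then C beats A.
sample = [("A", "B", 0, 2), ("B", "A", 1, 1), ("C", "A", 3, 0)]
flow = prestige_links(sample)
```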
As a ranking method, I chose Google's famous PageRank. Its concept can be translated to soccer quite easily. A team that beats a lot of unimportant teams should not be considered as good as a team that might have lost a few games but won against some important teams. If you think these considerations are a bit far-fetched, let me back this up with actual science. There is a scientific article [1] which does exactly this with a network of tennis matches.
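To make the idea concrete, here is a minimal sketch of a weighted PageRank computed by power iteration. It is not the code behind the rankings in this post. It assumes the links are given as a mapping from (loser, winner) to a positive prestige weight, and it assumes that teams without outgoing links (teams that never lost or drew) spread their rank evenly over all teams, which is one common convention.

```python
def pagerank(flow, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration.

    flow maps (loser, winner) -> positive prestige weight.
    Returns a dict team -> rank; the ranks sum to 1.
    """
    teams = sorted({team for link in flow for team in link})
    n = len(teams)

    # Total prestige flowing out of each team.
    out_weight = {team: 0.0 for team in teams}
    for (src, _), weight in flow.items():
        out_weight[src] += weight

    rank = {team: 1.0 / n for team in teams}
    for _ in range(iterations):
        # Rank held by teams with no outgoing links is spread evenly.
        dangling = sum(rank[t] for t in teams if out_weight[t] == 0.0)
        new_rank = {t: (1.0 - damping) / n + damping * dangling / n for t in teams}
        for (src, dst), weight in flow.items():
            if out_weight[src] > 0.0:
                new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        rank = new_rank
    return rank

# Hypothetical mini league: C beat A and B, and A beat B.
flow = {("A", "C"): 1.0, ("B", "C"): 1.0, ("B", "A"): 1.0}
ranks = pagerank(flow)
```

In this toy league, C collects prestige from both other teams and ends up on top, A comes second, and B, which only loses, comes last.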
The following table shows the Top 25 Teams according to PageRank.
So Al Kuwait SC, the winner of the Kuwaiti first division, was the best team in 2013, and AS Saint Michel, the champions of Madagascar, the second best. You think that is unreasonable? Well, it probably is (no offence!). The problem with the approach taken is that it neglects a lot of important factors.
I found a very recent article about using PageRank to rank national soccer teams [2], backing up my pseudoscience. To quote the article:
"Finally, our results indicate that the Random walk approach with the use of right metrics can indeed produce relevant rankings comparable to the FIFA official all-time ranking board."
OK, so apparently it is actually possible to rank soccer teams with PageRank (implying that the FIFA ranking is relevant...). We just have to use the right metrics (the weights of the links) to get a relevant ranking.
The authors describe ten different metrics which could be used. In the end, it was just the simple fraction of games lost out of games played that produced a ranking close to the FIFA one.
However, this will not hold for soccer clubs, since more factors have to be taken into account: international matches should be weighted more, different leagues have different overall strengths, goal differences should be considered, home and away wins should be distinguished, and so on. Incorporating these factors might give us a reasonable ranking. However, this would be too scientific for this blog. Or maybe I will do that another time...
Yet, I found a way to produce a ranking with my simple prestige model that seems more reasonable.
I just changed the damping factor [3] from $0.85$ to $0.95$ to reduce the factor of randomness a bit.
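For reference, one common way to write the weighted PageRank equation is shown below. The notation is an assumption made here, not taken from [3]: $d$ is the damping factor, $N$ the number of teams, $w_{ji}$ the prestige flowing from team $j$ to team $i$, and $s_j$ the total prestige flowing out of team $j$. Teams without outgoing links are ignored for simplicity.

$$PR(i) = \frac{1-d}{N} + d \sum_{j} \frac{w_{ji}}{s_j}\, PR(j), \qquad s_j = \sum_{k} w_{jk}$$

The term $\frac{1-d}{N}$ is the random part: with $d = 0.85$, 15% of the rank is spread evenly over all teams regardless of results, while with $d = 0.95$ it is only 5%, so the ranking depends more strongly on who actually beat whom.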
Seems more reasonable. Especially since Borussia Dortmund made it to the Top 25!