Rediff.com

Why we pick Brazil & France for the final

Last updated on: July 06, 2018 15:42 IST

Infinite Analytics CEO Akash Bhatia and Lead Scientist Joseph Kibe reveal how AI reads Wikipedia index noise to pull out the winners.

IMAGE: Kylian Mbappe, who is just 19, scored twice in France's come-from-behind 4-3 victory over Argentina, sending Les Bleus into the quarter-finals. Photograph: Pilar Olivares/Reuters

With the World Cup well underway, people are finally getting a chance to check the quality of their predictions.

There's a certain amount of hubris in trying to predict the tournament winner.

In football, goals are far harder to come by than points or runs in most other sports, which makes a single game an unreliable way to tell which team is better.

Even so, what can we say about individual teams that is perhaps a bit more interesting than comparing how good they are at goal scoring or winning games?

One take, from authors and football analysis gurus Chris Anderson and David Sally in their book The Numbers Game: Why Everything You Know About Soccer is Wrong, contends that football is a 'weak link' sport, in which a team's ability to score or prevent a goal depends heavily on collective effort.

According to this theory, the quality of a team's worst players matters more than how great its star players are: Lionel Messi's prowess cannot make up for his playing alongside relatively weak teammates.

How true is this, though?

We can begin to pick apart this question by working out how to compare players across all the World Cup national teams. The tricky part is gauging how good the worst players on a given team are.

Lesser players tend to play for less well-known teams, which get less coverage.

Fortunately, in the context of the World Cup this problem is somewhat tractable.

For starters, all of the players from the different national teams have a Wikipedia entry.

As a rough proxy for player quality, it turns out that the length of a player's Wikipedia article tends to correlate pretty well with how good that player is.

It's not perfect, of course. The signal gets much noisier as the players get worse.

For our purposes, however, it's a good enough proxy.

Ranking players by Wikipedia article length, we see that the best players do indeed rise to the top, which is a good gut check.

Player | Wikipedia Index (indexed to Ronaldo = 100)
Cristiano Ronaldo (Portugal) | 100
Lionel Messi (Argentina) | 100
Luis Suarez (Uruguay) | 61
Neymar (Brazil) | 58
Eden Hazard (Belgium) | 51
Thiago Silva (Brazil) | 49
Sergio Aguero (Argentina) | 46
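The article does not say exactly how article length is measured (characters, bytes or words) or how the scaling to Ronaldo is done. A minimal sketch, assuming the index is simply each player's article length divided by the reference player's and multiplied by 100, might look like this. The players and lengths are hypothetical placeholders, not real Wikipedia figures.

```python
def wikipedia_index(article_lengths, reference):
    """Scale article lengths so the reference player scores exactly 100 (assumed method)."""
    ref_len = article_lengths[reference]
    return {name: 100 * length / ref_len for name, length in article_lengths.items()}


# Hypothetical players and article lengths (in characters), for illustration only.
lengths = {"Player A": 200_000, "Player B": 120_000, "Player C": 30_000}
index = wikipedia_index(lengths, reference="Player A")

# Print players from highest to lowest index.
for name, score in sorted(index.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.0f}")
```

With these made-up lengths, Player A scores 100, Player B 60 and Player C 15.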

Some may quibble with using a Wikipedia article index, but as long as the scores track player quality on average, we think it is a reasonable tool for exploring this weak link hypothesis.

For example, Philippe Coutinho, who earns a hefty 4 million euros a year playing for Barcelona, ranks 45th on our Wikipedia index.

That is not the very top, but it roughly matches his still-high level of skill.

With that yardstick in place, we can begin to explore the weak link hypothesis.

What happens if we rank teams both by their three best players and by their three worst?

In other words, we average the Wikipedia indices of the top three and of the bottom three players on each national team, which tells us how much of a Ronaldo or Messi those players represent on average.
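A minimal sketch of the two team-level scores, assuming each is the plain arithmetic mean of the three highest and the three lowest player indices in a squad. The squad values below are hypothetical, and the function name is illustrative rather than the authors' code.

```python
def best_and_worst_means(player_indices, k=3):
    """Return (mean of the k highest, mean of the k lowest) Wikipedia indices."""
    if len(player_indices) < k:
        raise ValueError(f"need at least {k} players, got {len(player_indices)}")
    ranked = sorted(player_indices, reverse=True)  # highest index first
    best, worst = ranked[:k], ranked[-k:]
    return sum(best) / k, sum(worst) / k


# Hypothetical squad indices, for illustration only.
squad = [58, 46, 40, 20, 12, 9, 6, 5, 4]
best_mean, worst_mean = best_and_worst_means(squad)
print(f"best three: {best_mean:.1f}, worst three: {worst_mean:.1f}")
```

For this made-up squad, the best three average 48.0 and the worst three 5.0. The same function would apply unchanged to a full 23-man World Cup squad.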

Team | Wikipedia Index (mean of the three best players)
Argentina | 58
Portugal | 47
Brazil | 43
Uruguay | 37
Germany | 36

Ranking by the three weakest players
Team | Wikipedia Index (mean of the three worst players)
England | 11
Spain | 8
Germany | 8
Brazil | 7.5
Portugal | 7.5

Perhaps unsurprisingly, the better teams populate the top of the rankings in both lists.

The consistently good German and Brazilian teams make the top five on both the best and the worst players, using our standardised Wikipedia index.

But there are also some surprises.

Argentina, whose team boasts the excellent Lionel Messi, drops from 1st to 13th when we look at the quality of the worst players.
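Argentina's drop can be checked against the appendix figures. This sketch copies the team means from Appendices A and B, rounded to three decimals, and ranks a team as one plus the number of teams with a strictly higher score. That tie rule is an assumption; there are no ties in these lists.

```python
# Mean Wikipedia index of each team's three best / three worst players,
# copied from Appendices B and A and rounded to three decimals.
BEST_THREE = {
    "Argentina": 58.051, "Portugal": 46.568, "Brazil": 43.598, "Uruguay": 36.964,
    "Germany": 36.237, "Belgium": 35.791, "Croatia": 31.647, "France": 31.611,
    "Spain": 31.548, "Colombia": 31.509, "Mexico": 27.992, "England": 27.689,
    "Netherlands": 21.887, "Poland": 21.194, "Egypt": 20.907, "South Korea": 20.563,
    "Denmark": 20.316, "Japan": 19.656, "Serbia": 19.440, "Sweden": 19.036,
    "Nigeria": 18.692, "Switzerland": 18.044, "Iceland": 17.254, "Morocco": 17.004,
    "Iran": 16.860, "Russia": 16.829, "Costa Rica": 16.305, "Senegal": 15.678,
    "Peru": 14.946, "Panama": 14.102, "Tunisia": 8.936, "Saudi Arabia": 8.664,
}
WORST_THREE = {
    "England": 10.861, "Spain": 7.997, "Germany": 7.963, "Brazil": 7.524,
    "Portugal": 7.509, "Netherlands": 7.163, "Belgium": 7.155, "France": 6.565,
    "Mexico": 6.525, "Japan": 6.225, "Serbia": 6.166, "Russia": 5.950,
    "Argentina": 5.714, "Sweden": 5.607, "Croatia": 5.412, "Switzerland": 5.403,
    "Iran": 5.350, "Costa Rica": 5.302, "Poland": 5.166, "Denmark": 5.159,
    "Senegal": 5.123, "Uruguay": 5.049, "Nigeria": 5.022, "Colombia": 5.018,
    "South Korea": 4.993, "Iceland": 4.988, "Morocco": 4.730, "Panama": 4.590,
    "Saudi Arabia": 4.483, "Egypt": 4.464, "Peru": 4.435, "Tunisia": 4.104,
}


def rank_of(team, scores):
    """1-based rank of `team` when teams are ordered from highest to lowest score."""
    return 1 + sum(1 for s in scores.values() if s > scores[team])


print("Argentina, best three:", rank_of("Argentina", BEST_THREE))
print("Argentina, worst three:", rank_of("Argentina", WORST_THREE))
```

Run as written, this reports Argentina 1st on its best three and 13th on its worst three, matching the text.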

We can get an early 'sneak peek' at how some of these dynamics might be playing out by looking at results from the competition so far.

The game between Portugal and Spain played on June 15 offered one early opportunity in the competition to test this framework.

According to the weak link hypothesis, the teams ought to be relatively well-matched.

The three worst players on each side average about the same Wikipedia index, around eight points (7.51 for Portugal, 8.00 for Spain).

By contrast, Portugal considerably outranks Spain on its best players, 47 to 32.

In the end, Portugal needed a last-minute goal from Cristiano Ronaldo just to tie the match.

This suggests that Portugal's large advantage in star players was not enough for it to dominate Spain, as it ideally should have if star players decided matches.

Probably one of the most anticipated upcoming matches is the Brazil-Belgium quarter-final on Friday.

Interestingly, both teams place in the top 10 of our ranking by worst players, with Brazil 4th and Belgium 7th.

Brazil is slightly ahead on our index, at 7.52 to Belgium's 7.15.

The average of the best players tells a similar story, with Brazil still on top at 43.60 to Belgium's 35.79.

While we have to give the edge to Brazil according to our analysis, it would be premature to count Belgium out.

Brazil may be the favourite, but only by a hair's breadth.

For us, the France-Uruguay quarter-final will be extremely exciting to watch.

When we look at the average of the three best players, Uruguay clearly beats France, with a Wikipedia Index score of 36.96 to 31.61.

On the flip side, when we look at the average of the three worst players, France has the edge over Uruguay, 6.56 to 5.05.

This match-up goes to the very core of one's outlook on football. It poses an almost philosophical question: are matches decided by the best players on a team, or are you only as good as your weakest link?
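The two readings of this match-up can be put side by side. This is a minimal sketch, assuming the "prediction" is nothing more than picking whichever team has the higher mean index under each view; the function name is hypothetical, and the scores are the appendix values rounded to three decimals.

```python
def pick_winner(team_a, team_b, scores):
    """Return the team with the higher score, or None if the scores are level."""
    a, b = scores[team_a], scores[team_b]
    if a == b:
        return None
    return team_a if a > b else team_b


# Appendix values (rounded) for the France-Uruguay quarter-final.
best_three = {"France": 31.611, "Uruguay": 36.964}
worst_three = {"France": 6.565, "Uruguay": 5.049}

print("Best-players view:", pick_winner("France", "Uruguay", best_three))
print("Weak link view:", pick_winner("France", "Uruguay", worst_three))
```

The best-players view picks Uruguay and the weak link view picks France, which is exactly the split described above.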

We are true believers that football is a team sport, so we go with the team that has the best of the worst.

Vive la France!

Only time will tell whether the 2018 World Cup further validates or rejects this weak link hypothesis.

Even so, it offers a more interesting, almost sociological lens through which to watch the tournament's final stages.

Appendix A: All Teams, Mean of the 3 Worst Players’ Wikipedia Indices

England 10.86061861
Spain 7.997310852
Germany 7.962876646
Brazil 7.523860674
Portugal 7.508673065
Netherlands 7.162879442
Belgium 7.155433481
France 6.56515968
Mexico 6.52465043
Japan 6.225279815
Serbia 6.166061582
Russia 5.949725521
Argentina 5.714465431
Sweden 5.607076942
Croatia 5.412460505
Switzerland 5.402890968
Iran 5.349935943
Costa Rica 5.302034494
Poland 5.165561063
Denmark 5.159459138
Senegal 5.123358327
Uruguay 5.048656797
Nigeria 5.021964239
Colombia 5.018442864
South Korea 4.992664251
Iceland 4.987718198
Morocco 4.73022775
Panama 4.590447989
Saudi Arabia 4.482844454
Egypt 4.46448492
Peru 4.435158051
Tunisia 4.103933779

Appendix B: All Teams, Mean of the 3 Best Players' Wikipedia Indices

Argentina 58.05064221
Portugal 46.56829935
Brazil 43.59754934
Uruguay 36.96405886
Germany 36.23712346
Belgium 35.79109162
Croatia 31.64710556
France 31.61081659
Spain 31.54756625
Colombia 31.50883112
Mexico 27.99205298
England 27.68945668
Netherlands 21.88711817
Poland 21.19354174
Egypt 20.90742333
South Korea 20.56318878
Denmark 20.31631622
Japan 19.65601814
Serbia 19.44016593
Sweden 19.03579921
Nigeria 18.69151091
Switzerland 18.0436586
Iceland 17.25444056
Morocco 17.00358965
Iran 16.85991219
Russia 16.82942945
Costa Rica 16.30520158
Senegal 15.67810119
Peru 14.94621974
Panama 14.10246072
Tunisia 8.935582497
Saudi Arabia 8.663522697
Akash Bhatia and Joseph Kibe