Which country is the best? Dealing with ranking data
Who does not dream about living in the best country on the globe? But what counts as the best? There are various factors along which countries can be compared. Countries are ranked on the basis of their GDP, the well-being of their citizens, the level of their inhabitants’ freedom, and many more. Our research aims to come up with a top list of countries that amalgamates several aspects of ranking. This goal is achieved with rank aggregation.
What is rank aggregation?
Ranking data is ubiquitous. We love university rankings and we tend to check sports league tables. Before buying a gadget, many of us check various test sites. But what if there is more than one ranking available and they are not identical? Rank aggregation helps us out! In this post, we’ll examine the very basics of rank aggregation. Instead of gadgets, we will compare various country rankings. But keep in mind that rank aggregation is used in many fields, from voting theory to economics, where it is often called preference ordering. Additionally, some years ago, it was used in meta-search to aggregate search results from various search engines.
Our initial aim was to find out which is the best country in the world. However, we soon realized that there is no single top list of countries; there are several, depending on what the countries are ranked on. Hence, we decided to take several rankings into consideration to answer the question. We chose the following country rankings and indices:
First we converted each table into a ranking order, then we merged the eight tables listed above with the coco country converter Python package. As a result, we got a master table with 8 columns and 106 rows. The columns represent the rankings of the countries from different aspects, while the rows represent the countries. In short, we now have a master table in which every country has a value between 1 and 106 in every column. Below, you can have a look at the data along with our aggregations and the result of the clustering.
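The conversion and merge step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the scores below are made up, the function names `to_ranks` and `merge_rankings` are hypothetical, country names are assumed to be already harmonized (the job the coco package did for us), and re-ranking within the common set of countries is our assumption about how every column ends up running from 1 to the number of rows.

```python
# Hypothetical raw index values; the real post used eight published tables.
gdp_per_capita = {"Sweden": 51000, "Germany": 46000, "Japan": 39000, "Chile": 15000}
freedom_score = {"Sweden": 100, "Germany": 94, "Japan": 96, "Peru": 72}


def to_ranks(scores, higher_is_better=True):
    """Turn {country: score} into {country: rank}, where rank 1 is the best."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {country: position for position, country in enumerate(ordered, start=1)}


def merge_rankings(tables):
    """Keep only countries present in every table and re-rank within that subset."""
    common = set.intersection(*(set(table) for table in tables.values()))
    master = {country: {} for country in sorted(common)}
    for name, table in tables.items():
        # Re-rank so that every column runs from 1 to len(common) without gaps.
        kept = sorted(common, key=table.get)
        for position, country in enumerate(kept, start=1):
            master[country][name] = position
    return master


master = merge_rankings({
    "gdp": to_ranks(gdp_per_capita),
    "freedom": to_ranks(freedom_score),
})
for country, ranks in master.items():
    print(country, ranks)
```

Chile and Peru drop out because each appears in only one table, which mirrors why our master table ended up with fewer countries than the individual rankings.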
And the winner is …
Let’s have a look at the aggregated ranking. The first ten countries are mainly European (Sweden, Germany, Norway, Finland, ranked first, second, third and fourth, and the UK, ranked tenth), there are three Commonwealth countries (Singapore, Australia and New Zealand, ranked seventh, eighth, and ninth) and Japan (ranked sixth). All of these countries score consistently high in each ranking, except for Singapore, which is ranked 73rd on freedom. Although the United States gets good scores in all eight rankings, its aggregated rank is 74, so it comes right after Kyrgyzstan. This can be explained by the fact that, as we will see later, no aggregated ranking is perfect.
Let’s explore the data
It’s great that we managed to convert eight rankings into one, but it’s not self-explanatory what we now have in the master table. Let’s analyze the data and try to find some pattern in this mass. The three-dimensional barplot below is our first attempt to interpret the data.
The visualization above could have been made more pleasing to the eye, but that would not have made it more comprehensible. The problem with it is that it doesn’t help us reveal any pattern in the data.
What if we treat each country as a vector of eight integer values and apply dimensionality reduction? As a second attempt, let’s give it a try and make a self-organizing map. Here comes the more compact and interactive viz.
An expert eye can identify some clusters of countries, but admittedly this type of visualization barely makes sense to lay people. But don’t give up! Let’s translate the self-organizing map into a world map. The countries are colored according to their clusters, so all the ranking data is available on the viz. Thanks to the fantastic QGIS and the qgis2web plugin, it’s a matter of a few clicks to make an interactive visualization. Though it would hardly win a prize for its aesthetics, it is usable and informative.
It becomes obvious at first glance that the clusters make sense and that the map is understandable for everyone. One can easily see that the Western European countries, along with Canada and Australia, form a cluster. It is also striking that the United States, South Korea and Estonia form a cluster separate from the previous one. The same goes for New Zealand and Portugal. Finally, note that a significant portion of the countries is colored blue, meaning that no data is available for them. This is because only 106 of the approximately 200 countries were listed in every one of the rankings we used.
Why these rankings?
We chose the above-mentioned rankings partly because of personal interest and partly because self-organizing maps are very often used for exploratory data analysis in case of development and macroeconomic datasets.
As for personal motivation, we were wondering what makes a country decent after reading the following theories. According to Niall Ferguson’s Civilization, competition, science, property, medicine, consumer society, and capitalist work ethic made the West the ruler of the world for centuries, and China is possibly becoming a superpower as its society is adopting them.
Robinson and Acemoglu, as they outline in their book Why Nations Fail, think institutions matter a lot. Democratic institutions, property rights, and the rule of law create an inclusive environment in which citizens, their well-being, and the country’s economy can thrive.
Jared Diamond takes a completely different position and enumerates the geographical and environmental conditions that helped Western civilization spread across the whole world. According to this view, articulated in his seminal book, Guns, Germs, and Steel, Western civilization is not so special; it was just lucky enough to be born in an environment which provided it with the necessary resources to occupy most of our planet.
The Nobel laureate economist and philosopher Amartya Sen offers a different perspective in his Development as Freedom, the fundamental work of the so-called capabilities approach. In his work, Sen outlines five different types of freedom, namely political freedom, economic facilities, social opportunities, the guarantee of transparency and protective security. Development should be viewed as an advancement of these freedoms. Mahbub ul Haq and Sen developed the Human Development Index, which tries to capture to what extent people can exercise these freedoms.
So, given our rankings, can we aggregate them to see which are the best performing countries, and can we spot any pattern which helps us understand why certain countries are better off than others?
Rank aggregation – now in detail
The simplest way to aggregate rankings is to use the mean of a country’s ranks as its aggregate score. This is simple and straightforward, but we all know from Statistics 101 that the mean can be very misleading because it is sensitive to outliers. Take China, for example: its average ranking is 50.3, but it ranks 106th and 71st for freedom and rule of law, and 2nd for science. So let’s try to find a better method for rank aggregation.
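A toy example shows how much the mean can hide. The two rank profiles below are hypothetical and are not taken from our data; they are chosen only so that a steady mid-table country and an erratic one end up with the same average.

```python
from statistics import mean

# Hypothetical rank profiles across four rankings (not from the master table).
steady = [50, 50, 50, 50]   # mid-table everywhere
erratic = [2, 98, 1, 99]    # near the top in two rankings, near the bottom in two

mean_steady = mean(steady)
mean_erratic = mean(erratic)
spread_steady = max(steady) - min(steady)
spread_erratic = max(erratic) - min(erratic)

print(mean_steady, mean_erratic)      # both means are 50
print(spread_steady, spread_erratic)  # 0 versus 98
```

Both profiles get the same aggregate score, even though they describe very different countries.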
First, we need a measure of similarity between rankings. Kendall’s tau is a great way to measure the correlation between two rankings, and SciPy implements it as part of its stats module. So let’s see what it tells us about the similarity of the various lists.
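To make the measure concrete, here is a minimal pure-Python sketch of Kendall’s tau for two rankings without ties. The function name and the four-country example are ours and purely illustrative. In practice, `scipy.stats.kendalltau(x, y)` does the same job and also returns a p-value; its default tau-b variant coincides with the value below when there are no ties.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau for two {item: rank} dicts over the same items, assuming no ties."""
    items = list(rank_a)
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        # Positive product: both rankings order the pair the same way.
        direction = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n_pairs = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical rankings of four countries; only B and C swap places.
by_gdp = {"A": 1, "B": 2, "C": 3, "D": 4}
by_freedom = {"A": 1, "B": 3, "C": 2, "D": 4}
reversed_gdp = {"A": 4, "B": 3, "C": 2, "D": 1}

tau_similar = kendall_tau(by_gdp, by_freedom)      # 5 concordant, 1 discordant pair
tau_opposite = kendall_tau(by_gdp, reversed_gdp)   # every pair is discordant
print(tau_similar, tau_opposite)
```

A value of 1 means identical orderings, -1 means exactly reversed orderings, and values near 0 mean the two rankings are unrelated.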
The ideal ordering would be the one that maximizes the correlation with every ranking. In theory, we should first generate all the possible orderings, aka the permutations. Then, we should measure the correlation between the permutations and the actual rankings. There should be one with the highest average correlation. In practice, we cannot do that. In our case we would have to generate all the permutations of the 106 countries, which results in 106! (roughly 1.1 × 10^170) rankings.
Comparing each permutation to our original eight rankings would run for ages even on multicore CPUs executing a very high number of instructions per second. The problem of rank aggregation is pretty hard. If we have four or more rankings, it becomes NP-hard. Arrow’s Impossibility Theorem complicates the situation further: it states that no aggregation method for three or more alternatives can satisfy a small set of reasonable fairness criteria at once, so no aggregate can faithfully capture the relative orderings of all the input rankings. In a nutshell, take rank aggregations with a grain of salt, since none of them can be perfect. If you’d like to know more about the topic, the linked Wikipedia pages are good starting points to explore the field.
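The brute-force idea is perfectly feasible for a handful of items, which makes the blow-up easy to see. The sketch below is a toy under stated assumptions: the three four-item rankings are invented, and the helper names are ours. It tries all 24 orderings and keeps the one with the highest total Kendall’s tau, then counts how many orderings 106 countries would need.

```python
import math
from itertools import combinations, permutations


def kendall_tau(order_a, order_b):
    """Kendall's tau between two orderings (best first) of the same items, no ties."""
    pos_a = {item: i for i, item in enumerate(order_a)}
    pos_b = {item: i for i, item in enumerate(order_b)}
    score = 0
    for x, y in combinations(order_a, 2):
        agree = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        score += 1 if agree > 0 else -1
    n = len(order_a)
    return score / (n * (n - 1) / 2)


def brute_force_aggregate(rankings):
    """Try every permutation and return the one most correlated with all rankings."""
    items = rankings[0]
    return max(
        permutations(items),
        key=lambda candidate: sum(kendall_tau(candidate, r) for r in rankings),
    )


# Three hypothetical rankings of four items, best first.
rankings = [("A", "B", "C", "D"), ("A", "C", "B", "D"), ("B", "A", "C", "D")]
best = brute_force_aggregate(rankings)
print(best)  # ('A', 'B', 'C', 'D')

# The same search over 106 countries would have to visit 106! orderings.
n_orderings = math.factorial(106)
print(len(str(n_orderings)))  # 171 digits
```

Four items need 24 candidates; 106 items need a number with 171 digits, which is why heuristic search is the only practical option.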
For our final aggregation, we used the RankAggreg R package. It offers two ways to search for the aggregate ranking. One uses an evolutionary algorithm, while the other uses Cross-Entropy Monte Carlo, CE for short. Both approaches are well suited to combinatorial problems. In the end, we inserted the results of a rank aggregation using CE under the Aggregated column of our data table.
It seems that rank aggregation captures something, but it is not perfect. So we tried to find a way to see the positions of the countries relative to each other. Since each country is represented by eight numbers, we needed a dimensionality reduction technique. We chose the self-organizing map (SOM for short), a type of artificial neural network and an unsupervised method used to map high-dimensional data into a low-dimensional space. We used the SimpSOM Python package to train our SOM and to run clustering on the data.
Our clusters seem to be more or less coherent. Typical welfare states (the Scandinavian and Benelux countries, along with Western Europe, Australia, and Canada) form one cluster (namely cluster 0), and they do very well in all eight rankings, so their aggregated and average rankings are good too. In the case of cluster 5, the members’ average rankings are similar and they score relatively well on all measures except freedom, which is not surprising, since two of them are under communist rule (China and Vietnam), another two are post-Soviet states with their very distinct interpretation of democracy (Russia and Kazakhstan), and two are kingdoms (the United Arab Emirates and Morocco) with very limited parliaments.
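For readers who have not met SOMs before, the core training loop fits in a few lines. The following is a minimal pure-Python sketch and not the SimpSOM API: the function names, the one-dimensional four-node map, the decay schedule, and the scaled three-ranking data are all assumptions made for illustration.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Index of the node whose weight vector is closest to the input."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], vector))


def train_som(data, n_nodes=4, epochs=200, lr0=0.5, seed=0):
    """Train a one-dimensional SOM and return its list of weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    radius0 = n_nodes / 2
    for epoch in range(epochs):
        # Learning rate and neighborhood radius both shrink over time.
        decay = math.exp(-epoch / epochs)
        lr, radius = lr0 * decay, radius0 * decay
        for vector in rng.sample(data, len(data)):
            winner = best_matching_unit(weights, vector)
            for node, w in enumerate(weights):
                # Nodes close to the winner on the map are pulled more strongly.
                influence = math.exp(-((node - winner) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * influence * (vector[d] - w[d])
    return weights


# Hypothetical countries, each with three ranks scaled to the 0..1 range.
data = [
    [0.05, 0.10, 0.08],
    [0.10, 0.05, 0.12],
    [0.90, 0.95, 0.85],
    [0.95, 0.88, 0.92],
]
weights = train_som(data)
clusters = [best_matching_unit(weights, vector) for vector in data]
print(clusters)
```

Each country is assigned to the node it lands closest to, and countries with similar rank profiles tend to end up on the same or neighboring nodes; that node index is what a cluster label on the world map stands for.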
Standalone data visualizations
You can find all the embedded data visualizations on our GitHub page, or just click on the links below.