
This article explores the Wikidata item "human settlement" and its properties. The paper uses SPARQL queries to solve the following problems: finding instances of "human settlement"; building a list of countries ordered by the total population living in "human settlements"; and building a list of the objects that accompany "human settlement" in the "instance of" property. A chart was also constructed showing the proportion of each country's population living in "human settlements". The chart shows that less industrialized countries have a high percentage of their population living in "human settlements", while industrialized countries have a small percentage. In addition, the completeness of Wikidata is analyzed on the basis of the solved tasks. The property "instance of" was added to several objects to improve the results.

Instances of the object "human settlement"

Let's build a list of all the human settlements.

# 20.10.2017
SELECT ?hum ?humLabel 
WHERE 
{
  ?hum wdt:P31 wd:Q486972. # instances of human settlement
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

SPARQL-query, 411393 results.
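The queries in this article are written for the Wikidata Query Service web interface, where prefixes such as wd:, wdt: and wikibase: are predefined. To run a query from a program instead, it can be sent to the public endpoint https://query.wikidata.org/sparql with the query text in the `query` parameter and `format=json`. The sketch below, using only Python's standard library, builds such a request; the User-Agent string is a hypothetical placeholder, and actually sending the request (left commented out) needs network access and may hit the service's timeout for a query this large.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# The query from this section: all instances of "human settlement" (Q486972).
HUMAN_SETTLEMENTS = """
SELECT ?hum ?humLabel
WHERE
{
  ?hum wdt:P31 wd:Q486972.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def build_wdqs_request(query: str) -> urllib.request.Request:
    """Build a GET request asking the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}",
        # Hypothetical User-Agent; the service asks clients to identify themselves.
        headers={"User-Agent": "settlement-research-sketch/0.1 (example)"},
    )


def run_query(query: str) -> list[dict]:
    """Send the request and return the result bindings (requires network)."""
    with urllib.request.urlopen(build_wdqs_request(query), timeout=60) as resp:
        data = json.load(resp)
    return data["results"]["bindings"]


if __name__ == "__main__":
    req = build_wdqs_request(HUMAN_SETTLEMENTS)
    print(req.full_url[:80])
    # rows = run_query(HUMAN_SETTLEMENTS)  # uncomment to query the live service
```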

The most complete and detailed human settlement objects on Wikidata are Antakya, General Roca and Padre Las Casas.

Almost empty and uninformative human settlement objects were Belomorsk, Segezha and Yanishpole.

Duplicate objects were found and merged: the two Belomorsk objects, the two Segezha objects and the two Yanishpole objects.

List of countries by total population

Let us construct an ordered list of countries by the total number of people living in "human settlements".

# 26.10.2017
SELECT ?countryLabel (SUM(?population) as ?sumPopulation)
WHERE
{
  ?hum wdt:P31 wd:Q486972;     # instances of human settlement
       wdt:P17 ?country;       # country
       wdt:P1082 ?population.  # population
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?country ?countryLabel
ORDER BY DESC(?sumPopulation)

SPARQL-query, 161 results.

The human settlements are grouped by country with the GROUP BY clause, and SUM(?population) then adds up the settlement populations within each group:

GROUP BY ?country ?countryLabel

Figure: bubble diagram of countries by the total number of people living in "human settlements".


The bubble diagram above shows countries by the total number of people living in "human settlements". The diagram and the query show that the largest totals recorded in Wikidata are for Brazil (12 million), Pakistan (10 million), Mexico (8 million), Yemen (8 million), India (7 million) and Bangladesh (7 million). Presumably, these countries have climatic and geographic conditions that make living in human settlements comfortable.
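The combination of GROUP BY ?country and SUM(?population) can be mirrored in a few lines of Python to make the aggregation explicit. The sketch below groups (country, population) rows, sums each group and sorts the totals in descending order, like ORDER BY DESC(?sumPopulation). The rows are made-up toy data, not Wikidata values.

```python
from collections import defaultdict


def total_population_by_country(rows):
    """Mimic GROUP BY ?country with SUM(?population), then ORDER BY DESC."""
    totals = defaultdict(int)
    for country, population in rows:
        totals[country] += population
    # Sort by the summed population, largest first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical (country, settlement population) rows, one per result row.
rows = [
    ("Country A", 1200),
    ("Country B", 5000),
    ("Country A", 800),
    ("Country C", 300),
]

for country, total in total_population_by_country(rows):
    print(country, total)
```

As in SPARQL, every row is counted once, so a settlement that appears in several result rows (for example, with two population values) contributes to the sum several times.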

Checking that the query works correctly

To verify the calculations, let us write a query that lists the human settlements and their populations for the country with the smallest total. The previous query showed that this country is Montenegro, so we list the settlements of Montenegro and their populations.

# 26.10.2017
SELECT ?humLabel ?hum ?population
WHERE
{
  ?hum wdt:P31 wd:Q486972;     # instances of human settlement
       wdt:P17 wd:Q236;        # Montenegro
       wdt:P1082 ?population.  # population
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

SPARQL-query, 1 result.

The test was successful: the query returned the population living in "human settlements" of Montenegro, and this number is the same as Montenegro's total in the previous query, which lists countries by the total number of people living in "human settlements".
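The check performed here (recomputing one country's total from its settlement rows and comparing it with that country's row in the aggregated result) can be written as a small function. This is a sketch with hypothetical inputs: `aggregated` stands for the output of the country-total query, and `settlements` for the per-settlement rows of one country.

```python
def totals_match(aggregated: dict[str, int], country: str,
                 settlements: list[tuple[str, int]]) -> bool:
    """Return True if the settlement populations add up to the aggregated total."""
    recomputed = sum(population for _name, population in settlements)
    return aggregated.get(country) == recomputed


# Hypothetical values: one country with a single settlement row,
# similar in shape to the Montenegro check.
aggregated = {"Country X": 15000, "Country Y": 42000}
settlements_x = [("Settlement 1", 15000)]
print(totals_match(aggregated, "Country X", settlements_x))  # True
```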

Completeness of Wikidata

A human settlement is a general name for a place with permanent residents. According to Wikidata editors, the concept of a human settlement includes cities, villages, hamlets and others; the full list is given in the section «The list of objects that accompany "human settlement" in "instance of"» of this article. No exact figure for the number of human settlements in the world could be found, so instead we check the completeness of the data about the human settlements that are already in Wikidata.

The task in question is to build a list of countries ordered by the total number of people living in "human settlements". For this, let us construct a query that finds human settlements with an empty 'population' property (SPARQL-query). The query showed that there are 372997 such settlements. So, of the 411393 settlements (based on the query in «Instances of the object "human settlement"»), only 38396, or 9.3%, have a 'population' property. Next, let us look at the settlements that have no 'country' property (SPARQL-query): there were 8427 such objects. Therefore, solving this task gives only an incomplete picture of the total population in settlements by country.
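The 9.3% figure follows from simple arithmetic on the two query counts. The sketch below reproduces it using only the numbers reported above.

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100 * part / whole


total_settlements = 411393    # instances of "human settlement"
without_population = 372997   # settlements with an empty 'population' property
with_population = total_settlements - without_population

print(with_population)                                      # 38396
print(f"{share(with_population, total_settlements):.1f}%")  # 9.3%
```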

According to the project "Human settlements of Russia/Statistics", the Russian Wikipedia contains approximately 75000 articles about settlements of Russia, while according to the 2010 census there are 155510 settlements in Russia. Let us check with a SPARQL-query how many objects about Russian settlements Wikidata contains. The query returns 4113 objects, which is 2.6% of the total number of settlements. Thus, Wikidata contains too little information about the settlements of Russia.

So, Wikidata's coverage of human settlements is low. In particular, some cities, towns, villages and other settlements in Wikidata lack an "instance of" property whose value could be "human settlement". In addition, there are almost empty and poorly filled objects. To solve these problems, these properties need to be filled in and Wikidata objects need to be linked to each other.

Filling in Wikidata

For 100 objects about human settlements in Russia whose "instance of" property was empty, the property was set to "human settlement".

As of October 25, 2017, Wikidata contained 4207 objects about human settlements of Russia, which is 2.7% of the total number of settlements according to the 2010 census and 5.6% of the approximately 75000 Russian Wikipedia articles about them. This can be checked with a SPARQL-query.
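The coverage percentages for Russia follow from the two reported counts (4113 from the earlier query and 4207 as of October 25, 2017) and the two reference totals. The sketch below only recomputes these ratios; no other data is assumed.

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100 * part / whole


CENSUS_2010 = 155510     # settlements in Russia, 2010 census
RU_WIKIPEDIA = 75000     # approximate Russian Wikipedia articles about them

earlier_count = 4113     # count from the earlier query
oct_25_2017 = 4207       # count as of October 25, 2017

for label, count in (("earlier", earlier_count), ("2017-10-25", oct_25_2017)):
    print(label,
          f"{share(count, CENSUS_2010):.1f}% of census settlements,",
          f"{share(count, RU_WIKIPEDIA):.1f}% of Wikipedia articles")
```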

The proportion of a country's population living in "human settlements"

Let us construct a list of countries ordered by the percentage of the country's population that lives in "human settlements".

# 23.11.2017
SELECT ?countryLabel (SUM(?population / ?pop) as ?proportionPopulation) (?proportionPopulation * 100 as ?percentPopulation)
WHERE {
  ?hum wdt:P31 wd:Q486972.    # instances of human settlement  
  ?hum wdt:P17 ?country.      # country 
  ?hum wdt:P1082 ?population. # population
  ?country wdt:P1082 ?pop.    # population in the country
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?country ?countryLabel
ORDER BY DESC (?percentPopulation)

SPARQL-query, 158 results.
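The query computes SUM(?population / ?pop) per country. When a country has a single population value ?pop, this equals the summed settlement population divided by the country's population, which is what the sketch below does before scaling to a percentage and sorting. The country names and numbers are hypothetical toy data.

```python
def percent_in_settlements(settlement_pops, country_pop):
    """100 * (sum of settlement populations) / (country population)."""
    return 100 * sum(settlement_pops) / country_pop


# Hypothetical data: settlement populations and total population per country.
countries = {
    "Country A": ([20000, 15000], 100000),
    "Country B": ([1000, 500, 500], 1000000),
}

# Order countries by percentage, largest first (like ORDER BY DESC).
ranking = sorted(
    ((name, percent_in_settlements(pops, total))
     for name, (pops, total) in countries.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, pct in ranking:
    print(f"{name}: {pct:.2f}%")
```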

Figure: share of each country's population living in "human settlements".


For each country, the curve in the figure shows the ratio of the number of people living in "human settlements" to the total population of the country. The highest percentages are in Kiribati (78%), Niue (70%), Greece (53%), Tuvalu (48%), Comoros (43%) and Mauritius (42%). Interestingly, these are mostly small island states. Probably most of the inhabitants of these countries are concentrated in settlements.

Consider the G8 countries: Russia (2.98%), the USA (1.76%), Japan (0.80%), Canada (0.26%), France (0.20%), Germany (0.24%), Great Britain (0.18%) and Italy (0.07%). All of these are industrialized countries.

Let us propose the following hypothesis: a high percentage of a country's population living in "human settlements" indicates a more agrarian country, since such territories offer opportunities for developing agriculture. The graph and the query show that the highest percentages are in island, southern, hot countries, where developing industry is impractical (small territory, small population, remoteness from the continents). The industrialized G8 countries, by contrast, have a very low percentage of their population living in "human settlements". Consequently, the data support the hypothesis, bearing in mind that only 9.3% of human settlements in Wikidata have a 'population' property, so the percentages themselves are incomplete.
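The comparison behind the hypothesis can be made concrete with the percentages listed above. The sketch only restates those reported numbers and compares their extremes and means; it is an illustration, not a statistical test of the hypothesis.

```python
from statistics import mean

# Percentages reported in this section (Wikidata data at the time of the query).
top_six = {"Kiribati": 78, "Niue": 70, "Greece": 53,
           "Tuvalu": 48, "Comoros": 43, "Mauritius": 42}
g8 = {"Russia": 2.98, "USA": 1.76, "Japan": 0.80, "Canada": 0.26,
      "France": 0.20, "Germany": 0.24, "Great Britain": 0.18, "Italy": 0.07}

highest_g8 = max(g8.values())
lowest_top = min(top_six.values())

# Even the highest G8 value is far below the lowest of the top six.
print(f"highest G8: {highest_g8}%, lowest of top six: {lowest_top}%")
print(f"mean G8: {mean(g8.values()):.2f}%, mean top six: {mean(top_six.values()):.2f}%")
```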

The list of objects that accompany "human settlement" in "instance of"

Let us construct the list of objects that appear together with "human settlement" in the "instance of" property of the same settlements.

# 20.10.2017
SELECT ?instLabel (COUNT(?hum) as ?sumHum)
WHERE
{
  ?hum wdt:P31 wd:Q486972;  # instances of human settlement
       wdt:P31 ?inst.       # other objects in instance
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?inst ?instLabel

SPARQL-query, 610 results.

The query above can take too long to run and fail with the error message "Query timeout limit reached". Let us add several constraints to speed it up and to reduce the number of results.

First, let us exclude "human settlement" itself from the values of ?inst, so that settlements whose only "instance of" value is "human settlement" drop out of the result. The result does not deteriorate, since the only row lost is the one for the "human settlement" type itself. To this end, we add a FILTER to the query.

Second, using MINUS, we exclude values of ?inst that have the "country" property. This cuts off hundreds of settlement types specific to individual countries, for example "administrative-territorial unit of Russia".

With these restrictions, the query runs for all countries of the world in an acceptable time (87 ms).

# 15.11.2017
SELECT ?instLabel (COUNT(?hum) as ?sumHum)
WHERE
{
  ?hum wdt:P31 wd:Q486972;      # instances of human settlement
       wdt:P31 ?inst.           # other objects in instance
  MINUS { ?inst wdt:P17 [] }    # skip instances with country
  FILTER(?inst != wd:Q486972)   # without human settlement
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?inst ?instLabel
ORDER BY DESC(?sumHum)

SPARQL-query, 374 results.

The most frequent of these objects, with the number of settlements for each, are:

  1. Village - 2844.
  2. Municipality - 1181.
  3. Hamlet - 662.
  4. Archaeological site - 425.
  5. Locality - 425.
  6. Destroyed city - 423.
  7. City - 322.
  8. Town - 277.
  9. Abandoned village - 254.
  10. Quarter - 207.
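The effect of the two constraints can be illustrated on toy data. The sketch treats the data as (item, class) pairs from "instance of" and applies the same conditions as the final query: the item is a human settlement, the class is not "human settlement" itself (the FILTER), and the class has no "country" statement (the MINUS). Apart from Q486972, the item and class names are hypothetical placeholders, not Wikidata identifiers.

```python
from collections import Counter

HUMAN_SETTLEMENT = "Q486972"

# Hypothetical (item, class) pairs: every "instance of" value of each item.
instance_of = [
    ("item1", HUMAN_SETTLEMENT), ("item1", "village"),
    ("item2", HUMAN_SETTLEMENT), ("item2", "village"),
    ("item3", HUMAN_SETTLEMENT),                          # only "human settlement"
    ("item4", HUMAN_SETTLEMENT), ("item4", "national_type"),
]
# Hypothetical set of classes that carry a "country" (P17) statement.
classes_with_country = {"national_type"}


def accompanying_classes(pairs, country_specific):
    """Count settlements per accompanying class, largest count first."""
    settlements = {item for item, cls in pairs if cls == HUMAN_SETTLEMENT}
    counts = Counter(
        cls
        for item, cls in pairs
        if item in settlements             # ?hum wdt:P31 wd:Q486972
        and cls != HUMAN_SETTLEMENT        # FILTER(?inst != wd:Q486972)
        and cls not in country_specific    # MINUS { ?inst wdt:P17 [] }
    )
    return counts.most_common()


print(accompanying_classes(instance_of, classes_with_country))  # [('village', 2)]
```

item3, whose only class is "human settlement", disappears from the counts entirely, which is why the FILTER loses nothing except the "human settlement" row itself.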

Future work

  • Count and list notable people born in human settlements (by country).
  • Calculate and plot the ratio of the total area of human settlements to the area of the country.
  • Find human settlements founded in the 21st century.
  • Consider only settlements that no longer exist and construct a list of them, ordered by how long each settlement existed.

Exercises

1

Which populated place in Russia has the lowest population density?

Aleisk town in the Altai Krai
Zarechny town in the Sverdlovsk Oblast
Barabinsk town in the Novosibirsk Oblast
Zverevo town in the Rostov Oblast

2

Choose which of the coats of arms shown belong to settlements of the Russian Federation and which do not.

Belong / Not belong
Aznakeevskii rayon gerb.png
Coat of Arms of Asbest (Sverdlovsk oblast).png
Loučovice CoA.jpg
POL Otynia COA.svg
Coat of Arms of Azov.svg

3

In which country is the human settlement shown in this panorama?

Vothonas Banner Santorini.jpg


SPARQL-query with answers:

This article is issued from Wikiversity. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.