Research in programming Wikidata

This research analyzes aircraft data on Wikidata. Using SPARQL queries, we generated a list of aircraft and drew a diagram of aircraft manufacturers grouped by country. We also wrote a computer program (script) that fills in the labels and descriptions of aircraft manufacturers on Wikidata. This script was used to add data about Russian aircraft manufacturers to Wikidata.

List of "Aircraft"

Let's make a list of all aircraft.

#List of `instances of` "aircraft" 
SELECT ?item ?itemLabel
WHERE
{
    ?item wdt:P31 wd:Q11436. # instances of aircraft
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

SPARQL-query, 1564 records (2017), 3324 records (2020).

SPARQL-query, 153 records with labels in Russian (2017), 299 records with labels in Russian (2020).
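Both counts can also be reproduced offline from a saved copy of a query result. The following is a minimal sketch. It assumes the result was downloaded in the standard SPARQL 1.1 JSON results format and that the query asked the label service for Russian (`wikibase:language "ru"`). It also assumes that items without a Russian label come back with their bare Q-id as a literal with no `xml:lang` tag. The function name `count_labels` and the sample data are hypothetical.

```python
import json

def count_labels(result_json, lang="ru"):
    """Return (total records, records whose itemLabel is in `lang`)."""
    bindings = json.loads(result_json)["results"]["bindings"]
    # Assumption: a label in the requested language carries an "xml:lang" tag,
    # while an item without one falls back to its bare Q-id with no tag.
    with_lang = sum(
        1 for b in bindings
        if b.get("itemLabel", {}).get("xml:lang") == lang
    )
    return len(bindings), with_lang

# Hypothetical two-row result (made-up Q-ids): one Russian label, one fallback.
sample = json.dumps({
    "head": {"vars": ["item", "itemLabel"]},
    "results": {"bindings": [
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q9990001"},
         "itemLabel": {"type": "literal", "value": "Самолёт-1", "xml:lang": "ru"}},
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q9990002"},
         "itemLabel": {"type": "literal", "value": "Q9990002"}},
    ]},
})
print(count_labels(sample))  # (2, 1)
```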

Examples of the most complete and well-developed aircraft items on Wikidata in 2017 were Mikoyan-Gurevich MiG-3, Yakovlev Yak-36 and Mitsubishi A5M.

Almost empty, uninformative aircraft items in 2017 were Mikoyan-Gurevich MiG-1, Sukhoi Su-6 and Ilyushin Il-103.

In 2020, the most developed aircraft items on Wikidata were Sopwith Triplane (18 properties), Il-103 (14 properties) and Martin 2-0-2 (14 properties).

In 2020, the least informative aircraft items were Beriev Be-1 (3 properties), Lituanica (4 properties) and Lavochkin La-168 (3 properties).
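Rankings like these can be obtained by counting the distinct properties used on each item. The following is a minimal sketch. It assumes the (item, property) pairs have already been fetched, for example one row per direct `wdt:` statement. The function name `rank_by_properties` and the sample pairs are hypothetical.

```python
from collections import defaultdict

def rank_by_properties(pairs):
    """Return (item, number of distinct properties), most developed first."""
    props = defaultdict(set)
    for item, prop in pairs:
        props[item].add(prop)  # a set ignores repeated values of one property
    # Sort by property count (descending), then by name for a stable order.
    return sorted(((item, len(p)) for item, p in props.items()),
                  key=lambda x: (-x[1], x[0]))

pairs = [  # hypothetical sample data
    ("Plane A", "P31"), ("Plane A", "P176"), ("Plane A", "P1092"),
    ("Plane A", "P176"),            # a second manufacturer: same property
    ("Plane B", "P31"), ("Plane B", "P18"),
    ("Plane C", "P31"),
]
ranking = rank_by_properties(pairs)
print(ranking[0])   # most developed:  ('Plane A', 3)
print(ranking[-1])  # least developed: ('Plane C', 1)
```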

Aircraft manufacturers

Let's make a list of aircraft manufacturers.

# Count aircraft that have a manufacturer (P176), grouped by manufacturer
SELECT ?manufacturerLabel (COUNT(?item) AS ?count)
WHERE {
  ?item wdt:P31 wd:Q11436.       # instance of aircraft
  ?item wdt:P176 ?manufacturer.  # manufacturer of the aircraft
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?manufacturer ?manufacturerLabel

SPARQL-query, 300 records (2017), 597 records (2020).

With this SPARQL query we found 300 manufacturers that make aircraft (2017).

Number of aircraft produced

The aviation industry is one of the largest mechanical engineering industries in the world. It covers both the development and the production of various aerial vehicles. To assess which aircraft models are the most widespread, we build a diagram of the number of aircraft produced for each model.

SELECT ?itemLabel ?count WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  ?item wdt:P31 wd:Q11436. # instance of aircraft
  ?item wdt:P1092 ?count.  # total aircraft manufactured
}

SPARQL-query, 177 records (2020).

Some aircraft models were produced in small numbers, so they can be excluded to make the diagram easier to read. To get the new list, let's add a filter to the query.

SELECT ?itemLabel ?count WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  ?item wdt:P31 wd:Q11436. # instance of aircraft
  ?item wdt:P1092 ?count. # total aircraft manufactured
  
  FILTER (?count > 10)
}

SPARQL-query, 86 records (2020).

The query results show that the most-produced models are PA-32 Cherokee Six (7842), Piper PA-24 Comanche (4857), Junkers W 34 (3000) and Pomilio PE (1616).
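The same filtering and ranking can also be done on the client side once the rows have been downloaded. The following is a minimal sketch. It assumes the rows are (model label, count) pairs. The threshold of 10 mirrors the `FILTER` above. The function name `top_models` and the sample rows are hypothetical.

```python
def top_models(rows, min_count=10, n=3):
    """Keep models with more than `min_count` aircraft and return the n largest."""
    kept = [(label, count) for label, count in rows if count > min_count]
    kept.sort(key=lambda row: row[1], reverse=True)  # most produced first
    return kept[:n]

rows = [("Model A", 120), ("Model B", 7), ("Model C", 4500),
        ("Model D", 11), ("Model E", 10)]  # hypothetical sample data
print(top_models(rows))  # [('Model C', 4500), ('Model A', 120), ('Model D', 11)]
```

Inside the query itself, adding `ORDER BY DESC(?count)` after the `WHERE` block would produce the same ordering.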

Now let's try to answer the question: "Does the Pareto principle hold for the number of aircraft produced per model?"

To build the graph, perform the following steps:

1. Calculate the total number of aircraft over all models. The following query does this:
SELECT (SUM(?count) AS ?sum) WHERE {
  ?item wdt:P31 wd:Q11436;   # instance of aircraft
        wdt:P1092 ?count.    # total aircraft manufactured
}
SPARQL-query, 33 178 aircraft in total (2020).
2. Sort the models by the number of aircraft produced, in descending order. On the X axis, plot the number of models considered (1 means the most-produced model alone, 2 means the first two models together, and so on). On the Y axis, plot the cumulative number of aircraft produced by these models as a percentage of the total number of aircraft ever produced. A second scale on the X axis, from 0 to 100% of all models, makes it easier to read off the Pareto parameters.

The graph shows that 80% of all aircraft produced come from just 16 aircraft models, which is 9.2% of the total number of models. The Pareto principle states that 20% of the effort produces 80% of the result, while the remaining 80% of the effort produces only 20% of the result. Here, less than 10% of the models account for 80% of the aircraft. The distribution of aircraft over models is therefore even more concentrated than the Pareto principle suggests.
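Steps 1 and 2 above amount to a cumulative-share computation: sort the models by the number produced, accumulate the counts, and find how many models are needed to reach 80% of all aircraft. The following is a minimal sketch. It assumes the per-model counts are available as a list of integers. The function names and the toy counts are hypothetical and are not the Wikidata data.

```python
def cumulative_percent(counts):
    """Y values of the graph: cumulative share (%) of all aircraft, largest model first."""
    total = sum(counts)                      # step 1: total over all models
    shares, running = [], 0
    for c in sorted(counts, reverse=True):   # step 2: most-produced model first
        running += c
        shares.append(100 * running / total)
    return shares

def pareto_point(counts, target_percent=80):
    """Return (k, k / n): the fewest models that together reach `target_percent` of all aircraft."""
    total = sum(counts)
    running = 0
    for k, c in enumerate(sorted(counts, reverse=True), start=1):
        running += c
        if 100 * running >= target_percent * total:  # integer test, no rounding error
            return k, k / len(counts)
    raise ValueError("counts must not be empty")

toy_counts = [5, 50, 10, 30, 5]        # hypothetical, not the Wikidata data
print(cumulative_percent(toy_counts))  # [50.0, 80.0, 90.0, 95.0, 100.0]
print(pareto_point(toy_counts))        # (2, 0.4)
```

When applied to the real per-model counts, `k / n` is the fraction of models (on the second X scale) at which the curve crosses 80%.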

Origin countries of aircraft manufacturers

Let's make a list of manufacturers grouped by countries.

# Count manufacturers that have a country (P17), grouped by country
SELECT ?countryLabel (COUNT(?item) AS ?count)
WHERE
{
    ?item wdt:P31 wd:Q936518.   # instance of the manufacturer class Q936518
    ?item wdt:P17 ?country.     # country (P17)
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
GROUP BY ?country ?countryLabel

SPARQL-query, 39 records (2017), 46 records (2020).

Bubble chart: each circle represents a country, and its size corresponds to the number of aircraft manufacturers in that country. The chart makes the differences between countries easy to see.


As the SPARQL query shows, the data on the countries of origin of aircraft manufacturers is less informative than it could be. As of 29 May 2017, most aircraft manufacturers were located in the USA (115), Great Britain (30), Germany (17) and Russia (17).

Comparing the bubble charts for 2017 and 2020, we see that in 2020 the countries with the most aircraft manufacturers are the USA (135), the UK (43), France (29), Germany (26) and Russia (21). The USA is still the leader, but within three years France overtook Germany and took third place. Overall, however, the ratio of aircraft manufacturers between countries remains about the same.
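When drawing such a bubble chart, a common convention is to make each circle's area, rather than its radius, proportional to the count. If the radius were proportional, the area would grow with the square of the count and exaggerate the USA's lead. The following is a minimal sketch. It assumes the counts are given as a dictionary, and uses the 2020 figures quoted above. The function name `bubble_radii` and the `max_radius` value are hypothetical choices.

```python
import math

def bubble_radii(counts, max_radius=50.0):
    """Radius per country, scaled so that circle AREA is proportional to the count."""
    largest = max(counts.values())
    # Area grows with r**2, so r must grow with the square root of the count.
    return {name: max_radius * math.sqrt(c / largest) for name, c in counts.items()}

# 2020 numbers of aircraft manufacturers, as quoted in the text
manufacturers_2020 = {"USA": 135, "UK": 43, "France": 29, "Germany": 26, "Russia": 21}
for country, r in bubble_radii(manufacturers_2020).items():
    print(f"{country}: radius {r:.1f}")
```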

Completeness of Wikidata

According to aviationfanatic.com, there are about 1700 aircraft manufacturers [1] (2017) and 1939 (2020), but the SPARQL query returned only 300 records in 2017 and 595 records in 2020. We conclude that Wikidata is incomplete. Most likely, the missing manufacturers built too few aircraft, or none at all, so there was not enough information about them to include them in Wikidata.

Based on these data, we can estimate when Wikidata will be complete. In three years, the number of known aircraft manufacturers grew by 239, an annual increase of about 80. Over the same period, 295 aircraft manufacturers were added to Wikidata, or about 98 new entries per year. As of 2020, 1344 aircraft manufacturers have no record in Wikidata. Assume that the number of new manufacturers per year and the number of new Wikidata entries per year both stay constant. The gap then shrinks by about 98 - 80 = 18 entries per year, so Wikidata would contain records of all aircraft manufacturers in about 1344 / 18 ≈ 75 years, i.e. around 2095.
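This estimate is simple arithmetic on the quoted figures. The following is a minimal sketch that reproduces it under the same assumption that both yearly rates stay constant. The function name `years_to_complete` is hypothetical. Note that with the unrounded rates (239/3 and 295/3 per year), the same model gives 72 years (around 2092) rather than 75. "About 75 years" should therefore be read as a rough figure.

```python
import math

def years_to_complete(known_then, known_now, entries_then, entries_now, span_years):
    """Years until Wikidata lists every known manufacturer, if both rates stay constant."""
    new_manufacturers_per_year = (known_now - known_then) / span_years
    new_entries_per_year = (entries_now - entries_then) / span_years
    gap = known_now - entries_now                 # manufacturers still missing
    net_catch_up = new_entries_per_year - new_manufacturers_per_year
    if net_catch_up <= 0:
        return math.inf                           # the gap never closes
    return gap / net_catch_up

# Figures from the text: 1700 -> 1939 known, 300 -> 595 on Wikidata, 2017 -> 2020
exact = years_to_complete(1700, 1939, 300, 595, 3)
rounded = 1344 / (98 - 80)                        # the text's rounded yearly rates
print(round(exact, 1), round(rounded, 1))         # 72.0 74.7
```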

The category Aircraft manufacturers of Russia lists 58 Russian aircraft manufacturers, while aviationfanatic.com lists 61 [2], including Irkut Corporation, Russian Aircraft Corporation MiG and Tupolev.

Filling Wikidata

We chose to fill in the label and description fields of items in the category Aircraft manufacturers of Russia. There were too many items with empty fields to edit them by hand, so a special program was written for this. First, a JSON file must be created with the items from this category and empty fields to be filled in:

{
  "121 авиационный ремонтный завод": {
    "description": "",
    "descriptionen": "",
    "nameen": "",
    "qid": "Q4028573"
  },
  ...
}
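Creating this skeleton takes only a few lines. The following is a minimal sketch. It assumes the category members are already known as (Russian label, QID) pairs. The field names are the ones in the JSON above. `make_template` and `empty_fields` are hypothetical helpers, not the original program.

```python
import json

def make_template(items):
    """One entry per Russian label, with empty fields left to be filled."""
    return {
        ru_label: {"description": "", "descriptionen": "", "nameen": "", "qid": qid}
        for ru_label, qid in items
    }

def empty_fields(template):
    """List the (label, field) pairs that still have no value."""
    return [(label, field)
            for label, entry in template.items()
            for field, value in entry.items()
            if value == ""]

template = make_template([("121 авиационный ремонтный завод", "Q4028573")])
# ensure_ascii=False keeps the Cyrillic labels readable in the output
print(json.dumps(template, ensure_ascii=False, indent=2, sort_keys=True))
print(empty_fields(template))
```

When writing the result to a file, opening it with `encoding="utf-8"` keeps the Cyrillic keys intact.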

The first part of the program reads the fields that are already filled in on Wikidata. After that, the remaining empty fields must be filled in. At the end, the JSON file looks like this:

{
  "121 авиационный ремонтный завод": {
    "description": "авиаремонтное предприятие, расположенное посёлке Старый Городок",
    "descriptionen": "aircraft repair facility, located in the village Stary Gorodok",
    "nameen": "121 aircraft repair plant",
    "qid": "Q4028573"
  },
  ...
}

The second part of the program writes the data from the JSON file to Wikidata.
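The source does not describe how the original program writes the data to Wikidata. The following is a minimal sketch of one possible approach, assuming the MediaWiki API module `wbeditentity` is used. Only the request parameters are built here; nothing is sent. `to_wbeditentity_data` is a hypothetical helper. Empty fields are skipped, because sending an empty value may remove an existing label or description.

```python
import json

def to_wbeditentity_data(entry):
    """Build the `data` object for wbeditentity from one filled JSON entry."""
    labels, descriptions = {}, {}
    # Skip empty fields so existing values on Wikidata are not cleared.
    if entry.get("nameen"):
        labels["en"] = {"language": "en", "value": entry["nameen"]}
    if entry.get("description"):
        descriptions["ru"] = {"language": "ru", "value": entry["description"]}
    if entry.get("descriptionen"):
        descriptions["en"] = {"language": "en", "value": entry["descriptionen"]}
    data = {}
    if labels:
        data["labels"] = labels
    if descriptions:
        data["descriptions"] = descriptions
    return data

entry = {
    "description": "",  # left empty here to show that it is skipped
    "descriptionen": "aircraft repair facility, located in the village Stary Gorodok",
    "nameen": "121 aircraft repair plant",
    "qid": "Q4028573",
}
params = {
    "action": "wbeditentity",
    "id": entry["qid"],
    "data": json.dumps(to_wbeditentity_data(entry), ensure_ascii=False),
    "format": "json",
    # A real edit also needs a logged-in session and a CSRF "token" parameter.
}
print(params["data"])
```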

This program simplifies work with Wikidata: there is no longer any need to open each page and edit it by hand when fields in Wikidata are empty or contain incorrect information.

Future work

  • Find the plane with the maximum flight radius.
  • Mark the locations of aircraft manufacturers' headquarters on a political map of the world.
  • Using the manufacturer property (P176), find the manufacturer that produced the most aircraft.
  • When was the first aircraft built?
  • Which firms were the first to produce 10, 100 and 1000 aircraft?
  • Draw a chart of the number of aircraft produced by year.

Tests

1 Which Russian manufacturers have a website?

MiG
Saratov Aviation Plant
Tupolev
Sukhoi

2 Match each manufacturer with the date it was founded.

1.12.1939, 18.11.1949, 1.01.1922, 1.01.1939
MiG
Vympel
Tupolev
Sukhoi

3 Match each manufacturer with the city of its headquarters.

Kazan, Saratov, Ulan-Ude, Moscow
Kazan Helicopters Plant
Saratov Aviation Plant
Ulan-Ude Aviation Plant
Sukhoi

4 What is the name of a lighter-than-air aircraft that flies by means of a big balloon filled with deadly dangerous gas right above the passengers' heads?

5 What does an airship look like?

LAPD Bell 206 Jetranger.jpg
W-6 Ossoaviachim wiki.jpg
Airliners 28.07.2009 10-01-28.JPG

SPARQL-query: manufacturers with websites

SPARQL-query: founding dates of manufacturers

SPARQL-query: headquarters of manufacturers

SPARQL-query: airships

SPARQL-query: airships with pictures

References

  1. List of all Manufacturers.
  2. List of Russian Manufacturers.
  • Artem.Potes. "Aircraft". ProWD. Retrieved 2020-10-04.
This article is issued from Wikiversity. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.