This article studies the Wikidata item "anime". Using SPARQL queries over the instances of "anime" in Wikidata, we solve the following tasks: build a list of voice actors (seiyu) ordered by the number of anime they voiced, plot a histogram of the number of seiyu who voiced one or more anime, and draw a graph connecting seiyu and anime.

Instances of the object "Anime"

Anime is Japanese animation. Each anime has voice actors. From here on we use the word "seiyu" (i.e., Japanese voice actors); "voice actor" and "seiyu" are synonymous when we speak about Japan and Japanese animation. The word "title" is used when we refer specifically to an individual anime (or to Japanese animation in general).

Let's build a list of all anime.

#added 2017-06
#List of `instances of` "anime"
SELECT ?anime ?animeLabel
WHERE
{
    ?anime wdt:P31 wd:Q1107.
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

SPARQL-query, 683 Records.

👍 The most complete and detailed anime items on Wikidata are: Gurren Lagann, Space Battleship Yamato, Project A-Ko

👎 Poorly described anime: Charlotte, Dagashi Kashi, KonoSuba

Ordered list of seiyu according to the quantity of their anime

Almost every anime has several voice actors (seiyu). Most seiyu have voiced several anime titles in their careers, and many have voiced a few dozen. A talented seiyu may be invited to voice several characters in one anime.

Let's construct a list of seiyu ordered by the number of anime they voiced.

#Ordered list of actors (seiyu) according to the quantity of their anime
SELECT ?seiyu (SAMPLE(?label) AS ?seiyuLabel) (COUNT(?anime) AS ?count)
WHERE
{
  ?anime wdt:P31 wd:Q1107;      # ?anime is an instance of (P31) anime (Q1107)
         wdt:P725 ?seiyu.       # ?seiyu is a voice actor (P725) of that anime
  ?seiyu rdfs:label ?label.     # Label of the seiyu, in any language
  FILTER(LANG(?label) = "en").  # Keep only the English label
}
GROUP BY ?seiyu          # One result row per seiyu
ORDER BY DESC(?count)    # Seiyu with the most voiced anime first

SPARQL-query, 148 Results.

The result lists 148 seiyu. The list is short, but the number of anime on Wikidata is small too: only 683 titles (completeness is discussed later). The list includes seiyu who voiced a large number of anime (for example, Aki Toyosaki with 26 anime).
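The aggregation that the query performs (group the rows by seiyu, count them, sort in descending order) can be sketched in Python on a few made-up pairs. The names in `pairs` and the helper `rank_seiyu` are hypothetical illustrations, not data from Wikidata.

```python
from collections import Counter

# Made-up (anime, seiyu) pairs standing in for the rows matched by the query.
pairs = [
    ("Anime A", "Seiyu X"),
    ("Anime A", "Seiyu Y"),
    ("Anime B", "Seiyu X"),
    ("Anime C", "Seiyu X"),
    ("Anime C", "Seiyu Z"),
    ("Anime D", "Seiyu Y"),
]


def rank_seiyu(rows):
    """Count rows per seiyu (GROUP BY + COUNT) and sort by count, largest first."""
    counts = Counter(seiyu for _anime, seiyu in rows)
    return counts.most_common()


ranking = rank_seiyu(pairs)
for seiyu, count in ranking:
    print(seiyu, count)
```

Each tuple in `ranking` corresponds to one result row of the SPARQL query (seiyu and count), without the label lookup.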

Histogram of the number of seiyu who voiced one or more anime

It would be interesting to build a histogram (drawn as a line chart) of seiyu by the number of anime they voiced: the more anime a seiyu voiced, the farther to the right that seiyu falls on the chart.

#added 2017-06
#Histogram of the number of seiyu who voiced one or more anime
#defaultView:LineChart         # Show the result as a line chart
SELECT ?haveseiyu (COUNT(?haveseiyu) AS ?quantity) WHERE {  # For each count value, how many groups have it
  {
     SELECT (COUNT(?seiyu) AS ?haveseiyu) WHERE {      # Count the voice actors (P725) within each group
       ?anime wdt:P31 wd:Q1107;
              wdt:P725 ?seiyu.
       SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
     }
     GROUP BY ?anime         # One group per anime
     ORDER BY DESC(?haveseiyu)      # Order by count (descending)
  }
}
GROUP BY ?haveseiyu       # One result row per distinct count value
ORDER BY DESC(?haveseiyu)      # Order by count (descending)

SPARQL-query, 13 Records.

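As expected, the larger the count, the fewer entries reach it (Fig. 1), and the peak of the chart is at 1. Note that the inner query groups by ?anime, so, as written, the chart counts anime by their number of credited seiyu, and the peak means that most anime list only 1 seiyu; grouping by ?seiyu instead would give the number of seiyu for each number of voiced anime. Either way, the concentration at 1 may be due to the incompleteness of Wikidata.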

Fig. 1: Histogram of the number of seiyu who voiced one or more anime
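The count-of-counts idea behind such a histogram can be sketched in Python. This is a minimal illustration on made-up pairs, not the query's actual result; the helper `histogram_of_counts` and all names in `pairs` are hypothetical. The sketch shows both possible groupings, since the query above groups by ?anime while the heading speaks of anime per seiyu.

```python
from collections import Counter

# Made-up (anime, seiyu) pairs standing in for the rows matched by the query.
pairs = [
    ("Anime A", "Seiyu X"),
    ("Anime A", "Seiyu Y"),
    ("Anime B", "Seiyu X"),
    ("Anime C", "Seiyu X"),
    ("Anime C", "Seiyu Z"),
    ("Anime D", "Seiyu Y"),
]


def histogram_of_counts(rows, key_index):
    """Two-level aggregation: count rows per key, then count keys per count value."""
    per_key = Counter(row[key_index] for row in rows)   # inner GROUP BY
    per_count = Counter(per_key.values())               # outer GROUP BY
    return dict(sorted(per_count.items()))


# Group by seiyu: how many seiyu voiced 1, 2, 3, ... anime.
seiyu_hist = histogram_of_counts(pairs, key_index=1)
# Group by anime: how many anime have 1, 2, ... credited seiyu.
anime_hist = histogram_of_counts(pairs, key_index=0)

print("anime per seiyu:", seiyu_hist)
print("seiyu per anime:", anime_hist)
```

The two histograms generally differ, so the choice of the inner grouping key determines what the chart actually shows.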


Graph connecting seiyu and anime

As mentioned earlier, a seiyu can voice several characters in one anime (rarely) and can also voice several anime. To show this relationship more clearly, we construct a graph that connects seiyu with the anime they voiced.

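#added 2017-06
#Graph connecting seiyu and anime
#defaultView:Graph
SELECT ?anime ?animeLabel ?seiyu ?seiyuLabel
WHERE
{
    ?anime wdt:P31 wd:Q1107;       # ?anime is an instance of (P31) anime (Q1107)
           rdfs:label ?label.      # Label used only by the FILTER below

    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    OPTIONAL { ?anime wdt:P725 ?seiyu. }   # Voice actor (P725), if any is recorded
    FILTER(LANG(?label) = "en")    # Keep only anime that have an English label
}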

SPARQL-query, 826 Records.

This graph (Fig. 2) shows all anime found on Wikidata and the seiyu who voiced them. Its drawback is that a large number of anime remain "without voice acting", i.e., with no seiyu recorded.

Fig. 2: Graph connecting seiyu and anime


Completeness of Wikidata

Russian Wikipedia lists 2089 anime. Anime TV shows in Russia can also be browsed there by year.

English Wikipedia gives almost the same result. All anime can be browsed there through the anime categories, of which there are 11.

A list of anime was found on the site [1]. It contains 11173 anime, while only 683 items were found on Wikidata. Furthermore, new anime are released at a high rate: in the winter to summer seasons of 2017 alone, more than 600 anime were released. When we ran some of the queries above, some anime were not found (Useless Animals, Ryuu no Haisha, Frame Arms Girl). We can conclude that Wikidata's data on anime is extremely incomplete.

The same site has an "Anime Industry" section [2]; according to its statistics, there are 16 sections of Japanese animation that include 10497 titles. The following web sites may not be authoritative sources, but they help us collect information about the available anime and describe the incompleteness.

There are 7811 anime on [3].

There are 4905 anime on [4].

There are 4751 anime on [5].

There are 2971 anime on [6].

There are 1881 anime on [7].

There are 763 anime on [8].

It can be concluded that different web sites have different information about the available anime. Some sites appeared earlier and others later, so the number of anime varies widely. The number of visitors also affects which anime appear on a site. If we order all the web sites, Russian Wikipedia, and English Wikipedia by the number of anime, Wikidata comes last. Not all popular and well-known Japanese animations can be found on Wikidata, which indicates its incompleteness.
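The size of the gap can be made concrete by dividing the Wikidata count by each count quoted above. The following Python sketch is only a rough estimate: it assumes the sources count titles in a comparable way, which the text does not establish, and the helper `coverage` and the dictionary labels are hypothetical.

```python
# Title counts quoted in this section; the dictionary keys are just labels.
WIKIDATA_ANIME = 683
reference_counts = {
    "site [1]": 11173,
    "site [3]": 7811,
    "site [4]": 4905,
    "site [5]": 4751,
    "site [6]": 2971,
    "Russian Wikipedia": 2089,
    "site [7]": 1881,
    "site [8]": 763,
}


def coverage(found, reference):
    """Ratio of the Wikidata count to a reference count (sources assumed comparable)."""
    return found / reference


# Sort by ratio, lowest first, so the largest reference list comes first.
report = sorted(
    ((name, coverage(WIKIDATA_ANIME, count)) for name, count in reference_counts.items()),
    key=lambda item: item[1],
)
for name, share in report:
    print(f"{name}: {share:.1%}")
```

Every ratio is below 1, which matches the observation that Wikidata comes last in the ordering.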

One of the earlier queries found 148 seiyu on Wikidata. The result was so small because the seiyu were searched only among instances of anime. If we change the query to list all voice actors (removing the restriction to anime), the result may change.

#added 2017-06
#Ordered list of actors according to the quantity of their voiced projects
SELECT ?actor (SAMPLE(?label) AS ?actorLabel) (COUNT(?anime) AS ?count)
WHERE
{
  ?anime wdt:P725 ?actor.       # ?actor is a voice actor (P725) of any work, not only of anime
  ?actor rdfs:label ?label.     # Label of the actor, in any language
  FILTER(LANG(?label) = "en").  # Keep only the English label
}
GROUP BY ?actor          # One result row per actor
ORDER BY DESC(?count)    # Actors with the most voiced works first

SPARQL-query, 3965 Records.

There are 3965 voice actors. Let's compare this result with the seiyu list. The earlier query credited Aki Toyosaki with 26 anime, while this query credits her with 62 works. This suggests that the data was not filled in correctly: many of the works she voiced are apparently not marked as instances of anime, so the restriction to anime misses them.

Filling 100 objects

A fan of Japanese animation may want to know the year in which a favorite anime came out. Wikidata does not have this information for every anime. We write a query that lists the anime whose "publication date" property is empty.

#added 2017-06
#List of anime with unfilled publication date
SELECT ?anime ?animeLabel
WHERE
{
    ?anime wdt:P31 wd:Q1107.
    FILTER NOT EXISTS { ?anime wdt:P577 [] } #if property publication date is unfilled
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

SPARQL-query, 237 Records.

The result has 237 records, which means that about 1/3 of the anime have no publication date.

After filling in the data and running the query again, we get 134 records.
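The shares behind these record counts can be checked with a few lines of Python. This is a sketch that assumes the total number of anime stayed at 683 between the two runs, which the text does not state; the constant and function names are hypothetical.

```python
TOTAL_ANIME = 683      # instances of anime found by the first query
MISSING_BEFORE = 237   # anime without a publication date before filling
MISSING_AFTER = 134    # anime without a publication date after filling


def share(part, whole):
    """Fraction of `whole` represented by `part`."""
    return part / whole


# Items that gained a publication date between the two runs.
filled = MISSING_BEFORE - MISSING_AFTER
share_before = share(MISSING_BEFORE, TOTAL_ANIME)
share_after = share(MISSING_AFTER, TOTAL_ANIME)

print(f"filled items: {filled}")
print(f"missing before: {share_before:.1%}")
print(f"missing after: {share_after:.1%}")
```

Under that assumption, the share of anime without a publication date drops from roughly a third to roughly a fifth.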

Future work

  1. Output 10 anime that were released in 2017.
  2. Output 5 anime in which the quantity of seiyu is the largest.
  3. Build a bubble diagram (BubbleChart) of distribution of anime by genre (how many anime in each genre).

Test

1 There are some anime:
■ Rave Master (Shan T Lao Fu Zi)
■ Tetsujin 28-go (Tetsujin 28-gou)
■ Grenadier (Grenadier)
■ Attack on Titan (Shingeki no Kyojin)
Match each anime with one of the images below.

1 (Rave Master), 2 (Tetsujin 28-go), 3 (Grenadier), 4 (Attack on Titan)
Rave.jpg
Attack on Titan logo.png
???
Fruit de grenadier.jpg

2 There are some anime:
■ Gurren Lagann (Tengen Toppa Gurren Lagann)
■ Steins;Gate (Steins;Gate)
■ Hellsing (Hellsing)
■ Elfen Lied (Elfen Lied)
The years in which these anime were created are known: 2011, 2007, 2004, 2001.
Arrange the anime from newest to oldest (1st place is the newest anime, 4th place is the oldest one).

1st place (2011), 2nd place (2007), 3rd place (2004), 4th place (2001)
Gurren Lagann
Steins;Gate
Hellsing
Elfen Lied

3 Which anime does this description refer to?
Brief description: "And what will happen after death? Countless generations of people have asked this question ..."
Genres: Drama, Action, Comedy, School
Seiyu (fem.): Kana Hanazawa
Publication date: 2005
Note: Punctuation marks and spaces matter, if there are any.

References

  1. shikimori 2017.
  2. shikimoriind 2017.
  3. anidub 2017.
  4. animespirit 2017.
  5. animeland 2017.
  6. anistar 2017.
  7. animevost 2017.
  8. anidesu 2017.
  • Andrew Krizhanovsky; Andrew Krizhanovsky; Daria Boollieva (2017). "Аниме" [Anime]. Authorea.
This article is issued from Wikiversity. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.