- Necessary Python requirements: `sudo pip3 install -r requirements.txt`
- The graph-tool Python package cannot be installed through pip; follow the graph-tool installation instructions instead.
For fast access to the Wikidata graph, we create a binary representation that is loaded into memory during the observation-extraction phase. Creating and using this graph can easily require up to 200 GB of memory.
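As an orientation, here is a minimal sketch of how such a binary graph can be opened with graph-tool. The file name `wikidata.gt` is an assumption; use the file produced by the graph-creation step below.

```python
# Minimal sketch: open the binary graph for read-only, in-memory access.
# The file name "wikidata.gt" is an assumption; use whatever file the
# graph-creation step produces. Loading the full Wikidata graph can
# require a very large amount of RAM (see above).
from graph_tool.all import load_graph

g = load_graph("wikidata.gt")  # graph-tool's native binary format
print(g.num_vertices(), "vertices,", g.num_edges(), "edges")
print(list(g.vertex_properties.keys()))  # inspect which property maps are stored
```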
The following tasks depend on each other and can be run without any parameters; the default parameters expect the datasets to be present in the subfolder `/data`. The original source datasets are no longer available at their original locations:

- Edit History: any recent version of `*-pages-meta-history1.xml` from https://dumps.wikimedia.org/wikidatawiki can be used (see below).
- JSON Dump: https://zenodo.org/record/3268725 (originally accessed at https://dumps.wikimedia.org/wikidatawiki/entities/20180813)

Additionally, we provide the data for every intermediate step as a download at https://zenodo.org/record/3268818.
- Load the XML dump of Wikidata into a SQL database (e.g. with MWDumper).
- The provided query exports all edits. (The query can be restricted to edits before the timestamp "2018-10-01" to recreate the output presented in the paper; see the sketch after this list.)
- `1_create_inmemory_graph.py`: Extract an in-memory representation of Wikidata. This is a subset of our wd-graph project; the output of the wd-graph `create.py` script can also be used.
- `2_extract_observations.py`: Extract the observations from the edits with the help of the in-memory graph.
- `3_calculate_estimates.py`: Calculate the estimates for all classes.
- `4_draw_graphs.py`: Draw the graphs and calculate the convergence for all classes. With `-g ""`, no graph is loaded, which uses much less memory.
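As referenced in the export step above, the following is a hedged sketch of how the edit export can be restricted by timestamp once the XML dump has been loaded into a MediaWiki-style MySQL schema (as created e.g. by MWDumper). This is not the provided query; the connection parameters are placeholders, and MediaWiki stores revision timestamps in `YYYYMMDDHHMMSS` format.

```python
# Illustrative sketch only, not the provided query: export edits made before
# 2018-10-01 from a MediaWiki-style schema (as created e.g. by MWDumper).
# Host, user, password and database name are placeholders.
import csv
import pymysql

conn = pymysql.connect(host="localhost", user="wiki", password="secret",
                       database="wikidatawiki", charset="utf8mb4")
with conn.cursor() as cur, open("edits.tsv", "w", encoding="utf-8", newline="") as out:
    writer = csv.writer(out, delimiter="\t")
    cur.execute(
        """
        SELECT p.page_title, r.rev_id, r.rev_timestamp
        FROM revision r
        JOIN page p ON p.page_id = r.rev_page
        WHERE r.rev_timestamp < '20181001000000'
        ORDER BY r.rev_timestamp
        """
    )
    for row in cur:
        # MediaWiki stores titles and timestamps as binary; decode to text for the TSV.
        writer.writerow([c.decode("utf-8") if isinstance(c, (bytes, bytearray)) else c
                         for c in row])
conn.close()
```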
The estimators and metrics are implemented in `estimators.py` and `metrics.py`, respectively.
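To illustrate the family of non-parametric estimators this project works with, here is a sketch of the classic Chao1 species-richness estimator; the actual function names and signatures in `estimators.py` may differ.

```python
# Illustration of the estimator family used in this project (not the actual
# interface of estimators.py): the bias-corrected Chao1 estimator predicts the
# total number of distinct entities in a class from repeated observations.
from collections import Counter

def chao1(observations):
    counts = Counter(observations)                   # observation frequency per entity
    s_obs = len(counts)                              # distinct entities seen so far
    f1 = sum(1 for c in counts.values() if c == 1)   # entities seen exactly once
    f2 = sum(1 for c in counts.values() if c == 2)   # entities seen exactly twice
    # Bias-corrected form, well defined even when no doubletons were observed.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Example: five distinct items observed, two of them only once,
# so the estimated class size lies slightly above five.
print(chao1(["Q1", "Q2", "Q2", "Q3", "Q3", "Q4", "Q5", "Q5"]))
```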
For all classes with at least 5,000 observations, we calculated the convergence metric and drew the graph. All classes are listed at cardinal.exascale.info.

Additionally, we provide the results as a tab-separated, UTF-8 encoded CSV file, `result.csv`.
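A minimal example of loading the results file; only the separator and encoding stated above are assumed, not the column names.

```python
# Load the published results; result.csv is tab-separated and UTF-8 encoded
# as stated above. No assumptions are made about the column names here.
import pandas as pd

results = pd.read_csv("result.csv", sep="\t", encoding="utf-8")
print(results.columns.tolist())  # inspect the available columns
print(results.head())
```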