- Necessary Python requirements: `sudo pip3 install -r requirements.txt`
- The graph-tool Python package cannot be installed through pip; follow the graph-tool installation instructions instead.
For fast access to the Wikidata graph, we create a binary representation that is loaded into memory during the observation-extraction phase. Creating and using this graph can easily require up to 200 GB of memory.
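As an orientation, here is a minimal sketch of how such a binary graph can be opened with graph-tool. The file name `wikidata.gt` is an assumption; use the file produced by the graph-creation step below.

```python
# Minimal sketch: open the binary graph for read-only, in-memory access.
# The file name "wikidata.gt" is an assumption; use whatever file the
# graph-creation step produces. Loading the full Wikidata graph can
# require a very large amount of RAM (see above).
from graph_tool.all import load_graph

g = load_graph("wikidata.gt")  # graph-tool's native binary format
print(g.num_vertices(), "vertices,", g.num_edges(), "edges")
print(list(g.vertex_properties.keys()))  # inspect which property maps are stored
```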
The following tasks depend on each other and can be run without any parameters; the default parameters expect the datasets to be present in the subfolder `/data`. The original source datasets are no longer available at their original locations:

- Edit History: any recent version of `*-pages-meta-history1.xml` from https://dumps.wikimedia.org/wikidatawiki can be used (see below).
- JSON Dump: https://zenodo.org/record/3268725 (originally accessed at https://dumps.wikimedia.org/wikidatawiki/entities/20180813)

Additionally, we provide the data for every intermediate step as a download at https://zenodo.org/record/3268818.
- Load the XML dump of Wikidata into a SQL database (e.g. with MWDumper).
- The provided query exports all edits. (The query can be restricted to edits before the timestamp "2018-10-01" to recreate the output presented in the paper; see the sketch after this list.)
- `1_create_inmemory_graph.py`: Extract an in-memory representation of Wikidata. This is a subset of our wd-graph project; the output of the wd-graph `create.py` script can also be used.
- `2_extract_observations.py`: Extract the observations from the edits with the help of the in-memory graph.
- `3_calculate_estimates.py`: Calculate the estimates for all classes.
- `4_draw_graphs.py`: Draw the graphs and calculate the convergence for all classes. With `-g ""`, no graph is loaded, which uses much less memory.
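As referenced in the export step above, the following is a hedged sketch of how the edit export can be restricted by timestamp once the XML dump has been loaded into a MediaWiki-style MySQL schema (as created e.g. by MWDumper). This is not the provided query; the connection parameters are placeholders, and MediaWiki stores revision timestamps in `YYYYMMDDHHMMSS` format.

```python
# Illustrative sketch only, not the provided query: export edits made before
# 2018-10-01 from a MediaWiki-style schema (as created e.g. by MWDumper).
# Host, user, password and database name are placeholders.
import csv
import pymysql

conn = pymysql.connect(host="localhost", user="wiki", password="secret",
                       database="wikidatawiki", charset="utf8mb4")
with conn.cursor() as cur, open("edits.tsv", "w", encoding="utf-8", newline="") as out:
    writer = csv.writer(out, delimiter="\t")
    cur.execute(
        """
        SELECT p.page_title, r.rev_id, r.rev_timestamp
        FROM revision r
        JOIN page p ON p.page_id = r.rev_page
        WHERE r.rev_timestamp < '20181001000000'
        ORDER BY r.rev_timestamp
        """
    )
    for row in cur:
        # MediaWiki stores titles and timestamps as binary; decode to text for the TSV.
        writer.writerow([c.decode("utf-8") if isinstance(c, (bytes, bytearray)) else c
                         for c in row])
conn.close()
```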
The estimators and metrics are implemented in `estimators.py` and `metrics.py`, respectively.
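To illustrate the family of non-parametric estimators this project works with, here is a sketch of the classic Chao1 species-richness estimator; the actual function names and signatures in `estimators.py` may differ.

```python
# Illustration of the estimator family used in this project (not the actual
# interface of estimators.py): the bias-corrected Chao1 estimator predicts the
# total number of distinct entities in a class from repeated observations.
from collections import Counter

def chao1(observations):
    counts = Counter(observations)                   # observation frequency per entity
    s_obs = len(counts)                              # distinct entities seen so far
    f1 = sum(1 for c in counts.values() if c == 1)   # entities seen exactly once
    f2 = sum(1 for c in counts.values() if c == 2)   # entities seen exactly twice
    # Bias-corrected form, well defined even when no doubletons were observed.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Example: five distinct items observed, two of them only once,
# so the estimated class size lies slightly above five.
print(chao1(["Q1", "Q2", "Q2", "Q3", "Q3", "Q4", "Q5", "Q5"]))
```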
For all classes with at least 5,000 observations, we calculated the convergence metric and drew the graph. All classes are listed at cardinal.exascale.info.

Additionally, we provide the results as a tab-separated, UTF-8 encoded CSV file, `result.csv`.
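A minimal example of loading the results file; only the separator and encoding stated above are assumed, not the column names.

```python
# Load the published results; result.csv is tab-separated and UTF-8 encoded
# as stated above. No assumptions are made about the column names here.
import pandas as pd

results = pd.read_csv("result.csv", sep="\t", encoding="utf-8")
print(results.columns.tolist())  # inspect the available columns
print(results.head())
```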