
DBNQA Dataset correction

S R Tarun edited this page Jul 10, 2018 · 3 revisions

To check the accuracy of the DBNQA dataset and correct any errors, we first decoded all the templates in the dataset and ran the resulting queries against the DBpedia endpoint.

Results:

Out of 899 thousand queries, 14,697 returned an error message and 8,395 returned an empty result set.
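The three outcomes counted above (error, empty result set, non-empty result) could be distinguished with a check along the following lines. This is only a minimal sketch, not the script that was actually used: the endpoint URL, the `format` parameter, and the function names `classify` and `run_query` are all assumptions introduced for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public DBpedia SPARQL endpoint; this page does not state the URL used.
ENDPOINT = "https://dbpedia.org/sparql"


def classify(status, body):
    """Return 'error', 'empty' or 'ok' for one endpoint response."""
    if status != 200:
        return "error"
    try:
        data = json.loads(body)
    except ValueError:
        # The endpoint answered with something that is not JSON (e.g. an error text).
        return "error"
    if not isinstance(data, dict):
        return "error"
    if "boolean" in data:
        # ASK queries return a boolean instead of bindings.
        return "ok"
    bindings = data.get("results", {}).get("bindings", [])
    return "ok" if bindings else "empty"


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send one query to the endpoint and classify the outcome."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    try:
        with urllib.request.urlopen(f"{endpoint}?{params}", timeout=timeout) as resp:
            return classify(resp.status, resp.read().decode("utf-8"))
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return "error"
```

Keeping `classify` separate from the network call means the counting logic can be checked on saved responses without contacting the endpoint.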

Of the roughly 15k queries that returned an error, most failed because of the short URI prefix "dbr:". To correct these queries, we used regular expressions to remove double spaces and to replace the short URI with the full URI.

re.sub(r"dbr:([^\s]+)", r"<http://dbpedia.org/resource/\1>", q)

After these changes, the number of queries returning an error dropped from about 15k to ~300.
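Put together, the two clean-up steps described above might look like the sketch below. The `dbr:` pattern is the one shown on this page; the double-space pattern and the function name `normalize` are assumptions, since the page does not show that part.

```python
import re

# Replacement template: wrap the captured resource name in the full DBpedia URI.
DBR_FULL = r"<http://dbpedia.org/resource/\1>"


def normalize(q):
    # Assumed pattern: collapse any run of two or more spaces into one.
    q = re.sub(r" {2,}", " ", q)
    # Expand the dbr: prefix with the pattern shown on this page.
    return re.sub(r"dbr:([^\s]+)", DBR_FULL, q)


query = "select ?uri  where { ?uri dbo:publisher dbr:Nintendo }"
print(normalize(query))
```

The example query is made up for illustration; it is not taken from the dataset.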

Most queries worked after these changes, but a few still failed because the closing brace ended up inside the URI, a result of inconsistencies in the templates.

Example

Incorrect formation: select distinct ?uri where{?uri rdf:type dbo:VideoGame . ?uri dbo:publisher <http://dbpedia.org/resource/C&E}>

Correct formation: select distinct ?uri where{?uri rdf:type dbo:VideoGame . ?uri dbo:publisher <http://dbpedia.org/resource/C&E> }
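One plausible way the incorrect formation arises is that the template writes the closing brace directly after the resource name, so `[^\s]+` captures it as part of the URI. The sketch below reproduces that and then moves the brace back out. The input template and the `repair_closing_brace` helper are assumptions for illustration; the page does not say how the remaining queries were actually fixed.

```python
import re


def expand_dbr(q):
    # The pattern from this page: [^\s]+ also consumes a '}' that touches the name.
    return re.sub(r"dbr:([^\s]+)", r"<http://dbpedia.org/resource/\1>", q)


def repair_closing_brace(q):
    # Hypothetical repair: move a '}' swallowed into the final URI back outside it.
    return re.sub(r"\}>\s*$", "> }", q)


# Assumed shape of an inconsistent template: no space before the closing brace.
template = "select distinct ?uri where{?uri rdf:type dbo:VideoGame . ?uri dbo:publisher dbr:C&E}"

broken = expand_dbr(template)        # ends with "...resource/C&E}>"
fixed = repair_closing_brace(broken)  # ends with "...resource/C&E> }"
print(broken)
print(fixed)
```

An alternative would be to exclude `}` from the capture group in the first place, at the cost of mishandling any resource name that legitimately contains a brace.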

BLEU Accuracy

| Data | 12k epochs | 36k epochs | 120k epochs |
| --- | --- | --- | --- |
| DBNQA | 72.4 | 75.2 | 75.7 |
| Monument_300 | 76.7 | 76.8 | 76.8 |