Train-test split of the TREC dataset #1

Victor0118 · 2019-02-01T03:04:21Z

Several open-domain QA papers use the CuratedTREC dataset and link to this repository, but I cannot find the train/test split here. More surprisingly, two papers report different train/test statistics:

https://arxiv.org/pdf/1709.00023.pdf: 1204/694
https://arxiv.org/pdf/1704.00051.pdf: 1486/694

Does anyone know how to solve this problem?

jhyuklee · 2019-10-24T06:02:24Z

I guess the split is based on these two files: large2470-test.tsv and large2470-train.tsv (Large Variant of the Dataset), excluding QA pairs with 'lfb' ids (QA pairs from live.ailao.eu, I guess; see d81aca5).

The numbers from the DrQA paper are correct in this case, but I'm not sure where the number 1204 comes from in the R^3 paper.
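To check this guess, one could count the rows left in each file after dropping the 'lfb' ids. The following is only a rough sketch: it assumes each row is tab-separated with the id in the first column, that there is no header row, and that 'lfb' ids can be recognized by the substring "lfb". None of this is confirmed in the thread, and `count_non_lfb` is a hypothetical helper name. If the guess holds, the counts should come out at the 1486/694 reported in the DrQA paper.

```python
import csv
from pathlib import Path


def count_non_lfb(lines, id_column=0, marker="lfb"):
    """Count TSV rows whose id field does not contain `marker`.

    `lines` is any iterable of strings, for example an open file.
    Assumption: the id sits in column `id_column` and 'lfb' ids
    contain the substring "lfb"; adjust both if the files differ.
    """
    kept = 0
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        if marker not in row[id_column]:
            kept += 1
    return kept


if __name__ == "__main__":
    # File names are taken from the comment above; paths are relative
    # to wherever the repository was checked out.
    for name in ("large2470-train.tsv", "large2470-test.tsv"):
        path = Path(name)
        if not path.exists():
            print(f"{name}: not found")
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            print(f"{name}: {count_non_lfb(handle)} QA pairs without 'lfb' ids")
```

If the ids turn out to live in a different column, passing `id_column` is enough to adapt the count.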
