Train-test split of the TREC dataset #1

Open
Victor0118 opened this issue Feb 1, 2019 · 1 comment

Comments

@Victor0118

Victor0118 commented Feb 1, 2019

Several open-domain QA papers use the CuratedTREC dataset and link to this repository, but I cannot find the train/test split here. Even more surprisingly, the train/test split statistics reported in two papers are different.

Does anyone know how to solve this problem?

@jhyuklee

I guess the split is based on these two files: large2470-test.tsv and large2470-train.tsv (Large Variant of the Dataset), excluding QA pairs with 'lfb' ids (QA pairs from live.ailao.eu, I guess; see d81aca5).

The numbers from the DrQA paper are correct in this case, but I'm not sure where the number 1204 comes from in the R^3 paper.
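
For anyone who wants to check the counts themselves, the filtering described above could be sketched roughly as follows. This is only a sketch under assumptions that are not confirmed in this thread: that the id sits in the first tab-separated column, that 'lfb' appears as a substring of the id, and that the files have no header row (a header line, if present, would be counted as a QA pair). The helper names `drop_lfb_rows` and `count_split` are made up for illustration.

```python
import csv
from typing import Iterable, List


def drop_lfb_rows(
    lines: Iterable[str], id_column: int = 0, marker: str = "lfb"
) -> List[List[str]]:
    """Return TSV rows whose id field does not contain `marker`.

    Assumption: the id lives in column `id_column` (0 by default).
    """
    # QUOTE_NONE keeps quote characters literal, since answer patterns
    # may contain them and should not be treated as CSV quoting.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Skip empty lines, then drop rows whose id contains the marker.
    return [row for row in reader if row and marker not in row[id_column]]


def count_split(path: str) -> int:
    """Count the QA pairs left in one split file after dropping 'lfb' rows."""
    with open(path, encoding="utf-8", newline="") as f:
        return len(drop_lfb_rows(f))
```

Calling `count_split("large2470-train.tsv")` and `count_split("large2470-test.tsv")` on a local checkout would then give numbers to compare against the ones reported in the papers. If the id column or the id format differs from what is assumed here, adjust `id_column` and `marker` accordingly.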
