Skip to content

Yale-LILY/LectureBank-Corpus

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 
 
 

Repository files navigation

LILY LectureBank Corpus

This is the repo for LectureBank Corpus, with all batches and updates. We release meta data in this repo.

Note that we also have a few works using part of the corpus, you can find more details in this link.

Meta Data

Versions

lb*.tsv: data with different versions.

ID, Instructor, Title, Topic, URL, Venue, Year

  • ID: Id of each line.
  • Instructor: The author name(s).
  • Title: File tile.
  • Topic: The Topic Number, check taxonomy.csv for topic name.
  • URL: Online URL.
  • Year: Year of the course.
  • Venue: Name of the university, or GitHub.

We went through a URL check on May, 2022, here are the valid resource numbers:

  • 1020 lb1.tsv
  • 308 lb2.tsv
  • 3564 lb3.tsv
  • 3136 lb4.tsv
  • 1321 lb5.tsv

NOTE: we combined all five batches of LectureBank, and remove duplicates and invalid urls. All data can be found in alldata.tsv with a total number to be 7104.

Taxonomy

NLP taxonomy release. In the file taxonomy.csv, we include the taxonomy with 320 topics in a tree structure. The topic ID for each topic shows the parent node. For example, 233 (Relation Extraction) has a parent node to be 23 (Part of Speech Tagging), and topic 23 has its parent node to be 2 (Language Modeling, Syntax, Parsing).

  • Topic ID: Id of topic.
  • Topic: topic name.

You can find how this was created in our paper CLICKER: A Computational LInguistics Classification Scheme for Educational Resources.

Other resources

Please visit our website AAN.how.

Releases

No releases published

Packages

No packages published