LILY LectureBank

This is the repo for the LectureBank Corpus, with all batches and updates.

Several of our papers use parts of the corpus; see the LB-Paper folder for details.

Meta Data

data-versions

lb*.tsv: the different versions of the data.

ID, Instructor, Title, Topic, URL, Venue, Year

  • ID: ID of each line.
  • Instructor: The author name(s).
  • Title: File title.
  • Topic: The topic number; see taxonomy.csv for the topic name.
  • URL: Online URL.
  • Venue: Name of the university, or GitHub.
  • Year: Year of the course.
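
As a rough illustration of the column layout above, the following Python sketch parses tab-separated rows into dictionaries. It assumes each file starts with a header row using exactly the column names listed here, which this README does not confirm; `read_batch` and the sample row are hypothetical and not part of the repo.

```python
import csv
import io


def read_batch(handle):
    """Parse one lb*.tsv file-like object into a list of dicts keyed by column name.

    Assumption: the first line is a header row with the column names from the README.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [dict(row) for row in reader]


# Made-up sample row for illustration only; it is not taken from the corpus.
sample = io.StringIO(
    "ID\tInstructor\tTitle\tTopic\tURL\tVenue\tYear\n"
    "1\tJane Doe\tIntro to Parsing\t23\thttps://example.org/parsing.pdf\tExample University\t2018\n"
)
rows = read_batch(sample)
print(rows[0]["Title"], rows[0]["Topic"])
```

To read a real file, pass an open handle instead, for example `open("lb1.tsv", newline="", encoding="utf-8")`. Note that every value, including Topic and Year, comes back as a string.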

We ran a URL check in May 2022. The number of valid resources in each file is:

  • 1020 lb1.tsv
  • 308 lb2.tsv
  • 3564 lb3.tsv
  • 3136 lb4.tsv
  • 1321 lb5.tsv
  • 397 lb6.tsv

NOTE: We combined all five batches of LectureBank and removed duplicates and invalid URLs. The combined data is in alldata.tsv, which contains 7499 entries in total.
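
The merge described in the note can be sketched roughly as follows. This is only one plausible approach: it assumes duplicates are identified by an exact match on the URL column, which the README does not state, and it does not perform the URL validity check. `merge_batches` and the sample rows are hypothetical.

```python
def merge_batches(batches):
    """Concatenate batches in order, keeping the first row seen for each URL.

    Assumption: two rows are duplicates when their URL values match exactly
    after stripping surrounding whitespace.
    """
    seen = set()
    merged = []
    for rows in batches:
        for row in rows:
            url = row["URL"].strip()
            if url in seen:
                continue  # skip a URL that an earlier row already contributed
            seen.add(url)
            merged.append(row)
    return merged


# Made-up rows for illustration only; they are not taken from the corpus.
batch_a = [
    {"ID": "1", "URL": "https://example.org/a.pdf"},
    {"ID": "2", "URL": "https://example.org/b.pdf"},
]
batch_b = [
    {"ID": "1", "URL": "https://example.org/b.pdf"},
    {"ID": "2", "URL": "https://example.org/c.pdf"},
]
merged = merge_batches([batch_a, batch_b])
print(len(merged))  # 3
```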

Taxonomy

NLP taxonomy release. The file taxonomy.csv contains a taxonomy of 320 topics arranged in a tree. Each topic's ID indicates its parent node. For example, topic 233 (Relation Extraction) has parent 23 (Part of Speech Tagging), and topic 23 has parent 2 (Language Modeling, Syntax, Parsing).

  • Topic ID: ID of the topic.
  • Topic: Topic name.
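
Going by the 233 → 23 → 2 example above, one way to walk up the tree is to drop the last digit of a topic ID. The sketch below assumes this rule holds for every topic, which the example suggests but the README does not confirm for all 320 topics; `parent_id` and `ancestors` are hypothetical helpers, not part of the repo.

```python
def parent_id(topic_id):
    """Return the assumed parent ID, or None for a top-level topic.

    Assumption: the parent ID is the topic ID with its last digit removed.
    """
    digits = str(topic_id)
    return int(digits[:-1]) if len(digits) > 1 else None


def ancestors(topic_id):
    """List the assumed ancestors of a topic, nearest parent first."""
    chain = []
    current = parent_id(topic_id)
    while current is not None:
        chain.append(current)
        current = parent_id(current)
    return chain


print(ancestors(233))  # [23, 2]
```

If taxonomy.csv turns out to use a different ID scheme for some branches, a lookup table built from the file would be the safer choice.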

You can find how this was created in our paper CLICKER: A Computational LInguistics Classification Scheme for Educational Resources.

Other resources

Please visit our website AAN.how.