Commit 14806bd

2 parents 8bb4b10 + 782bec4 commit 14806bd

2 files changed, 15 additions and 0 deletions

code/README.md

+3
@@ -0,0 +1,3 @@
+To run BERT-gloss-tagger.py, first install scikit-learn and ktrain: <br>
+pip install ktrain==0.28.3 <br>
+pip install scikit-learn==0.23.2 <br>
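
Before running the tagger, it can help to confirm that the pinned versions are the ones actually installed. The following is a minimal sketch, not part of the repository: the `PINS` table and the `check_pins` helper are hypothetical names introduced here, and the sketch assumes Python 3.8 or newer (for `importlib.metadata`). Note that `scikit-learn` is the PyPI distribution that provides the `sklearn` module.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Hypothetical table of pins, copied from the install instructions above.
PINS = {"ktrain": "0.28.3", "scikit-learn": "0.23.2"}


def check_pins(pins: Dict[str, str]) -> Dict[str, Tuple[Optional[str], bool]]:
    """Map each distribution name to (installed version or None, matches pin)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            installed = None
        report[name] = (installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_pins(PINS).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: wanted {PINS[name]}, found {installed} [{status}]")
```

A mismatch does not necessarily mean the script will fail; it only signals that the environment differs from the one the README describes.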

data/README.MD

+12
@@ -0,0 +1,12 @@
+#### gloss-annotation.tsv:
+This file contains manual annotations of the glosses from the PHOENIX 14T dataset.
+
+#### full.train, full.dev, full.test:
+These tab-separated files contain the manual annotations along with the tags predicted by our classifier.
+The first column indicates whether the labels were manually annotated ($manual-annotation$) or predicted by the classifier ($tagged$).
+The second column contains the PHOENIX 14T gloss data with the intensification labels inserted. Gloss tokens predicted or annotated as lower intensity are surrounded by <LOW-INT> and </LOW-INT>. Tokens with higher intensity are likewise surrounded by <HIGH-INT> and </HIGH-INT>. Tokens with no associated intensity are left unmarked.
+Example:
+1. $manual-annotation$ <HIGH-INT> WOLKE </HIGH-INT> LOCH SPEZIELL NORDWEST
+- The intensity labels for this instance were manually annotated. The token WOLKE has high intensity; the other tokens have no intensity.
+2. $tagged$ SAMSTAG AUCH <LOW-INT> FREUNDLICH </LOW-INT>
+- The intensity labels for this instance were assigned by our classifier. The token FREUNDLICH has low intensity; the other tokens have no intensity.
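
The column layout described above can be read with a few lines of Python. This is a sketch under stated assumptions rather than the authors' loader: it assumes the two columns are separated by a single tab, that the list numbers in the example are not part of the data, and that every tag is set off from its neighbours by whitespace, as in the two examples. The function name `parse_line` and the label strings `"low"`, `"high"`, and `"none"` are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Opening tags mapped to illustrative label names (the names are assumptions).
OPEN_TAGS = {"<LOW-INT>": "low", "<HIGH-INT>": "high"}
CLOSE_TAGS = {"</LOW-INT>", "</HIGH-INT>"}


def parse_line(line: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split one row into its source marker and a list of (token, intensity)."""
    source, gloss = line.rstrip("\n").split("\t", 1)
    source = source.strip().strip("$")  # "$tagged$" -> "tagged"

    current = "none"  # intensity applied to tokens outside any tag pair
    tokens = []
    for piece in gloss.split():
        if piece in OPEN_TAGS:
            current = OPEN_TAGS[piece]
        elif piece in CLOSE_TAGS:
            current = "none"
        else:
            tokens.append((piece, current))
    return source, tokens


if __name__ == "__main__":
    row = "$tagged$\tSAMSTAG AUCH <LOW-INT> FREUNDLICH </LOW-INT>"
    print(parse_line(row))
```

Tracking a single `current` label keeps the parser to one pass and handles a tag pair that wraps more than one token, should the data contain such spans.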
