Annotation Guideline.pdf
README.md
test.csv
train.csv

README.md

Introduction

We collected the Amharic Hate Speech dataset from Twitter using the Twitter API over a period of five years, from 2018 to 2022. Data annotation was conducted on the Yandex Toloka crowdsourcing platform. Three independent annotators labeled each tweet, and the gold labels were determined by majority voting. Read our papers [URL to be released soon] for more details about the dataset.
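The majority voting step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `majority_vote`, the example label strings, and the choice to return `None` when no label reaches a strict majority are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(labels: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators.

    Returns None when there are no labels or when no single label is
    chosen by more than half of the annotators (an assumed tie policy).
    """
    if not labels:
        return None
    # most_common(1) yields the single most frequent label and its count.
    top_label, top_count = Counter(labels).most_common(1)[0]
    # Strict majority: more than half of the annotators must agree.
    return top_label if top_count > len(labels) / 2 else None


# Hypothetical label names, used only for illustration.
print(majority_vote(["hate", "hate", "normal"]))
print(majority_vote(["hate", "offensive", "normal"]))
```

With three annotators, two matching labels are enough to decide the gold label. How the dataset handles a three-way disagreement is not stated here, so the sketch leaves that case undecided.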

Amharic Hate Speech Data Annotation: Crowdsourcing-Based

The dataset is split into train and test sets (train.csv and test.csv), each with the columns Tweet_id, tweet, and label. Each tweet was annotated by three independent annotators (Tolokers) on the Toloka crowdsourcing platform, and the gold label (gold_label) was determined by majority voting.
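As a rough sketch of how the files might be read, the snippet below loads one split with Python's standard `csv` module and counts the labels. It assumes the CSV header uses exactly the column names listed above (Tweet_id, tweet, label) and that the files are UTF-8 encoded. The helper names `load_split` and `label_distribution` are hypothetical and not part of this repository.

```python
import csv
from collections import Counter
from typing import Dict, List


def load_split(path: str) -> List[Dict[str, str]]:
    """Read one split (e.g. train.csv) into a list of row dictionaries."""
    # utf-8-sig reads plain UTF-8 and also tolerates a leading byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def label_distribution(rows: List[Dict[str, str]],
                       label_column: str = "label") -> Counter:
    """Count how many rows carry each label (column name is an assumption)."""
    return Counter(row[label_column] for row in rows)
```

For example, `label_distribution(load_split("train.csv"))` would report how many tweets fall under each label in the training split, provided the header matches the assumed column names.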

For more details, you can read our papers:

The 5Js in Ethiopia: Amharic Hate Speech Data Annotation Using Toloka Crowdsourcing Platform

How to cite our paper:

@inproceedings{ayele20225js,
  title={{The 5Js in Ethiopia: Amharic hate speech data annotation using Toloka Crowdsourcing Platform}},
  author={Ayele, Abinew Ali and Dinter, Skadi and Belay, Tadesse Destaw and Asfaw, Tesfa Tegegne and Yimam, Seid Muhie and Biemann, Chris},
  booktitle={2022 International Conference on Information and Communication Technology for Development for Africa (ICT4DA)},
  pages={114--120},
  year={2022},
  url={https://ieeexplore.ieee.org/document/9971189},
  address={Bahir Dar, Ethiopia},
}

Challenges of Amharic Hate Speech Data Annotation Using Yandex Toloka Crowdsourcing Platform

How to cite our paper:

@inproceedings{ayelechallenges,
  title={Challenges of Amharic Hate Speech Data Annotation Using Yandex Toloka Crowdsourcing Platform},
  author={Ayele, Abinew Ali and Belay, Tadesse Destaw and Yimam, Seid Muhie and Dinter, Skadi and Asfaw, Tesfa Tegegne and Biemann, Chris},
  booktitle = {Proceedings of the Sixth Widening NLP Workshop (WiNLP)},
  year = {2022},
  address = {Abu Dhabi, United Arab Emirates},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2022.winlp-1.0},
}
