Introduction

We collected the Amharic Hate Speech dataset from Twitter using the Twitter API over five years, from 2018 to 2022. Data annotation was conducted on the Yandex Toloka crowdsourcing platform. Three independent annotators labeled each tweet, and the gold labels were determined by majority vote. Read our papers [URL to be released soon] for more details about the dataset.

Amharic Hate Speech Data Annotation: Crowdsourcing-Based

The dataset is split into train and test sets, each with the fields Tweet_id, tweet, and label. Each tweet is annotated by three independent annotators (Tolokers) on the Toloka crowdsourcing platform, and the gold_label is determined by majority voting.
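The majority-voting step can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the function name `majority_label`, the example tweet IDs, and the label strings (`hate`, `offensive`, `normal`) are all assumptions made for the example, and returning `None` when no label wins a strict majority is one possible way to handle three-way disagreement.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(labels: Sequence[str]) -> Optional[str]:
    """Return the label chosen by more than half of the annotators, else None."""
    if not labels:
        return None
    # most_common(1) yields the single most frequent label and its count.
    label, count = Counter(labels).most_common(1)[0]
    # Require a strict majority; a three-way split has no winner.
    return label if count > len(labels) / 2 else None


# Hypothetical annotations: tweet ID -> labels from three annotators.
# The IDs and label names are illustrative, not taken from the dataset.
annotations = {
    "1001": ["hate", "hate", "normal"],
    "1002": ["normal", "offensive", "hate"],
}

gold = {tweet_id: majority_label(votes) for tweet_id, votes in annotations.items()}
print(gold)
```

With three annotators, a tweet gets a gold label as soon as two of them agree; tweets where all three disagree come back as `None` here and would need a separate resolution step.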

For more details, you can read our papers:

  1. The 5Js in Ethiopia: Amharic Hate Speech Data Annotation Using Toloka Crowdsourcing Platform

How to cite our paper:

@inproceedings{ayele20225js,
  title={{The 5Js in Ethiopia: Amharic hate speech data annotation using Toloka Crowdsourcing Platform}},
  author={Ayele, Abinew Ali and Dinter, Skadi and Belay, Tadesse Destaw and Asfaw, Tesfa Tegegne and Yimam, Seid Muhie and Biemann, Chris},
  booktitle={2022 International Conference on Information and Communication Technology for Development for Africa (ICT4DA)},
  pages={114--120},
  year={2022},
  url = {https://ieeexplore.ieee.org/document/9971189},
  address ={Bahir Dar, Ethiopia},
}

  2. Challenges of Amharic Hate Speech Data Annotation Using Yandex Toloka Crowdsourcing Platform

How to cite our paper:

@inproceedings{ayelechallenges,
  title={Challenges of Amharic Hate Speech Data Annotation Using Yandex Toloka Crowdsourcing Platform},
  author={Ayele, Abinew Ali and Belay, Tadesse Destaw and Yimam, Seid Muhie and Dinter, Skadi and Asfaw, Tesfa Tegegne and Biemann, Chris},
  booktitle = {Proceedings of the Sixth Widening NLP Workshop (WiNLP)},
  year = {2022},
  address = {Abu Dhabi, United Arab Emirates},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2022.winlp-1.0},
}