We collected the Amharic Hate Speech dataset from Twitter using the Twitter API over a five-year period, from 2018 to 2022. Data annotation was conducted on the Yandex Toloka crowdsourcing platform. Three independent annotators labeled each tweet, and the gold labels were determined by majority voting. Read our papers [URL to be released soon] for more details about the dataset.
The dataset is provided as train and test splits with the fields Tweet_id, tweet, and label. Each tweet was annotated by three independent annotators (Tolokers) on the Toloka crowdsourcing platform, and the gold_label is the majority vote among them.
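The majority-voting step can be illustrated with a minimal Python sketch. This is not the authors' actual pipeline: the function name `majority_label`, the label strings, and the tweet IDs are hypothetical, and returning `None` when no label wins a strict majority is an assumption, since the description above does not say how three-way disagreements were resolved.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators.

    Returns None when no label has more than half of the votes
    (assumed tie handling; not specified by the dataset description).
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


# Hypothetical annotations: three votes per tweet, illustrative label names.
annotations = {
    "t1": ["hate", "hate", "normal"],
    "t2": ["normal", "normal", "normal"],
    "t3": ["hate", "offensive", "normal"],  # no majority
}

# Map each tweet ID to its gold label (or None if annotators disagree fully).
gold = {tweet_id: majority_label(votes) for tweet_id, votes in annotations.items()}
```

With three annotators, a strict majority means at least two matching votes, so only a three-way disagreement leaves a tweet without a gold label in this sketch.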
For more details, you can read our papers:
- The 5Js in Ethiopia: Amharic Hate Speech Data Annotation Using Toloka Crowdsourcing Platform
How to cite our paper:
@inproceedings{ayele20225js,
title={{The 5Js in Ethiopia: Amharic hate speech data annotation using Toloka Crowdsourcing Platform}},
author={Ayele, Abinew Ali and Dinter, Skadi and Belay, Tadesse Destaw and Asfaw, Tesfa Tegegne and Yimam, Seid Muhie and Biemann, Chris},
booktitle={2022 International Conference on Information and Communication Technology for Development for Africa (ICT4DA)},
pages={114--120},
year={2022},
url={https://ieeexplore.ieee.org/document/9971189},
address={Bahir Dar, Ethiopia},
}
- Challenges of Amharic Hate Speech Data Annotation Using Yandex Toloka Crowdsourcing Platform
How to cite our paper:
@inproceedings{ayelechallenges,
title={Challenges of Amharic Hate Speech Data Annotation Using Yandex Toloka Crowdsourcing Platform},
author={Ayele, Abinew Ali and Belay, Tadesse Destaw and Yimam, Seid Muhie and Dinter, Skadi and Asfaw, Tesfa Tegegne and Biemann, Chris},
booktitle = {Proceedings of the Sixth Widening NLP Workshop (WiNLP)},
year = {2022},
address = {Abu Dhabi, United Arab Emirates},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2022.winlp-1.0},
}