
Jigsaw Rate Severity Of Toxic Comments


Competition Overview

In this competition, participants score a set of about fourteen thousand comments. Pairs of comments were shown to expert raters, who marked one of the two as more harmful, each according to their own notion of toxicity. Submitted scores are compared against several hundred thousand of these pairwise rankings, and the average agreement with the raters determines the final score. The goal is to rank the severity of comment toxicity from innocuous to outrageous, where the middle of the range matters as much as the extremes.
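
Scoring amounts to a simple pairwise-agreement check: a submission gets credit for every annotated pair in which the comment marked as more toxic received the higher score. Below is a minimal sketch of computing this locally; the column names (less_toxic, more_toxic) and the dictionary score lookup are illustrative assumptions, not code from this repository.

import pandas as pd

def pairwise_agreement(pairs: pd.DataFrame, scores: dict) -> float:
    # Fraction of annotated pairs where the comment marked "more toxic"
    # received a strictly higher predicted score.
    correct = sum(scores[more] > scores[less]
                  for less, more in zip(pairs["less_toxic"], pairs["more_toxic"]))
    return correct / len(pairs)

# Toy example: one annotation over two comments.
pairs = pd.DataFrame({"less_toxic": ["have a nice day"],
                      "more_toxic": ["this is awful"]})
scores = {"have a nice day": 0.1, "this is awful": 0.8}
print(pairwise_agreement(pairs, scores))  # 1.0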

Dataset

  • Ruddit Dataset
  • Jigsaw Rate Severity of Toxic Comments
  • toxic-task

Due Date

  • Team Merge Deadline - January 31, 2022
  • Submission Deadline - February 7, 2022

Members

- Jeongwon Kim ([email protected])
- Jaewoo Park ([email protected])
- Youngmin Paik ([email protected])
- Kyubin Kim ([email protected])

Competition Results


Submission notebook

  1. BERT model ensemble (ELECTRA + RoBERTa + DeBERTa + DistilBART)
  2. BERT model ensemble (ELECTRA + RoBERTa + DeBERTa + DistilBART) + linear model (TF-IDF + LinearRegression); one way such heterogeneous scores can be blended is sketched below
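
The transformer models and the TF-IDF linear model produce scores on different scales, so they cannot be averaged directly. A minimal sketch of one common blending approach (rank-normalizing each model's scores before a weighted average) is shown below; the weights and the rank-normalization step are illustrative assumptions, not necessarily what the submission notebooks do.

import numpy as np
from scipy.stats import rankdata

def blend(score_lists, weights):
    # Map each model's scores to ranks in [0, 1] so different output
    # scales become comparable, then take a weighted average.
    ranked = [rankdata(s) / len(s) for s in score_lists]
    return np.average(ranked, axis=0, weights=weights)

# e.g. scores from the transformer ensemble and the TF-IDF + LinearRegression model
bert_scores = np.array([0.2, 0.9, 0.5])
tfidf_scores = np.array([1.3, 4.1, 2.2])
print(blend([bert_scores, tfidf_scores], weights=[0.7, 0.3]))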

Task

This repository is a pipeline for experimenting with pretrained language models (PLMs), which are widely used in NLP tasks. The following models were tested:

  • RoBERTa base
  • RoBERTa large
  • ELECTRA
  • Muppet RoBERTa
  • DeBERTa
  • BART
  • DistilBART
  • GPT-2

We judged that BERT-based models (e.g. BERT, RoBERTa) would be more robust than linear models (e.g. TF-IDF + LinearRegression) because they can handle previously unseen words through subword tokenization and take the surrounding context into account.
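
A minimal sketch of how any of the models above can be plugged into such a scoring pipeline with the Hugging Face transformers library; the checkpoint name and the single-output regression head are assumptions for illustration, not the repository's actual training code.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Any checkpoint from the list above can be substituted, e.g. "roberta-large",
# "google/electra-base-discriminator", "microsoft/deberta-base", "facebook/bart-base".
model_name = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# num_labels=1 gives a single regression-style output used as the toxicity score.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
model.eval()

batch = tokenizer(["you are awful", "have a nice day"],
                  padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    scores = model(**batch).logits.squeeze(-1)
print(scores)  # higher means more severe once the head is fine-tuned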

Competition Review

  • We selected models based only on the public leaderboard score, without tracking a cross-validation score, which led to disappointing results on the private leaderboard.
  • We trained the model on a combination of external datasets (e.g. toxic-task, the Ruddit dataset), so it generalized poorly to the private test data.

Program

  • Fetch Pretrained Models
$ sh ./download_pretrained_models.sh
  • Train
$ cd /jigsaw-toxic-severity-rating/jigsaw_toxic_severity_rating
$ python3 run_train.py \
          --config-file /jigsaw-toxic-severity-rating/configs/roberta.yaml \
          --train \
          --training-keyword roberta-test

You can set up launch.json for VS Code as follows:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File",
            "type": "python",
            "request": "launch",
            "program": "run_train.py",
            "console": "integratedTerminal",
            "cwd": "${workspaceFolder}/jigsaw_toxic_severity_rating",
            "args": ["--config-file", "../configs/roberta.yaml", "--train"]
        }
    ]
}

Requirements

Environment

  • Python 3.7 (to match the Kaggle environment)
  • Conda
  • git
  • git-lfs
  • CUDA 11.3 + PyTorch 1.10.1

The PyTorch version may vary depending on your hardware configuration.

Installation with a virtual environment

git clone https://github.com/medal-contender/jigsaw-rate-severity-of-toxic-comments.git
conda create -n jigsaw python=3.7
conda activate jigsaw
# PyTorch installation process may vary depending on your hardware
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
cd jigsaw-rate-severity-of-toxic-comments
pip install -r requirements.txt
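
After installation, a quick check (nothing repository-specific) confirms that the installed PyTorch and CUDA versions match the environment listed above and that a GPU is visible:

import torch

# Print the installed PyTorch / CUDA versions and GPU visibility.
print(torch.__version__)          # expected: 1.10.1 (or similar)
print(torch.version.cuda)         # expected: 11.3
print(torch.cuda.is_available())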

Run this in WSL (or WSL2)

./download_pretrained_models.sh

To update the code

$ git pull

If you have local changes that cause git pull to abort, one way to work around this is the following:

# set aside the local changes
$ git stash
# update
$ git pull
# put the local changes back on top of the recent update
$ git stash pop