Applied to the sentiment analysis task using AIViVN's comments dataset
The model achieved 0.90268 on the public leaderboard (the winner's score was 0.90087). Bert4news is used in ViNLPtoolkit, a Vietnamese toolkit for segmentation and Named Entity Recognition.
*************** New Mar 11, 2020 ***************
BERT (from Google) was released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
We use word-level SentencePiece, basic BERT tokenization, and the same configuration as BERT base, with lowercase = False.
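To illustrate what lowercase = False preserves: in uncased mode, BERT's basic tokenizer lowercases the text and also strips accents, which would remove Vietnamese diacritics. The sketch below imitates that step in plain Python. The helper name `lowercase_and_strip_accents` is hypothetical, and treating this as the motivation for the cased setting is an assumption, not something the project states.

```python
import unicodedata


def lowercase_and_strip_accents(text: str) -> str:
    """Imitate the lowercasing step of BERT's basic tokenizer (uncased mode).

    Hypothetical helper written for illustration: it lowercases the text,
    decomposes characters (NFD), and drops combining marks (category Mn).
    """
    text = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


print(lowercase_and_strip_accents("Tôi là sinh viên"))  # toi la sinh vien
```

With lowercase = False this step is skipped, so tone and vowel marks reach the model unchanged.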
You can download the trained model:
Use with huggingface/transformers
import torch
from transformers import AutoTokenizer, AutoModel

# Load the pretrained tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("NlpHUST/vibert4news-base-cased")
bert_model = AutoModel.from_pretrained("NlpHUST/vibert4news-base-cased")

line = "Tôi là sinh viên trường Bách Khoa Hà Nội ."
input_id = tokenizer.encode(line, add_special_tokens=True)
# Attend to every non-padding token (assumes the padding id is 0)
att_mask = [int(token_id > 0) for token_id in input_id]
input_ids = torch.tensor([input_id])
att_masks = torch.tensor([att_mask])
# Inference only, so gradient tracking is disabled
with torch.no_grad():
    features = bert_model(input_ids, attention_mask=att_masks)
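The attention mask above marks every token as real, because a single sentence needs no padding. When several sentences are batched, shorter ones have to be padded to a common length and the mask must zero out the padded positions. The following is a minimal sketch in plain Python, assuming a padding id of 0 (which the `token_id > 0` test above already implies). The helper name `pad_batch` is hypothetical, and the token ids in the usage lines are made up for illustration.

```python
from typing import List, Tuple


def pad_batch(
    batch: List[List[int]], max_len: int, pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Truncate or pad each id sequence to max_len and build attention masks.

    pad_batch is a hypothetical helper, not part of transformers.
    pad_id=0 is an assumption matching the `token_id > 0` mask rule.
    """
    padded, masks = [], []
    for ids in batch:
        ids = ids[:max_len]                  # naive truncation of long inputs
        n_pad = max_len - len(ids)           # number of padding slots needed
        padded.append(ids + [pad_id] * n_pad)
        masks.append([1] * len(ids) + [0] * n_pad)
    return padded, masks


# Made-up token ids, for illustration only
input_ids, att_masks = pad_batch([[2, 15, 8, 3], [2, 7, 3]], max_len=5)
print(input_ids)  # [[2, 15, 8, 3, 0], [2, 7, 3, 0, 0]]
print(att_masks)  # [[1, 1, 1, 1, 0], [1, 1, 1, 0, 0]]
```

Both lists can be passed to `torch.tensor` as in the single-sentence example. In recent versions of transformers, calling the tokenizer directly with `padding=True` and `return_tensors="pt"` should produce the same padded ids and mask without a helper.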
Run training with base config
python \
--model_path=bert4news.pytorch \
--max_len=200 \
--batch_size=16 \
--epochs=6 \
For personal communication related to this project, please contact Nha Nguyen Van ([email protected]).