Automated sentiment analysis of movie reviews using several approaches, including scikit-learn models, Keras models, and transfer learning
The goal of this analysis is to predict whether a review rates a movie positively or negatively. The dataset contains 25,000 labeled movie reviews for training, 50,000 unlabeled reviews for training, and 25,000 reviews for testing.
- IMDB movie reviews dataset
- http://ai.stanford.edu/~amaas/data/sentiment
- https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
- Contains 25,000 positive and 25,000 negative reviews
- Contains at most 30 reviews per movie
- At least 7 stars out of 10 → positive (label = 1)
- At most 4 stars out of 10 → negative (label = 0)
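
The labeling rule above can be written as a small function. This is a minimal sketch, not code from this project: the name `rating_to_label` is hypothetical, and returning `None` for 5 or 6 stars is an assumption based on the rule leaving those ratings unlabeled.

```python
from typing import Optional


def rating_to_label(stars: int) -> Optional[int]:
    """Map a 1-10 star rating to a binary sentiment label.

    Returns 1 (positive) for 7 stars or more, 0 (negative) for 4 stars
    or fewer, and None for 5 or 6 stars, which the rule does not label.
    """
    if not 1 <= stars <= 10:
        raise ValueError(f"rating must be between 1 and 10, got {stars}")
    if stars >= 7:
        return 1
    if stars <= 4:
        return 0
    return None  # assumption: mid-range ratings carry no label
```

For example, `rating_to_label(8)` gives `1` and `rating_to_label(3)` gives `0`.
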
- Exploration and Preprocessing
- Base Models (Logistic Regression, Multinomial NB)
- Keras Models
- PyTorch RNN Model
- BERT Fine-Tuned Model
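
To give a feel for the first two steps, here is a minimal sketch of text preprocessing followed by a multinomial Naive Bayes classifier, written from scratch in plain Python. It is an illustration only and not the scikit-learn model used in this project: `tokenize`, `TinyMultinomialNB`, the add-one smoothing value, and the toy reviews are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase, replace HTML tags such as <br /> with spaces, return word tokens."""
    text = re.sub(r"<[^>]+>", " ", text.lower())
    return re.findall(r"[a-z']+", text)


class TinyMultinomialNB:
    """Bag-of-words multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed value)

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary is every token seen in any class during training.
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / n_docs)
            denom = self.totals[c] + self.alpha * vocab_size
            for tok in tokens:
                score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Toy data, made up for illustration (1 = positive, 0 = negative).
train_texts = [
    "A wonderful, moving film<br />Loved it",
    "Great acting and a great story",
    "Terrible plot and awful acting",
    "Boring, awful, a waste of time",
]
train_labels = [1, 1, 0, 0]
model = TinyMultinomialNB().fit(train_texts, train_labels)
```

With this toy training set, `model.predict("great film, loved the story")` returns `1` and `model.predict("awful and boring")` returns `0`. A real run would train on the full review set and would normally use a library implementation instead.
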
- The data used for this problem can be found at: https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
- The preprocessed data can be found at: https://drive.google.com/file/d/1-KrTwLg3b2NcHFafK_lrKeyTvUK3NhiL/view?usp=sharing
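
Once downloaded, the raw data can be read with the standard library. The sketch below assumes the Kaggle CSV has a `review` column and a `sentiment` column holding `positive` or `negative`; check the header of your copy before relying on it. The helper name `load_reviews` is hypothetical, and the preprocessed file may use a different layout.

```python
import csv
import io

# Assumed mapping from the CSV's sentiment strings to the labels used here.
SENTIMENT_TO_LABEL = {"positive": 1, "negative": 0}


def load_reviews(csv_file):
    """Read review texts and 0/1 labels from an open CSV file object.

    Raises KeyError if a column is missing or a sentiment value is unexpected.
    """
    texts, labels = [], []
    for row in csv.DictReader(csv_file):
        texts.append(row["review"])
        labels.append(SENTIMENT_TO_LABEL[row["sentiment"]])
    return texts, labels


# In-memory stand-in for the real file, made up for illustration.
sample = io.StringIO(
    'review,sentiment\n'
    '"Loved it, truly.",positive\n'
    '"Dull and slow.",negative\n'
)
texts, labels = load_reviews(sample)
```

For the real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `load_reviews`, where `path` points to wherever you saved the download.
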
- Logistic Regression: 90.79%
- Support Vector Machine: 91.08%
- Multinomial Naive Bayes: 91.32%
- Simple Neural Net (Keras): 92.83%
- RNN LSTM (PyTorch): 86.04%
- BERT Fine-Tuning: 91.68%