Mini_Project_Naive_Bayes.ipynb
README.md
callibration.png
cdf.png
critics.csv
terms.png
terms2.png
vsm.png

README.md

Naive Bayes: Predicting Movie Ratings

Python Notebook

Assignment

This exercise demonstrates the basics of text analysis using Naive Bayes, hyperparameter tuning, and model selection.

Data

15,561 movie reviews aggregated on Rotten Tomatoes, in CSV format, with a fresh/rotten target class and a few other columns.

Approach

After a standard histogram-based exploration of the data, a corpus is built by vectorizing the text of the movie reviews. A multinomial Naive Bayes classifier from scikit-learn is fit on a training split of the data. Two hyperparameters, alpha and the minimum document frequency, are selected through visual analysis of a CDF and by scoring model performance at different values. A log-likelihood function is written as an alternative optimization criterion. Finally, a few other models are tested but not optimized: TF-IDF, n-grams, and random forest.
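The tuning loop described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the toy reviews, the grid values, the `log_likelihood` helper, and the use of `CountVectorizer` with a held-out validation split are all assumptions made for the example.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the review text (1 = fresh, 0 = rotten).
texts = [
    "a great and moving film", "great acting and a great story",
    "a moving story with fine acting", "fine film, great fun",
    "a fun and moving film", "great story, fine acting",
    "a dull and boring film", "boring acting and a dull story",
    "a boring story with bad acting", "bad film, dull plot",
    "a dull and bad film", "bad story, boring acting",
]
labels = np.array([1] * 6 + [0] * 6)


def log_likelihood(clf, X, y):
    """Sum of log P(true class | document) over all rows; closer to 0 is better."""
    log_proba = clf.predict_log_proba(X)     # columns follow clf.classes_
    cols = np.searchsorted(clf.classes_, y)  # column index of each true label
    return log_proba[np.arange(len(y)), cols].sum()


train_texts, val_texts, y_train, y_val = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)

best_score, best_alpha, best_min_df = -np.inf, None, None
for min_df in (1, 2):                        # minimum document frequency
    vectorizer = CountVectorizer(min_df=min_df)
    X_train = vectorizer.fit_transform(train_texts)
    X_val = vectorizer.transform(val_texts)  # reuse the training vocabulary
    for alpha in (0.1, 1.0, 5.0):            # additive smoothing strength
        clf = MultinomialNB(alpha=alpha).fit(X_train, y_train)
        score = log_likelihood(clf, X_val, y_val)
        if score > best_score:
            best_score, best_alpha, best_min_df = score, alpha, min_df

print(f"alpha={best_alpha}, min_df={best_min_df}, log-likelihood={best_score:.3f}")
```

Scoring by log-likelihood rather than accuracy rewards well-calibrated probabilities, so a model that is confidently wrong is penalized more heavily than one that is merely unsure.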

Reflection

As far as I know, Rotten Tomatoes relies primarily on scaling numerical scores from other publications to label referenced reviews as rotten or fresh, but a day's work of machine learning can perform surprisingly well: a trigram model was 76.8% accurate on unseen test data. That's pretty cool! Trying out the default settings for the other models was fun too, but not as engaging, since scikit-learn handles everything.
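For reference, a trigram model like the one mentioned above might be set up along these lines. This is a sketch under assumptions: the toy texts are made up, and both `ngram_range=(1, 3)` and the pairing with `MultinomialNB` are guesses at a reasonable configuration, not settings taken from the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy reviews (1 = fresh, 0 = rotten).
texts = [
    "not good at all",
    "not funny and not good",
    "good and funny",
    "funny, warm and good",
]
labels = [0, 0, 1, 1]

# ngram_range=(1, 3) counts single words, word pairs, and word triples,
# so a phrase such as "not good" becomes a feature of its own.
model = make_pipeline(CountVectorizer(ngram_range=(1, 3)), MultinomialNB())
model.fit(texts, labels)

vocabulary = model.steps[0][1].vocabulary_  # fitted n-gram -> column index
print(sorted(term for term in vocabulary if term.startswith("not")))
print(model.predict(["not good"]))
```

The trade-off is a much larger and sparser vocabulary, which is where a minimum document frequency cutoff becomes useful.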
