Add additional analysis
zelda4669 committed Sep 14, 2021
1 parent 5da192c commit 906e5b5
Showing 1 changed file with 29 additions and 11 deletions.
40 changes: 29 additions & 11 deletions Airline Tweets.ipynb
@@ -139,6 +139,7 @@
"outputs": [],
"source": [
"stop_words = stopwords.words('english')\n",
"\n",
"#add custom stop words\n",
"stop_words.append('http')\n",
"stop_words.append('hr')\n",
@@ -206,7 +207,7 @@
"id": "d5e6d696",
"metadata": {},
"source": [
"We will begin by running multiple types of models with base parameters to see which run best with our dataset. "
"We will begin by running multiple types of models with base parameters to see which run best with our dataset. Because we are particularly interested in classifying negative tweets, we will optimize for recall, as we would prefer to wrongly classify positive tweets as negative than wrongly classify negative tweets as positive and potentially miss issues."
]
},
{
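Review note: the recall criterion described in this hunk can be sketched as follows. This is a minimal illustration under stated assumptions: the label names `"negative"` and `"positive"` and the toy predictions are made up for the example and are not taken from the notebook or its dataset. In practice `sklearn.metrics.recall_score` with a `pos_label` argument computes the same quantity.

```python
def recall(y_true, y_pred, label):
    """Recall for `label`: correctly predicted `label` items / all actual `label` items."""
    # Predictions for the rows whose true class is `label`.
    preds_for_label = [p for t, p in zip(y_true, y_pred) if t == label]
    if not preds_for_label:
        return 0.0  # no actual instances of this class
    hits = sum(1 for p in preds_for_label if p == label)
    return hits / len(preds_for_label)


# Toy labels, invented purely for illustration.
y_true = ["negative", "negative", "negative", "positive", "positive"]
y_pred = ["negative", "negative", "positive", "negative", "positive"]

neg_recall = recall(y_true, y_pred, "negative")  # 2 of 3 negative tweets caught
pos_recall = recall(y_true, y_pred, "positive")  # 1 of 2 positive tweets caught
print(neg_recall, pos_recall)
```

A missed negative tweet lowers `neg_recall`, which is the error the notebook says it most wants to avoid; a positive tweet flagged as negative lowers only `pos_recall`.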
@@ -873,7 +874,7 @@
"id": "3a9c1ce6",
"metadata": {},
"source": [
"SVM SMOTE is the highest performing of the three, so I will use this algorithm wth all the models to ensure I select the best performing model."
"All three methods are very close -- I chose SVM SMOTE as the highest performing of the three because it has the most balanced recall performance."
]
},
{
@@ -1366,6 +1367,14 @@
"plt.show();"
]
},
+ {
+ "cell_type": "markdown",
+ "id": "8183c26c",
+ "metadata": {},
+ "source": [
+ "We see a lot of similar words: thank, best, great, awesome, amazing are words with positive sentiment, and hold, worst, delayed, cancelled, and rude are common words with negative sentiment."
+ ]
+ },
{
"cell_type": "markdown",
"id": "53dadded",
@@ -1379,7 +1388,7 @@
"id": "d44395ba",
"metadata": {},
"source": [
"Multinomial Naive Bayes, XGBoost, and CatBoost are the three highest performing algorithms, so we will tune parameters for each of these algorithms to improve performance."
"Multinomial Naive Bayes and XGBoost are the two highest performing algorithms, so we will tune parameters for each of these algorithms to improve performance."
]
},
{
@@ -1532,7 +1541,7 @@
"id": "e63e232c",
"metadata": {},
"source": [
"Write some analysis"
"Interestingly, the original Naive Bayes model is the best performing, so we will implement that model into production."
]
},
{
@@ -1548,7 +1557,7 @@
"id": "b7cd46c3",
"metadata": {},
"source": [
"In order to put this model into production, we will need to build a function that collects tweets, processes them, and categorizes them."
"In order to put this model into production, we will need to build a function to processes and categorize a tweet."
]
},
{
@@ -1562,7 +1571,7 @@
" tweet = process(tweet)\n",
" tweet = [tweet]\n",
" tweet = vectorizer.transform(tweet)\n",
" print(xgb_gridsearch.best_estimator_.predict(tweet))"
" print(models_smote[2].predict(tweet))"
]
},
{
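Review note: the function in this hunk prints the prediction and picks the model by list position (`models_smote[2]`). A hedged alternative is sketched below: it returns the label and takes the cleaning function, vectorizer, and model as arguments. `classify_tweet`, `StubVectorizer`, and `StubModel` are hypothetical names introduced for this example; the stubs only mimic the `transform` / `predict` interface so the sketch runs without the notebook's fitted objects.

```python
def classify_tweet(tweet, process, vectorizer, model):
    """Clean one raw tweet, vectorize it, and return the predicted label."""
    cleaned = process(tweet)
    # transform() expects an iterable of documents, so wrap the single tweet in a list.
    features = vectorizer.transform([cleaned])
    return model.predict(features)[0]


# Hypothetical stand-ins for the notebook's fitted vectorizer and model.
class StubVectorizer:
    def transform(self, docs):
        # One feature per document: 1 if the word "delayed" appears, else 0.
        return [[int("delayed" in doc)] for doc in docs]


class StubModel:
    def predict(self, rows):
        return ["negative" if row[0] else "positive" for row in rows]


label = classify_tweet("Flight DELAYED again", str.lower, StubVectorizer(), StubModel())
print(label)
```

In the notebook, the call would presumably pass the existing `process`, `vectorizer`, and the chosen Naive Bayes model in place of the stubs; returning the label instead of printing it makes the function easier to reuse in a review workflow.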
@@ -1588,12 +1597,21 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "406ffd6a",
"cell_type": "markdown",
"id": "07cbb482",
"metadata": {},
"outputs": [],
"source": []
"source": [
"## Next Steps"
]
},
{
"cell_type": "markdown",
"id": "be940254",
"metadata": {},
"source": [
"* Develop workflow for reviewing tweet sentiment\n",
"* Use sentiment analysis to improve United customer service."
]
}
],
"metadata": {