This project implements an SMS spam classifier. It uses the NLTK (Natural Language Toolkit) library to preprocess text and labels each SMS message as either "spam" or "ham" (non-spam). The model is trained on a labeled dataset of SMS messages. Preprocessing covers tokenization, stopword removal, and stemming, and TF-IDF vectorization turns the cleaned text into numeric features.
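
As a rough illustration of the preprocessing described above, here is a minimal sketch and not the project's actual code. The function name `preprocess`, the use of `wordpunct_tokenize` and `PorterStemmer`, and the small fallback stopword set are all assumptions made for this example.

```python
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

try:
    # Needs a one-time download: nltk.download("stopwords")
    STOPWORDS = set(stopwords.words("english"))
except LookupError:
    # Assumption: tiny illustrative fallback, used only if the corpus is not installed
    STOPWORDS = {"a", "an", "the", "is", "to", "you", "your", "and", "for", "of", "in", "on"}

_stemmer = PorterStemmer()


def preprocess(message: str) -> list[str]:
    """Lowercase, tokenize, drop stopwords and non-alphabetic tokens, then stem."""
    tokens = wordpunct_tokenize(message.lower())
    return [
        _stemmer.stem(token)
        for token in tokens
        if token.isalpha() and token not in STOPWORDS
    ]


if __name__ == "__main__":
    print(preprocess("Congratulations! You have been selected for a FREE prize, call now!!!"))
```

`wordpunct_tokenize` is chosen here because it is regex-based and needs no extra NLTK data. The project may well use a different tokenizer or stemmer.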
- Text preprocessing with NLTK
- Tokenization, stopword removal, and stemming
- TF-IDF vectorization for feature extraction
- Model training using machine learning algorithms
- Evaluation of model performance with accuracy and other metrics
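
The feature extraction, training, and evaluation steps listed above could be wired together roughly as follows. This is a hedged sketch: the source only says "machine learning algorithms", so the choice of Multinomial Naive Bayes is an assumption, and the eight example messages are made up for illustration and are not the project's dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data, invented for this example (the real project uses a labeled SMS dataset)
messages = [
    "Win a free prize now, call this number",
    "Free entry in a weekly cash competition",
    "Urgent! Claim your free reward today",
    "You have won cash, text back to claim",
    "Are we still meeting for lunch today?",
    "I will call you when I get home",
    "Can you send me the notes from class?",
    "See you at the station at six",
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

# Hold out a stratified test split so both classes appear in it
X_train, X_test, y_train, y_test = train_test_split(
    messages, labels, test_size=0.25, random_state=42, stratify=labels
)

# TF-IDF turns raw text into weighted term vectors; the classifier is an assumed choice
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)

print(f"Accuracy: {accuracy:.2f}")
# Precision, recall, and F1 per class; zero_division=0 avoids warnings on tiny splits
print(classification_report(y_test, predictions, zero_division=0))
```

With a dataset this small the scores mean nothing. The point is only the shape of the pipeline: vectorize, fit, predict, then report accuracy alongside per-class metrics.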
- nltk: Natural Language Toolkit for text processing
- scikit-learn: Machine learning models and metrics
- pandas: Data manipulation and analysis
- numpy: Numerical operations
git clone https://github.com/sahankrt20/sms-spam-classifier.git