This project performs a comprehensive analysis of Netflix data and implements a content-based recommendation system called FLIX-HUB. It includes data preprocessing, exploratory data analysis, advanced feature engineering, and a movie/TV show recommendation engine.
- Data cleaning and preprocessing
- Exploratory data analysis with interactive visualizations
- Text processing and feature engineering
- Content-based recommendation system
- Support for both movies and TV shows
To run this project, you need Python installed on your system. Then follow these steps:
- Clone the repository:
git clone https://github.com/yourusername/Recommendation-System.git
- Navigate to the project directory:
cd Recommendation-System
- Install the required packages:
pip install -r requirements.txt
To use the FLIX-HUB recommendation system:
- Run the Jupyter notebook or Python script.
- Use the FlixHub class to get recommendations:
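# final_data (the preprocessed dataset) and cosine_sim (the similarity matrix)
# are assumed to come from the preprocessing and feature-engineering steps.
flix_hub = FlixHub(final_data, cosine_sim)
movies, tv_shows = flix_hub.recommendation('Movie Title', total_result=10, threshold=0.5)
print('Similar Movie(s) list:')
for movie in movies:
    print(movie)
print('\nSimilar TV_show(s) list:')
for tv_show in tv_shows:
    print(tv_show)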
The data preprocessing steps include:
- Loading the Netflix dataset
- Handling missing values
- Cleaning text data (titles, descriptions, etc.)
- Creating a bag-of-words representation
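The preprocessing steps above could look roughly like the following dependency-free sketch. It is not the notebook's code: the toy records, the column names title, type, listed_in, and description (assumed to match the Netflix dataset), and the helpers clean_text, preprocess, and bag_of_words are all hypothetical. Missing text values are filled with empty strings, text is lowercased and stripped of punctuation, and tokens are counted per title.

```python
import re
from collections import Counter

# Hypothetical toy records standing in for rows of the Netflix dataset.
# The column names (title, type, listed_in, description) are assumptions.
RECORDS = [
    {"title": "Space Cats!", "type": "Movie",
     "listed_in": "Comedies", "description": "Cats fly to the Moon."},
    {"title": "Deep Sea", "type": "TV Show",
     "listed_in": None, "description": "Divers explore the deep sea."},
]

TEXT_FIELDS = ("listed_in", "description")


def clean_text(text: str) -> str:
    """Lowercase, turn anything but ASCII letters/digits/whitespace into a space, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def preprocess(record: dict) -> dict:
    """Return a copy with missing (None) text fields filled with '' and all text fields cleaned."""
    cleaned = dict(record)
    for field in ("title",) + TEXT_FIELDS:
        cleaned[field] = clean_text(record.get(field) or "")
    return cleaned


def bag_of_words(record: dict) -> Counter:
    """Count tokens across the cleaned genre and description fields of one record."""
    tokens = " ".join(record[f] for f in TEXT_FIELDS).split()
    return Counter(tokens)


if __name__ == "__main__":
    for raw in RECORDS:
        rec = preprocess(raw)
        print(rec["title"], dict(bag_of_words(rec)))
```

The regex keeps only ASCII letters and digits, so accented or non-Latin titles would lose characters; a real pipeline may need a Unicode-aware pattern.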
The EDA process includes various visualizations:
- Distribution of content types (movies vs. TV shows)
- Number of movies released each year
- Top countries producing Netflix content
- Movie ratings distribution
- Word clouds for titles, descriptions, and genres
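Each chart in the list above is driven by a simple aggregate. The sketch below computes those aggregates with collections.Counter on hypothetical toy rows; the column names type, release_year, country, and rating are assumptions, the helper names are illustrative, and the plotting itself (done with whatever visualization library the notebook uses) is left out.

```python
from collections import Counter

# Hypothetical toy rows; the real analysis runs on the full Netflix dataset.
ROWS = [
    {"type": "Movie", "release_year": 2019, "country": "United States, India", "rating": "PG-13"},
    {"type": "TV Show", "release_year": 2020, "country": "United Kingdom", "rating": "TV-MA"},
    {"type": "Movie", "release_year": 2020, "country": "India", "rating": "TV-MA"},
    {"type": "Movie", "release_year": 2020, "country": None, "rating": "R"},
]


def type_distribution(rows):
    """Count movies vs. TV shows."""
    return Counter(r["type"] for r in rows)


def movies_per_year(rows):
    """Count movies by release year, ordered by year."""
    counts = Counter(r["release_year"] for r in rows if r["type"] == "Movie")
    return dict(sorted(counts.items()))


def top_countries(rows, n=10):
    """Count every listed country (comma-separated entries are split) and return the top n."""
    counts = Counter(
        country.strip()
        for r in rows if r["country"]
        for country in r["country"].split(",")
    )
    return counts.most_common(n)


def rating_distribution(rows):
    """Count movie ratings."""
    return Counter(r["rating"] for r in rows if r["type"] == "Movie")


if __name__ == "__main__":
    print(type_distribution(ROWS))
    print(movies_per_year(ROWS))
    print(top_countries(ROWS, n=3))
    print(rating_distribution(ROWS))
```

Splitting the country field matters because co-productions list several countries in one cell; counting the raw string would treat "United States, India" as a separate country.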
Advanced feature engineering techniques are applied:
- Text cleaning and normalization
- TF-IDF vectorization
- Cosine similarity calculation
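To make the TF-IDF and cosine-similarity steps concrete, here is a small from-scratch sketch. It weights raw term counts by a smoothed IDF, idf(t) = ln((1 + N) / (1 + df(t))) + 1, which is the default formula in scikit-learn's TfidfVectorizer; the project's actual vectorizer settings are not stated here and may differ. The function names are hypothetical, and the resulting matrix plays the role of cosine_sim in the usage example.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def tfidf_vectors(docs: List[str]) -> List[Vector]:
    """Build one sparse {term: weight} vector per whitespace-tokenized document.

    weight = raw term count * (ln((1 + N) / (1 + df)) + 1), the smoothed IDF
    that scikit-learn's TfidfVectorizer uses by default.
    """
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(tokens).items()}
            for tokens in tokenized]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_matrix(vectors: List[Vector]) -> List[List[float]]:
    """Pairwise cosine similarities; row i compares document i with every document."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


if __name__ == "__main__":
    docs = ["space cats moon", "space cats", "deep sea divers"]
    for row in cosine_matrix(tfidf_vectors(docs)):
        print([round(x, 3) for x in row])
```

Because cosine similarity divides by both vector norms, the L2 normalization that scikit-learn applies by default would not change these scores.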
The FLIX-HUB recommendation system uses:
- Content-based filtering
- Cosine similarity for finding similar content
- Separate recommendations for movies and TV shows
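The following sketch shows one way a content-based recommender could produce separate movie and TV-show lists from a similarity matrix. FlixHubSketch is a hypothetical class that only mirrors the constructor and recommendation(...) call shape from the usage example; its internals are assumptions: data is a list of dicts with title and type keys, threshold is treated as a minimum cosine similarity, and total_result caps each of the two lists separately.

```python
from typing import Dict, List, Sequence, Tuple


class FlixHubSketch:
    """Hypothetical content-based recommender (not the project's FlixHub).

    Assumes `data` is a list of dicts with "title" and "type" ("Movie" or "TV Show")
    and `similarity[i][j]` is the cosine similarity between items i and j.
    """

    def __init__(self, data: Sequence[Dict[str, str]],
                 similarity: Sequence[Sequence[float]]):
        self.data = data
        self.similarity = similarity
        # Case-insensitive lookup from title to row index.
        self.index = {row["title"].lower(): i for i, row in enumerate(data)}

    def recommendation(self, title: str, total_result: int = 10,
                       threshold: float = 0.5) -> Tuple[List[str], List[str]]:
        idx = self.index.get(title.lower())
        if idx is None:
            raise KeyError(f"Title not found: {title!r}")
        scores = self.similarity[idx]
        # Rank every other title by similarity to the query, highest first.
        ranked = sorted((j for j in range(len(self.data)) if j != idx),
                        key=lambda j: scores[j], reverse=True)
        movies: List[str] = []
        tv_shows: List[str] = []
        for j in ranked:
            if scores[j] < threshold:
                break  # the list is sorted, so every remaining score is lower
            target = movies if self.data[j]["type"] == "Movie" else tv_shows
            if len(target) < total_result:
                target.append(self.data[j]["title"])
        return movies, tv_shows
```

For example, if the query's similarity row ranks one movie at 0.9, a TV show at 0.7, and another movie at 0.2, calling recommendation(title, threshold=0.5) returns one movie and one TV show, because the 0.2 entry falls below the threshold.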
The project provides insights into Netflix's content library and recommends movies and TV shows similar to a title the user enters. Key areas explored include:
- Distribution of movies vs. TV shows
- Trends in content production over the years
- Popular genres and themes
Contributions to this project are welcome. Please fork the repository and submit a pull request with your changes.
This project is licensed under the MIT License - see the LICENSE file for details.