Welcome to the "Regression with an Insurance Dataset" repository! This project is my submission for the Kaggle competition, where I tackled the challenge of predicting insurance costs using a complex dataset. After a rollercoaster of data wrangling and model optimization, I achieved a score of 1.139 on the RMSLE metric.
This repository contains the code and resources used to preprocess the data, build the model, and evaluate its performance. Here's a quick overview of what you'll find:
- Handling Missing Values: Addressed the numerous missing values in the dataset to ensure data integrity.
- Encoding Categorical Data: Transformed categorical variables into a format suitable for model training.
- Extracting Date-Time Features: Broke down date-time data into individual features for better analysis.
- Scaling Numeric Values: Applied scaling techniques to maintain consistency across numeric data.
- Outlier Management: Identified and managed outliers using visualization techniques (a task that tested my patience!).
- Model Selection: Opted for a Feedforward Neural Network (FNN) to capture the complex patterns in the data.
- Model Architecture: Designed a model with an input layer, four hidden layers, and an output layer, incorporating batch normalization, dropout, regularization, EarlyStopping, and ReduceLROnPlateau.
- Model Evaluation: Thoroughly assessed the model's performance to ensure no overfitting or underfitting.
- Test Data Preparation: Prepared the test data for the final prediction submission.
To get started with this project, clone the repository and follow the instructions in the notebook.ipynb
file. You'll find detailed comments and explanations throughout the code to guide you through the process.
To download the dataset, use the following Kaggle API command:
kaggle competitions download -c playground-series-s4e12
- Python 3.x
- Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn, tensorflow, keras
Feel free to fork this repository and submit pull requests if you have any improvements or suggestions. Let's make this project even better together!
This project is licensed under the MIT License. See the LICENSE
file for more details.