Welcome to my Machine Learning repository! It collects my projects and experiments as I learn and explore machine learning concepts and techniques. I will keep updating it with new projects, code, and insights as I progress. The code is primarily written in R.
This repository is planned to cover a wide range of machine learning methods, starting with classical linear and generalized linear models and progressing to modern non-linear statistical learning methods. Below is an overview of the topics that will be covered:
- Classical Linear and Generalized Linear Models:
  - Linear Regression
  - Logistic Regression
  - Linear Discriminant Analysis
- Modern Resampling and Variable Selection Methods:
  - Bootstrapping
  - Shrinkage Methods (e.g., Lasso, Ridge Regression)
- Traditional Multivariate Analysis Methods:
  - Cluster Analysis
  - Principal Component Analysis (PCA)
  - Multivariate Regression Methods (e.g., Structural Equation Modeling)
- Modern Non-Linear Statistical Learning Methods:
  - Generalized Additive Models (GAM)
  - Decision Trees
  - Random Forest
  - Boosting and Bagging
  - Support Vector Machines (SVM)
  - Neural Networks
The repository is organized into one folder per method. Below is the directory structure:
    └── sriramv1212-machine-learning/
        ├── README.md
        ├── Linear Regression/
        │   ├── Ames_Housing_Data.csv
        │   └── Linear Regression 1_ Housing price prediction.Rmd
        └── Logistic Regression/
            ├── Logistic Regression 2- Titanic.Rmd
            └── Titanic2.csv
- Description: This linear regression project predicts house prices using the Ames Housing dataset. Two variable selection methods, stepwise selection and best subset selection, are used to build and compare linear regression models.
- Files:
  - `Ames_Housing_Data.csv`: Dataset containing housing features and sale prices.
  - `Linear Regression 1_ Housing price prediction.Rmd`: R Markdown file with the analysis, model building, and evaluation.
- Key Steps:
- Data splitting into training and testing sets.
- Stepwise variable selection based on BIC.
- Best subset selection based on SSE.
- Model comparison using BIC, RMSE, and R².
- Results:
- Stepwise model and best subset model coefficients.
- RMSE and R² values for both models on the test data.
- Comparison of model performance.
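
The key steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the `.Rmd` file: it uses R's built-in `mtcars` data as a stand-in for `Ames_Housing_Data.csv`, and the 70/30 split, the seed, and all variable names are assumptions made for the example.

```r
# Stand-in data: built-in mtcars (the project itself uses Ames_Housing_Data.csv).
set.seed(42)                                        # assumed seed, for reproducibility
n         <- nrow(mtcars)
train_idx <- sample(n, size = floor(0.7 * n))       # assumed 70/30 split
train     <- mtcars[train_idx, ]
test      <- mtcars[-train_idx, ]

# Stepwise selection. step() uses AIC by default; setting k = log(n)
# switches the penalty to BIC.
null_model <- lm(mpg ~ 1, data = train)
full_model <- lm(mpg ~ ., data = train)
step_model <- step(null_model,
                   scope     = list(lower = ~ 1, upper = formula(full_model)),
                   direction = "both",
                   k         = log(nrow(train)),
                   trace     = 0)

# Test-set RMSE and R-squared for the stepwise model.
pred <- predict(step_model, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
r2   <- 1 - sum((test$mpg - pred)^2) / sum((test$mpg - mean(test$mpg))^2)
print(coef(step_model))
print(c(RMSE = rmse, R2 = r2))

# Best subset selection with leaps (only runs if the package is installed).
# regsubsets() keeps the lowest-SSE model of each size; here the sizes are
# then compared by BIC.
if (requireNamespace("leaps", quietly = TRUE)) {
  subsets   <- leaps::regsubsets(mpg ~ ., data = train, nvmax = 10)
  best_size <- which.min(summary(subsets)$bic)
  print(coef(subsets, id = best_size))
}
```

With only 32 rows, `mtcars` gives noisy test metrics; the point is the workflow, not the numbers.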
- Description: This project predicts passenger survival on the Titanic using logistic regression. The dataset is cleaned, and a logistic regression model is built to classify passengers as survivors or non-survivors.
- Files:
  - `Titanic2.csv`: Dataset containing passenger information and survival status.
  - `Logistic Regression 2- Titanic.Rmd`: R Markdown file with the analysis, model building, and evaluation.
- Key Steps:
- Data cleaning and preprocessing.
- Splitting data into training and testing sets.
- Fitting a logistic regression model with 7 predictors.
- Evaluating the model using a confusion matrix, accuracy, sensitivity, and specificity.
- Predicting survival for additional passengers.
- Results:
- Confusion matrix and performance metrics (accuracy, sensitivity, specificity).
- Predicted survival for additional passengers.
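
As a rough illustration of the logistic regression workflow above, here is a minimal sketch. It is not the code from the `.Rmd` file: it uses R's built-in `Titanic` table (only 3 predictors) as a stand-in for `Titanic2.csv`, and the 70/30 split, the 0.5 cutoff, and the example passengers are assumptions.

```r
# Stand-in data: expand the built-in Titanic contingency table to one row per
# passenger (the project itself uses Titanic2.csv with 7 predictors).
counts  <- as.data.frame(Titanic)
titanic <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                  c("Class", "Sex", "Age", "Survived")]
rownames(titanic) <- NULL

set.seed(42)                                               # assumed seed
train_idx <- sample(nrow(titanic), floor(0.7 * nrow(titanic)))  # assumed split
train     <- titanic[train_idx, ]
test      <- titanic[-train_idx, ]

# Logistic regression: models the probability of Survived == "Yes".
fit <- glm(Survived ~ Class + Sex + Age, data = train, family = binomial)

# Classify test passengers with an assumed 0.5 probability cutoff.
prob <- predict(fit, newdata = test, type = "response")
pred <- factor(ifelse(prob > 0.5, "Yes", "No"), levels = c("No", "Yes"))

# Confusion matrix and metrics, treating "Yes" (survived) as the positive class.
cm          <- table(Predicted = pred, Actual = test$Survived)
accuracy    <- sum(diag(cm)) / sum(cm)
sensitivity <- cm["Yes", "Yes"] / sum(cm[, "Yes"])
specificity <- cm["No", "No"] / sum(cm[, "No"])
print(cm)
print(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity))

# Predict survival for additional (hypothetical) passengers.
new_passengers <- data.frame(
  Class = factor(c("1st", "3rd"),     levels = levels(titanic$Class)),
  Sex   = factor(c("Female", "Male"), levels = levels(titanic$Sex)),
  Age   = factor(c("Adult", "Adult"), levels = levels(titanic$Age))
)
new_prob <- predict(fit, newdata = new_passengers, type = "response")
print(new_prob)
```

The `caret` package listed below offers `confusionMatrix()` for the same metrics; base R is used here to keep the sketch dependency-free.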
Method | Status | Details |
---|---|---|
Linear Regression | ✅ Completed | Link to Project |
Logistic Regression | ✅ Completed | Link to Project |
Linear Discriminant Analysis | ⏳ Upcoming | Planned for future updates. |
Bootstrapping | ⏳ Upcoming | Planned for future updates. |
Shrinkage Methods | ⏳ Upcoming | Planned for future updates. |
Cluster Analysis | ⏳ Upcoming | Planned for future updates. |
Principal Component Analysis | ⏳ Upcoming | Planned for future updates. |
Decision Trees | ⏳ Upcoming | Planned for future updates. |
Random Forest | ⏳ Upcoming | Planned for future updates. |
Boosting and Bagging | ⏳ Upcoming | Planned for future updates. |
Support Vector Machines | ⏳ Upcoming | Planned for future updates. |
Neural Networks | ⏳ Upcoming | Planned for future updates. |
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (2008)
  Trevor Hastie, Robert Tibshirani, Jerome Friedman. Springer.
  Download PDF
- An Introduction to Statistical Learning with Applications in R (2017)
  Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. Springer.
  Book Website
- Applied Multivariate Statistics with R (Optional) (2015)
  Daniel Zelterman. Springer.
  ISBN-13: 978-3319361635, ISBN-10: 3319361635.
- Clone the Repository:
  `git clone https://github.com/sriramv1212/sriramv1212-machine-learning.git`
- Navigate to the Project Folder (quote the path, because the folder names contain spaces):
  - For Linear Regression: `cd "sriramv1212-machine-learning/Linear Regression"`
  - For Logistic Regression: `cd "sriramv1212-machine-learning/Logistic Regression"`
- Open the R Markdown File:
  - Use RStudio or any compatible IDE to open and run the `.Rmd` files.
- Reproduce the Analysis:
  - Ensure all required libraries are installed.
  - Run the code chunks in the R Markdown file to reproduce the analysis and results.
- R Libraries:
  - `caret`
  - `MASS`
  - `leaps`
  - `dplyr`
  - `tidyverse`

Install the required libraries using:
`install.packages(c("caret", "MASS", "leaps", "dplyr", "tidyverse"))`
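
If some of these packages are already installed, a small check like the one below avoids reinstalling them. This is an optional sketch; the variable names are illustrative, and the install step is restricted to interactive sessions as a precaution.

```r
# Packages listed in this README.
required <- c("caret", "MASS", "leaps", "dplyr", "tidyverse")

# requireNamespace() returns FALSE for packages that cannot be loaded.
installed    <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) == 0) {
  message("All required packages are installed.")
} else {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Only install automatically when run interactively (e.g. in RStudio).
  if (interactive()) install.packages(missing_pkgs)
}
```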
- Sriram Vivek
- GitHub: sriramv1212
This project is licensed under the MIT License. See the LICENSE file for details.
- The Ames Housing dataset and Titanic dataset are publicly available and widely used for educational purposes.
- Special thanks to the R community for providing excellent resources and libraries for machine learning.
Happy Learning!