Machine Learning Projects

Welcome to my Machine Learning repository! It collects the projects and experiments I build as I learn and explore machine learning concepts and techniques. I will keep updating it with new projects, code, and insights as I progress. The code is written primarily in R.


Introduction and Overview

This repository covers a wide range of machine learning methods, starting with classical linear and generalized linear models and progressing to modern non-linear statistical learning methods. Below is an overview of the topics that will be covered:

  1. Classical Linear and Generalized Linear Models:

    • Linear Regression
    • Logistic Regression
    • Linear Discriminant Analysis
  2. Modern Resampling and Variable Selection Methods:

    • Bootstrapping
    • Shrinkage Methods (e.g., Lasso, Ridge Regression)
  3. Traditional Multivariate Analysis Methods:

    • Cluster Analysis
    • Principal Component Analysis (PCA)
    • Multivariate Regression Methods (e.g., Structural Equation Modeling)
  4. Modern Non-Linear Statistical Learning Methods:

    • Generalized Additive Models (GAM)
    • Decision Trees
    • Random Forest
    • Boosting and Bagging
    • Support Vector Machines (SVM)
    • Neural Networks

Repository Structure

The repository is organized into folders based on the type of machine learning task. Below is the directory structure:

└── sriramv1212-machine-learning/
    ├── README.md
    ├── Linear Regression/
    │   ├── Ames_Housing_Data.csv
    │   └── Linear Regression 1_ Housing price prediction.Rmd
    └── Logistic Regression/
        ├── Logistic Regression 2- Titanic.Rmd
        └── Titanic2.csv

Projects

1. Linear Regression: Housing Price Prediction

  • Description: This project focuses on predicting house prices using the Ames Housing dataset. Two variable selection methods—stepwise selection and best subset selection—are used to build and compare linear regression models.
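  • Files:
    • Linear Regression/Ames_Housing_Data.csv
    • Linear Regression/Linear Regression 1_ Housing price prediction.Rmd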
  • Key Steps:
    • Data splitting into training and testing sets.
    • Stepwise variable selection based on BIC.
    • Best subset selection based on SSE.
    • Model comparison using BIC, RMSE, and R².
  • Results:
    • Stepwise model and best subset model coefficients.
    • RMSE and R² values for both models on the test data.
    • Comparison of model performance.
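
The key steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code in the project's .Rmd file: the built-in mtcars data stands in for the Ames Housing data, the 70/30 split ratio and the candidate predictor list are arbitrary choices, and the brute-force subset search stands in for a dedicated routine such as the one in the leaps package.

```r
# Sketch only: mtcars stands in for the Ames Housing data.
set.seed(42)

# 1. Train/test split (the 70/30 ratio is an assumption).
train_idx <- sample(seq_len(nrow(mtcars)), size = floor(0.7 * nrow(mtcars)))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# 2. Stepwise selection by BIC. step() uses the AIC penalty (k = 2) by
#    default; k = log(n) gives the BIC penalty instead.
null_model <- lm(mpg ~ 1, data = train)
full_model <- lm(mpg ~ ., data = train)
step_model <- step(null_model,
                   scope = list(lower = formula(null_model),
                                upper = formula(full_model)),
                   direction = "both", k = log(nrow(train)), trace = 0)

# 3. Best subset selection: fit every subset of a small candidate set,
#    keep the lowest-SSE model of each size, then pick a size by BIC.
candidates <- c("wt", "hp", "disp", "qsec", "drat")
fit_subset <- function(vars) {
  lm(reformulate(vars, response = "mpg"), data = train)
}
best_by_size <- lapply(seq_along(candidates), function(k) {
  fits <- combn(candidates, k, FUN = fit_subset, simplify = FALSE)
  sse  <- vapply(fits, function(f) sum(residuals(f)^2), numeric(1))
  fits[[which.min(sse)]]
})
size_bic     <- vapply(best_by_size, BIC, numeric(1))
subset_model <- best_by_size[[which.min(size_bic)]]

# 4. Compare both models: BIC on the training fit, RMSE and R^2 on test data.
evaluate <- function(model, newdata, response = "mpg") {
  pred   <- predict(model, newdata = newdata)
  actual <- newdata[[response]]
  c(BIC  = BIC(model),
    RMSE = sqrt(mean((actual - pred)^2)),
    R2   = 1 - sum((actual - pred)^2) / sum((actual - mean(actual))^2))
}
comparison <- rbind(stepwise    = evaluate(step_model, test),
                    best_subset = evaluate(subset_model, test))
print(round(comparison, 3))
```

SSE can only rank models with the same number of predictors, because adding a predictor never increases training SSE. That is why the sketch uses SSE within each size and a penalized criterion (BIC) to choose between sizes.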

2. Logistic Regression: Titanic Survival Prediction

  • Description: This project predicts passenger survival on the Titanic using logistic regression. The dataset is cleaned, and a logistic regression model is built to classify passengers as survivors or non-survivors.
  • Files:
    • Logistic Regression/Logistic Regression 2- Titanic.Rmd
    • Logistic Regression/Titanic2.csv
  • Key Steps:
    • Data cleaning and preprocessing.
    • Splitting data into training and testing sets.
    • Fitting a logistic regression model with 7 predictors.
    • Evaluating the model using a confusion matrix, accuracy, sensitivity, and specificity.
    • Predicting survival for additional passengers.
  • Results:
    • Confusion matrix and performance metrics (accuracy, sensitivity, specificity).
    • Predicted survival for additional passengers.
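
A rough base R sketch of the same workflow follows. It is an illustration under assumptions, not the project's code: R's built-in Titanic contingency table stands in for Titanic2.csv, so the model has only 3 predictors rather than 7, and the 80/20 split, the 0.5 cutoff, and the two example passengers are arbitrary choices.

```r
# Sketch only: the built-in Titanic table stands in for Titanic2.csv.
set.seed(42)

# Expand the contingency table to one row per passenger.
counts     <- as.data.frame(Titanic)
passengers <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                     c("Class", "Sex", "Age", "Survived")]
rownames(passengers) <- NULL

# Train/test split (the 80/20 ratio is an assumption).
train_idx <- sample(seq_len(nrow(passengers)),
                    size = floor(0.8 * nrow(passengers)))
train <- passengers[train_idx, ]
test  <- passengers[-train_idx, ]

# Logistic regression. With a factor response, glm() treats the first
# level ("No") as failure, so the model estimates P(Survived == "Yes").
model <- glm(Survived ~ Class + Sex + Age, data = train, family = binomial)

# Classify test passengers with a 0.5 probability cutoff (assumption).
prob <- predict(model, newdata = test, type = "response")
pred <- factor(ifelse(prob > 0.5, "Yes", "No"), levels = c("No", "Yes"))

# Confusion matrix and the metrics derived from it.
confusion   <- table(Predicted = pred, Actual = test$Survived)
accuracy    <- sum(diag(confusion)) / sum(confusion)
sensitivity <- confusion["Yes", "Yes"] / sum(confusion[, "Yes"])
specificity <- confusion["No", "No"] / sum(confusion[, "No"])
print(confusion)
print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity), 3))

# Predict survival probability for additional (hypothetical) passengers.
new_passengers <- data.frame(
  Class = factor(c("1st", "3rd"), levels = levels(passengers$Class)),
  Sex   = factor(c("Female", "Male"), levels = levels(passengers$Sex)),
  Age   = factor(c("Adult", "Adult"), levels = levels(passengers$Age))
)
new_prob <- predict(model, newdata = new_passengers, type = "response")
print(round(new_prob, 3))
```

Sensitivity here is the share of actual survivors that the model labels as survivors, and specificity is the share of actual non-survivors labeled as non-survivors. Both depend on the chosen cutoff.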

Status of Machine Learning Methods

| Method | Status | Details |
|---|---|---|
| Linear Regression | ✅ Completed | Link to Project |
| Logistic Regression | ✅ Completed | Link to Project |
| Linear Discriminant Analysis | ⏳ Upcoming | Planned for future updates. |
| Bootstrapping | ⏳ Upcoming | Planned for future updates. |
| Shrinkage Methods | ⏳ Upcoming | Planned for future updates. |
| Cluster Analysis | ⏳ Upcoming | Planned for future updates. |
| Principal Component Analysis | ⏳ Upcoming | Planned for future updates. |
| Decision Trees | ⏳ Upcoming | Planned for future updates. |
| Random Forest | ⏳ Upcoming | Planned for future updates. |
| Boosting and Bagging | ⏳ Upcoming | Planned for future updates. |
| Support Vector Machines | ⏳ Upcoming | Planned for future updates. |
| Neural Networks | ⏳ Upcoming | Planned for future updates. |

Recommended Textbooks

  1. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (2008)
    Trevor Hastie, Robert Tibshirani, Jerome Friedman. Springer.
    Download PDF

  2. An Introduction to Statistical Learning with Applications in R (2017)
    Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. Springer.
    Book Website

  3. Applied Multivariate Statistics with R (Optional) (2015)
    Daniel Zelterman. Springer.
    ISBN-13: 978-3319361635, ISBN-10: 3319361635.


How to Use This Repository

  1. Clone the Repository:

    git clone https://github.com/sriramv1212/sriramv1212-machine-learning.git
  2. Navigate to the Project Folder:

    • For Linear Regression (the folder name contains a space, so quote the path):
      cd "sriramv1212-machine-learning/Linear Regression"
    • For Logistic Regression:
      cd "sriramv1212-machine-learning/Logistic Regression"
  3. Open the R Markdown File:

    • Use RStudio or any compatible IDE to open and run the .Rmd files.
  4. Reproduce the Analysis:

    • Ensure all required libraries are installed.
    • Run the code chunks in the R Markdown file to reproduce the analysis and results.

Dependencies

  • R Libraries:
    • caret
    • MASS
    • leaps
    • dplyr
    • tidyverse

Install the required libraries using:

install.packages(c("caret", "MASS", "leaps", "dplyr", "tidyverse"))
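
To avoid reinstalling packages that are already present, one option is to install only the missing ones. The helper name missing_packages below is hypothetical and not part of the repository, and the interactive() guard is a design choice so that sourcing the snippet from a script has no side effects.

```r
required <- c("caret", "MASS", "leaps", "dplyr", "tidyverse")

# Return the packages in `pkgs` that are not installed in any library path.
# (Hypothetical helper, not part of the repository.)
missing_packages <- function(pkgs) {
  pkgs[!pkgs %in% rownames(installed.packages())]
}

to_install <- missing_packages(required)

# Install only when run interactively (for example from the RStudio console).
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```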

Author


License

This project is licensed under the MIT License. See the LICENSE file for details.


Acknowledgments

  • The Ames Housing dataset and Titanic dataset are publicly available and widely used for educational purposes.
  • Special thanks to the R community for providing excellent resources and libraries for machine learning.

Happy Learning!