Authors: Rayhaan Rasheed, Solomon Mekonnen, Sam Aboagye
Date: 11/26/2018
The data used in this project is the Heart Disease Dataset generated by Robert Detrano, M.D., Ph.D., at the V.A. Medical Center, Long Beach, and the Cleveland Clinic Foundation. The full database was pulled from the UCI Machine Learning Repository, created by the University of California, Irvine.
There are 76 attributes in total, but published studies typically use only 14 of them:
- Age
- Sex
- Chest Pain Rating
- Resting Blood Pressure
- Serum Cholesterol in mg/dl
- Blood Sugar Level While Fasting
- Resting EKG
- Maximum Heart Rate Achieved
- Exercise Induced Angina
- ST Depression Induced by Exercise Relative to Rest
- Slope of the Peak Exercise ST Segment
- Number of Major Vessels Colored by Fluoroscopy
- HR Type (Normal, Fixed Defect, or Reversible Defect)
- Class (target)
Link to the Data: https://github.com/rrasheed/Heart_Disease_ML1/blob/master/Heart.csv
This project aims to evaluate and compare different classifiers using the Heart Disease database. Instead of turning this into a multi-class classification problem, the target values will be changed to either 0 or 1. Any target value greater than or equal to 1 will become 1 in the new target column; likewise, any 0 will stay 0. The reason for this is to focus on the fundamental question of whether a patient shows any sign of heart disease (one-vs-all).
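The target recoding described above can be sketched in a few lines of pandas. This is a minimal illustration, not the project's actual code: the column name `target`, the new column name `target_binary`, and the helper `binarize_target` are all assumptions, and the toy frame only stands in for Heart.csv.

```python
import pandas as pd


def binarize_target(df: pd.DataFrame, target_col: str = "target") -> pd.DataFrame:
    """Return a copy of df with an extra 0/1 column derived from target_col.

    Values >= 1 (any sign of heart disease) become 1; 0 stays 0.
    The column names used here are hypothetical.
    """
    out = df.copy()
    out["target_binary"] = (out[target_col] >= 1).astype(int)
    return out


# Toy frame standing in for the real data (multi-class target, 0 = no disease).
toy = pd.DataFrame({"age": [63, 67, 41, 56, 57], "target": [0, 2, 0, 1, 4]})
binary = binarize_target(toy)
print(binary["target_binary"].tolist())  # [0, 1, 0, 1, 1]
```

Keeping the original column alongside the new one makes it possible to check the recoding or return to the multi-class problem later.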
The classifiers used: Logistic Regression, Random Forest, and Support Vector Machine
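One way to compare these three classifiers is with cross-validated accuracy in scikit-learn. The sketch below is an assumption about how such a comparison could look, not the evaluation actually used in this project: it runs on synthetic data from `make_classification` (13 features, binary target) rather than Heart.csv, and the helper `compare_classifiers`, the hyperparameters, and the choice of 5-fold accuracy are all illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data: 13 features and a binary target.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)

# Logistic regression and the SVM are scale-sensitive, so they get a scaler.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
}


def compare_classifiers(models, X, y, cv=5):
    """Return the mean cross-validated accuracy for each named model."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


scores = compare_classifiers(models, X, y)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Wrapping the scaler and the model in one pipeline means scaling is refit inside each fold, which avoids leaking information from the held-out fold into training.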
[1] Benjamin, E. J., et al. (2018). Correction to: Heart Disease and Stroke Statistics—2018 Update: A Report From the American Heart Association. Circulation, 137(12), 67-492. doi:10.1161/cir.0000000000000573
[2] Whitley, D., & Watson, J. P. (2005). Complexity Theory and the No Free Lunch Theorem. Search Methodologies, 317-339. doi:10.1007/0-387-28356-0_11
[3] Bethel, G. B., Rajinikanth, T., & Raju, S. V. (2016). A Knowledge driven Approach for Efficient Analysis of Heart Disease Dataset. International Journal of Computer Applications, 147(9), 39-46. doi:10.5120/ijca2016911187
[4] Detrano, R. (1988). Heart Disease Data Set. V.A. Medical Center, Long Beach, and Cleveland Clinic Foundation. Retrieved from https://archive.ics.uci.edu/ml/datasets/Heart+Disease
We would like to acknowledge Dr. Yuxio Huang, whose code was very useful in building and evaluating the classifiers.