Created a Linear Regression ML project that analyzes a dataset of personal information and predicts medical costs (charges).
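The project's script is not reproduced here. The following is a minimal sketch of an ordinary least squares linear regression fitted with NumPy only. It uses synthetic data with hypothetical stand-ins (`age`, `bmi`, `smoker`) for columns of insurance.csv. Categorical fields such as smoking status need numeric encoding before fitting, shown here as a 0/1 dummy. The author's script may use a different library (for example scikit-learn) and different features.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares; returns (intercept, coefficients)."""
    # Prepend a column of ones so the first weight is the intercept.
    X_design = np.column_stack([np.ones(len(X)), X])
    # lstsq minimizes ||X_design @ w - y||^2.
    w, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return w[0], w[1:]

def predict(intercept, coefs, X):
    """Linear prediction: intercept + X @ coefs."""
    return intercept + X @ coefs

# Synthetic stand-in for insurance.csv (hypothetical values, not the real data).
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(18, 64, n)
bmi = rng.uniform(16, 50, n)
smoker = rng.integers(0, 2, n)  # dummy encoding: 1 = smoker, 0 = non-smoker
X = np.column_stack([age, bmi, smoker])
charges = 250 * age + 300 * bmi + 20000 * smoker - 10000 + rng.normal(0, 3000, n)

intercept, coefs = fit_linear_regression(X, charges)
pred = predict(intercept, coefs, X)

# Coefficient of determination (R^2) on the training data.
r2 = 1 - np.sum((charges - pred) ** 2) / np.sum((charges - charges.mean()) ** 2)
print("coefficients (age, bmi, smoker):", np.round(coefs, 1))
print("R^2:", round(r2, 3))
```

Each coefficient estimates the change in predicted charges for a one-unit change in that feature, holding the others fixed. On real data, a held-out test split gives a more honest R^2 than the training fit shown here.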
Figure 1: Scatter plot with linear fit to show positive correlation between BMI and Medical Costs
Figure 2: Scatter plot with linear fit to show positive correlation between Age and Medical Costs
Figure 3: Correlation Matrix to show the strength of correlation between the continuous variables.
Figure 4: Violin plot to show Medical Cost distributions based on sex.
Figure 5: Violin plot to show Medical Cost distributions based on smoking vs. non-smoking.
Figure 6: Violin plot to show Medical Cost distributions by region, split by smoking and non-smoking.
Figure 7: Violin plot to show Medical Cost distributions by region, split by sex.
Figure 8: Visualization of our check for linearity between the actual and predicted values.
Figure 9: Visualization of our check for normality of the residuals (residual errors) and their mean.
Figure 10: Visualization of our check for multivariate normality.
Figure 11: Visualization of our check for homoscedasticity.
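Figures 3 and 8 to 11 are visual checks. As a hedged sketch, assuming NumPy only and synthetic data (hypothetical, not insurance.csv), the same properties can also be summarized numerically. The helper `diagnostics` is hypothetical and is not the project's method. It reports the actual-vs-predicted correlation (linearity), the residual mean, the Jarque-Bera statistic (residual normality, roughly chi-square with 2 degrees of freedom, 5% critical value about 5.99), and Koenker's studentized Breusch-Pagan statistic n·R² (homoscedasticity, roughly chi-square with one degree of freedom per predictor).

```python
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares with an intercept; returns fitted values."""
    Xd = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return Xd @ w

def r_squared(y, fitted):
    return 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

def diagnostics(X, y):
    """Numeric counterparts to the visual checks (hypothetical helper)."""
    y_pred = ols_fit(X, y)
    resid = y - y_pred
    n = len(resid)
    # Linearity: correlation between actual and predicted values.
    r_actual_pred = np.corrcoef(y, y_pred)[0, 1]
    # Residual mean: with an intercept, OLS residuals average to ~0.
    mean_resid = resid.mean()
    # Normality: Jarque-Bera from sample skewness and excess kurtosis.
    z = (resid - mean_resid) / resid.std()
    skew = np.mean(z ** 3)
    excess_kurt = np.mean(z ** 4) - 3.0
    jarque_bera = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    # Homoscedasticity: n * R^2 from regressing squared residuals on X.
    sq = resid ** 2
    breusch_pagan = n * r_squared(sq, ols_fit(X, sq))
    return {
        "r_actual_pred": r_actual_pred,
        "mean_resid": mean_resid,
        "jarque_bera": jarque_bera,
        "breusch_pagan": breusch_pagan,
    }

# Synthetic stand-in data (hypothetical, not insurance.csv).
rng = np.random.default_rng(1)
n = 500
age = rng.uniform(18, 64, n)
bmi = rng.uniform(16, 50, n)
X = np.column_stack([age, bmi])
charges = 250 * age + 300 * bmi + rng.normal(0, 3000, n)

# Pearson correlation matrix of the continuous variables (cf. a heatmap like Figure 3).
corr = np.corrcoef(np.column_stack([age, bmi, charges]), rowvar=False)
print(np.round(corr, 2))
print(diagnostics(X, charges))
```

Large Jarque-Bera or Breusch-Pagan values relative to their chi-square critical values suggest non-normal or heteroscedastic residuals. The plots remain useful for seeing *where* an assumption breaks down.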
insurance.csv: The dataset used in this analysis project.
Insurance ML Analysis.py: Python file containing all the code used to produce Figures 1-11 as well as our ML model.