Capstone Project for Data Science Course
Predicting car crash severity, with the following background:
- Growing interest in AI systems, especially autonomous driving
- Predictions that governments will move to automate driving systems (BCG, 2015)
- Safety is still an intricate issue, therefore:
- Observe the variables that contribute to car crash severity in real-life data
- Compare machine learning methods for classifying severity
- The model could also serve as the basis of an alert system for city emergency workers (paramedics, police, firefighters, etc.)
- Seattle GeoData: open data published by the Seattle government, collected from 2004 to 2020 (Collisions_OD).
- Data specifications: 40 attributes, 221,144 accident records. Severity indicator:
- 3: Fatality — High Probability
- 2b: Serious Injury — Mild Probability
- 2: Injury — Low Probability
- 1: Property Damage — Very Low Probability
- 0: Unknown — Little to No Probability
- Language: Python
Libraries:
- matplotlib: 3.2.1
- pandas: 1.0.5
- scikit-learn: 0.22.2
- numpy: 1.18.5
- jupyter notebook: 5.2.2
- Data cleaning
- Fill NaN values
- Variable correlation
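
The preprocessing steps above could look roughly like the following sketch. The toy table, its column names (`SEVERITY`, `PERSONCOUNT`, `VEHCOUNT`, `WEATHER`), and the fill strategy (median for numeric columns, an `"Unknown"` label for categorical ones) are assumptions made for illustration, not the project's actual choices.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the collision table; column names and values are hypothetical.
df = pd.DataFrame({
    "SEVERITY": [1, 2, 1, 3, 2, 1],
    "PERSONCOUNT": [2, 3, np.nan, 5, 4, 2],
    "VEHCOUNT": [2, 2, 1, np.nan, 3, 2],
    "WEATHER": ["Clear", None, "Raining", "Clear", None, "Overcast"],
})

# Data cleaning: drop exact duplicate rows (none in this toy table).
df = df.drop_duplicates().reset_index(drop=True)

# Fill NaN values: median for numeric columns, an explicit label for the rest.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
categorical_cols = df.columns.difference(numeric_cols)
df[categorical_cols] = df[categorical_cols].fillna("Unknown")

# Variable correlation: Pearson correlation of each numeric feature with severity.
correlations = df[numeric_cols].corr()["SEVERITY"].drop("SEVERITY")
print(correlations.sort_values(ascending=False))
```

On the real data, the same pattern would apply to whichever columns survive cleaning. Categorical features would need encoding before they can enter a correlation matrix.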
Three models are evaluated:
- K-Nearest Neighbor
- Decision Tree
- Logistic Regression

Hyperparameters were tuned beforehand by searching [1, 20] for k (K-Nearest Neighbor), [1, 15] for the tree depth (Decision Tree), and [0.001, 0.01, 0.1, 1, 10, 100] for the regularization parameter (Logistic Regression).
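
A search over these ranges can be sketched with scikit-learn's `GridSearchCV`. The synthetic data, the 5-fold cross-validation, the accuracy scoring, and the reading of the logistic regression values as the inverse regularization strength `C` are all assumptions, since the text does not say how the search was run.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder data; the real project would use the cleaned collision features.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# One grid per model, using the ranges stated in the text.
searches = {
    "knn": (KNeighborsClassifier(), {"n_neighbors": list(range(1, 21))}),
    "tree": (DecisionTreeClassifier(random_state=0), {"max_depth": list(range(1, 16))}),
    "logreg": (LogisticRegression(max_iter=1000), {"C": [0.001, 0.01, 0.1, 1, 10, 100]}),
}

best_params = {}
for name, (estimator, grid) in searches.items():
    # cv=5 and scoring="accuracy" are illustrative choices, not taken from the project.
    search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    best_params[name] = search.best_params_
    print(name, search.best_params_, round(search.best_score_, 3))
```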
Evaluation uses five metrics: Jaccard index, F1-score, log loss, precision, and recall.
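
As a minimal illustration, the five metrics can be computed with `sklearn.metrics`. The labels and probabilities below are invented for a binary toy case. The real severity target has more than two classes, which would need an `average=` argument for the Jaccard, F1, precision, and recall scores.

```python
from sklearn.metrics import (
    f1_score,
    jaccard_score,
    log_loss,
    precision_score,
    recall_score,
)

# Hypothetical binary labels, hard predictions, and predicted probabilities of class 1.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
y_proba = [0.1, 0.6, 0.8, 0.9, 0.4, 0.2]

scores = {
    "jaccard": jaccard_score(y_true, y_pred),      # TP / (TP + FP + FN)
    "f1": f1_score(y_true, y_pred),                # harmonic mean of precision and recall
    "log_loss": log_loss(y_true, y_proba),         # uses probabilities, not hard labels
    "precision": precision_score(y_true, y_pred),  # TP / (TP + FP)
    "recall": recall_score(y_true, y_pred),        # TP / (TP + FN)
}

for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Here there are 2 true positives, 1 false positive, and 1 false negative, so the Jaccard index is 2/4 = 0.5 and precision, recall, and F1 are all 2/3.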