A Dash web app for binary classification model selection
A demo deployed to Heroku is available at https://classifier-dash-app.herokuapp.com/
The demo is not mobile-friendly, so please view it in a desktop browser for full functionality.
This repository contains working code for running a binary classification model selection tool locally as a Dash app.
The inspiration for this tool came from Issue #1044 of the Yellowbrick project [1]: "to create an at-a-glance representation of multiple model scores so that I can easily compare and contrast different model instances." The heatmap below is my solution, albeit outside the scope of the Yellowbrick project itself, since it uses Dash/Plotly instead of Matplotlib. Taking advantage of the interactivity of Dash/Plotly, I extended the solution to incorporate existing Yellowbrick classification visualizations, known as visualizers.
The web app consists of three components:
- A dropdown that lets the user view models trained on data either as-is or synthetically upsampled to address class imbalance. The default is no upsampling. See the upsample.py module in the utils directory for details on the upsampling method.
- A heatmap containing precision, recall, and F1 scores for each sklearn model, along with the following averages [2]:
  - macro average: the unweighted mean of the per-label scores
  - weighted average: the support-weighted mean of the per-label scores
- Model-specific images of Matplotlib plots, generated with Yellowbrick classification visualizers [3], which appear when hovering over a model's row in the heatmap:
  - ROCAUC: Graphs the receiver operating characteristics and area under the curve.
  - Precision-Recall Curves: Plots the precision and recall for different probability thresholds.
  - Classification Report: A visual classification report that displays precision, recall, and F1 per class as a heatmap.
  - Confusion Matrix: A heatmap view of the confusion matrix of pairs of classes in multi-class classification.
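To make the difference between the macro and weighted averages shown in the heatmap concrete, here is a minimal sketch in plain Python. The per-class scores and supports are made-up values for illustration, not output from this app, and the helper function names are hypothetical.

```python
# Hypothetical per-class precision scores and supports (number of true
# samples per class). These numbers are invented for illustration only.
per_class_precision = {"0": 0.90, "1": 0.50}
support = {"0": 80, "1": 20}


def macro_average(scores):
    """Unweighted mean of the per-label scores."""
    return sum(scores.values()) / len(scores)


def weighted_average(scores, support):
    """Mean of the per-label scores, weighted by each label's support."""
    total = sum(support.values())
    return sum(scores[label] * support[label] for label in scores) / total


macro = macro_average(per_class_precision)
weighted = weighted_average(per_class_precision, support)

print(f"macro avg:    {macro:.2f}")     # 0.70
print(f"weighted avg: {weighted:.2f}")  # 0.82
```

With imbalanced classes, the weighted average is pulled toward the majority class score, while the macro average treats both classes equally. That is why comparing the two can hint at how a model handles the minority class.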
The data used in the example is the 'default of credit card clients Data Set' from the UCI Machine Learning Repository [4]. If you would like to use your own data, place the file in the Data/Input directory and pass its filename as a command-line argument, as described in the instructions below.
- Review the config.py file to select the sklearn classifiers, Yellowbrick visualizers, and filesystem structure appropriate for your needs.
- Place your data in the data input directory, with the target variable as the first column followed by the feature columns.
- Run the following commands in the project's root directory to set up the data and images:
  - Create the Yellowbrick classification visualizer images and the model scores output file, report_df.csv. The input data filename is required as an argument, e.g. credit.csv:
    `python process_data.py credit.csv`
  - Run the Dash web app:
    `python app.py`
- Go to http://127.0.0.1:8050/
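The input layout expected by the steps above (target variable in the first column, feature columns after it) can be illustrated with a short sketch. This is only an assumption-laden example using the standard library: the column names and values are made up, and process_data.py may load the file differently.

```python
import csv
import io

# Made-up stand-in for an input file; the column names are hypothetical.
sample_csv = io.StringIO(
    "default,limit_bal,age\n"
    "1,20000,24\n"
    "0,120000,26\n"
    "0,90000,34\n"
)

reader = csv.reader(sample_csv)
header = next(reader)   # first row holds the column names
rows = list(reader)     # remaining rows hold the data

# The first column is the target; every other column is a feature.
target_name, feature_names = header[0], header[1:]
y = [int(row[0]) for row in rows]
X = [[float(value) for value in row[1:]] for row in rows]

print(target_name, feature_names)
print(y)
print(X[0])
```

If your own file has the target in a different position, reordering the columns before running the processing step keeps it compatible with this layout.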
After creating a virtual environment (recommended), you can install the dependencies with the following command:
`pip install -r requirements.txt`
[1] DistrictDataLabs/yellowbrick#1044
[2] https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
[3] https://www.scikit-yb.org/en/latest/api/classifier/index.html
[4] http://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients