
Multi-model metrics visualizer #1044

Open · bbengfort opened this issue Feb 26, 2020 · 11 comments

Labels: type: feature (a new visualizer or utility for yb)

Comments

@bbengfort (Member)

Describe the solution you'd like

I would like to create an at-a-glance representation of multiple model scores so that I can easily compare and contrast different model instances. This will be our first attempt at handling multiple models in a visualizer, so it could be tricky and may require a new API. I envision something that creates a heatmap of metrics to models, similar to the classification report, but where the rows are models instead of classes.

I propose the code would look something like this:

# Proposed API -- MultiModelMetrics does not exist in Yellowbrick yet
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier  # sklearn's multilayer perceptron
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

viz = MultiModelMetrics([
    ("Naive Bayes", GaussianNB()),
    ("Neural Network", MLPClassifier()),
    ("Logistic", LogisticRegression()),
    ("Boosting", GradientBoostingClassifier()),
    ("Bagging", RandomForestClassifier()),
], is_fitted=False, metrics="classification")

viz.fit(X_train, y_train)
viz.score(X_test, y_test)
viz.show()

Like a pipeline, this API lets us specify display names for the estimators to be visualized; alternatively, a plain list of estimators can be passed, in which case each estimator's class name will be used (see the sketch below).
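Under that assumption, a plain-list call might default the display names from the estimator classes, analogous to sklearn's make_pipeline (again, a sketch of the proposed, not-yet-implemented API):

models = [GaussianNB(), MLPClassifier(), LogisticRegression()]
viz = MultiModelMetrics(models, metrics="classification")
# display names would default to "GaussianNB", "MLPClassifier", ...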

Examples

A prototype example:

[Image: prototype of the multi-model scores heatmap]

bbengfort added the label "type: feature (a new visualizer or utility for yb)" on Feb 26, 2020
@rebeccabilbro (Member)

Excited for this one!

@navarretedaniel (Contributor)

I helped troubleshoot the code to create the prototype shown above. Kudos to @lmschreib for initially proposing the concept! Agree that it'd be great to see a version of this incorporated into Yellowbrick!

I'm taking another look at it and making additional modifications to see how it can be improved and to compare different visual layouts. Here's an alternative way to display the multi-model metrics via a heatmap:

[Image: alternative heatmap layout of the multi-model metrics]
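For reference, a minimal sketch of computing this kind of metrics-to-models grid with plain scikit-learn and Matplotlib (the synthetic data and metric choices are placeholders, not the eventual Yellowbrick API):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = [
    ("Naive Bayes", GaussianNB()),
    ("Logistic", LogisticRegression(max_iter=1000)),
    ("Bagging", RandomForestClassifier(random_state=42)),
]
metrics = [("precision", precision_score), ("recall", recall_score), ("f1", f1_score)]

# score each model on each metric to fill the models x metrics grid
grid = np.empty((len(models), len(metrics)))
for i, (name, model) in enumerate(models):
    y_pred = model.fit(X_train, y_train).predict(X_test)
    for j, (_, metric) in enumerate(metrics):
        grid[i, j] = metric(y_test, y_pred)

fig, ax = plt.subplots()
im = ax.imshow(grid, cmap="YlOrRd", vmin=0, vmax=1)
ax.set_xticks(range(len(metrics)))
ax.set_xticklabels([m for m, _ in metrics])
ax.set_yticks(range(len(models)))
ax.set_yticklabels([n for n, _ in models])
for i in range(len(models)):
    for j in range(len(metrics)):
        ax.text(j, i, f"{grid[i, j]:.2f}", ha="center", va="center")
fig.colorbar(im)
plt.show()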

I'll keep plugging away at it (feel free to assign to me)!

@bbengfort (Member, Author)

@navarretedaniel nice!

@taylorplumer commented May 25, 2020

For anyone comfortable with or interested in Dash/Plotly:

I was inspired by this issue and incorporated some of the functionality described above (thanks @bbengfort and @navarretedaniel for posting!) within a Dash web application (Github repo).

When hovering over a sklearn model's row in the heatmap, the corresponding Yellowbrick classification visualizers are displayed, e.g.:

[Image: screenshot of the classifier Dash app]

I know this is outside the scope of the Yellowbrick project itself, given the use of Dash/Plotly instead of Matplotlib, but I thought I would share in case anyone is interested in applying the above to binary classification problems in the interim.
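For the curious, the hover wiring boils down to a Plotly heatmap plus a Dash callback on hoverData. A minimal sketch, assuming dash >= 2; the score values and image paths below are placeholders, not taken from the actual app:

from dash import Dash, dcc, html, Input, Output
import plotly.graph_objects as go

app = Dash(__name__)

models = ["GaussianNB", "LogisticRegression", "RandomForestClassifier"]
metrics = ["precision", "recall", "f1"]
scores = [[0.81, 0.79, 0.80], [0.88, 0.86, 0.87], [0.90, 0.91, 0.90]]  # placeholders

app.layout = html.Div([
    dcc.Graph(id="heatmap", figure=go.Figure(go.Heatmap(z=scores, x=metrics, y=models))),
    html.Div(id="detail"),
])

@app.callback(Output("detail", "children"), Input("heatmap", "hoverData"))
def show_detail(hover):
    # hoverData identifies the row (model) under the cursor; swap in the
    # pre-rendered Yellowbrick visualizer image for that model
    if not hover:
        return "Hover over a row to see its visualizers."
    model = hover["points"][0]["y"]
    return html.Img(src=f"/assets/{model}_classification_report.png")

if __name__ == "__main__":
    app.run(debug=True)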

@bbengfort (Member, Author)

@taylorplumer your dashboard looks amazing! We've often discussed an application form of Yellowbrick that could do multi-visualizer displays and it seems like Dash/Plotly might be a good approach. I think this would be a great blog post if you were interested; alternatively, we'd be happy to post a description of how to create this dashboard in our documentation if you were interested in writing one up.

@taylorplumer

@bbengfort thanks for the feedback! I was planning on creating a "how to" blog post for the dashboard this weekend, so I think that works. If there is anything in particular that would be beneficial to note for the Yellowbrick community, I'd be happy to incorporate it.

@bbengfort (Member, Author)

@taylorplumer that would be great, looking forward to reading your post! It would be great if you could DM us on Twitter or here on GitHub so we can share the link to your post. In terms of notes, the party line here is that visual diagnostics are critical to effective machine learning and I think your dashboard will definitely communicate that!

@kalkite commented Nov 1, 2021

@navarretedaniel @taylorplumer

Hello, could you please give an example for RFECV as well? I would like to plot RFECV curves for multiple models.

[Image: RFECV example plot]
https://www.scikit-yb.org/en/latest/api/model_selection/rfecv.html

@rebeccabilbro (Member)

Hi @rajeshkalakoti -- apologies for the slow response, but I've just responded to your related feature request in #1203. To summarize, recursive feature elimination is already performing multimodel fitting in a sense (by removing a feature and refitting the Estimator each time). That means it's already slow for some use cases, especially with datasets containing many features, though this is something we are working on!

Could you help us understand better your use case for multimodel RFECV?

@kalkite commented Nov 2, 2021

I mean I would like to plot multiple classifiers on the same graph, as shown in the picture below. Would that be possible?

[Image: accuracy vs. number of features for kNN, Decision Tree, and Random Forest classifiers]

I plotted this graph manually using cross_val_score accuracy.
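(For reference, a manual version of this kind of plot can be built directly on scikit-learn's RFECV. A minimal sketch, assuming sklearn >= 1.0 for cv_results_ and using synthetic placeholder data; note that RFECV requires estimators exposing coef_ or feature_importances_, so kNN would need a workaround such as permutation importance.)

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# synthetic stand-in for the real dataset
X, y = make_classification(n_samples=500, n_features=25, random_state=42)

models = [
    ("Logistic Regression", LogisticRegression(max_iter=1000)),
    ("Decision Tree", DecisionTreeClassifier(random_state=42)),
    ("Random Forest", RandomForestClassifier(random_state=42)),
]

for name, model in models:
    # RFECV drops one feature per step and refits, scoring each subset by CV
    rfecv = RFECV(model, step=1, cv=5, scoring="accuracy").fit(X, y)
    scores = rfecv.cv_results_["mean_test_score"]  # sklearn >= 1.0
    plt.plot(range(1, len(scores) + 1), scores, marker="o", label=name)

plt.xlabel("Number of features selected")
plt.ylabel("Mean CV accuracy")
plt.legend()
plt.show()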

@rebeccabilbro (Member)

Thank you for providing an illustration @rajeshkalakoti! What stage of the model selection process do you tend to use these plots for? In other words, do you use the above plot to select kNN as the best model, since it achieves >.98 accuracy with fewer features than the Decision Tree and Random Forest models? Or to rule out kNN, given its extreme sensitivity to certain n_feature counts? Or is the plot instead designed to steer the feature selection process, e.g. by demonstrating that it should be possible to dramatically reduce the original feature space of the data regardless of the algorithm that will be used?

As I mentioned previously, this kind of plot is not currently possible in Yellowbrick. While it would be difficult to implement (due to the limitations of long-running feature elimination sequences), it might be feasible if we could parallelize the model fitting of each classifier (and even within a single classifier) with joblib and an n_jobs argument. What kind of latency do you feel would be tolerable for users for producing these kinds of plots? Would this be a feature you would be interested in working on as a contributor to Yellowbrick?
