
Multi-model metrics visualizer #1044

Open · bbengfort opened this issue Feb 26, 2020 · 11 comments

Labels: type: feature (a new visualizer or utility for yb)

Comments

@bbengfort (Member)

Describe the solution you'd like

I would like to create an at-a-glance representation of multiple model scores so that I can easily compare and contrast different model instances. This will be our first attempt at handling multiple models in a visualizer, so it could be tricky and may require a new API. I envision something that creates a heatmap of metrics to models, similar to the classification report, but where the rows are models instead of classes.

I propose the code would look something like this:

# Proposed API -- MultiModelMetrics does not exist in Yellowbrick yet
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier  # sklearn's multilayer perceptron
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

viz = MultiModelMetrics([
    ("Naive Bayes", GaussianNB()),
    ("Neural Network", MLPClassifier()),
    ("Logistic", LogisticRegression()),
    ("Boosting", GradientBoostingClassifier()),
    ("Bagging", RandomForestClassifier()),
], is_fitted=False, metrics="classification")

viz.fit(X_train, y_train)
viz.score(X_test, y_test)
viz.show()

Like a pipeline, this API lets us specify display names for the estimators to be visualized; alternatively, a plain list of estimators can be passed, in which case each estimator's class name will be used (see the sketch below).
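Under that assumption, a plain-list call might default the display names from the estimator classes, analogous to sklearn's make_pipeline (again, a sketch of the proposed, not-yet-implemented API):

models = [GaussianNB(), MLPClassifier(), LogisticRegression()]
viz = MultiModelMetrics(models, metrics="classification")
# display names would default to "GaussianNB", "MLPClassifier", ...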

Examples

A prototype example:

[Image: prototype of the multi-model scores heatmap]

bbengfort added the label "type: feature (a new visualizer or utility for yb)" on Feb 26, 2020
@rebeccabilbro (Member)

Excited for this one!

@navarretedaniel (Contributor)

I helped troubleshoot the code to create the prototype shown above. Kudos to @lmschreib for initially proposing the concept! Agree that it'd be great to see a version of this incorporated into Yellowbrick!

I'm taking another look at it and making additional modifications to see how it can be improved and to compare different visual layouts. Here's an alternative way to display the multi-model metrics via a heatmap:

[Image: alternative heatmap layout of the multi-model metrics]
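For reference, a minimal sketch of computing this kind of metrics-to-models grid with plain scikit-learn and Matplotlib (the synthetic data and metric choices are placeholders, not the eventual Yellowbrick API):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = [
    ("Naive Bayes", GaussianNB()),
    ("Logistic", LogisticRegression(max_iter=1000)),
    ("Bagging", RandomForestClassifier(random_state=42)),
]
metrics = [("precision", precision_score), ("recall", recall_score), ("f1", f1_score)]

# score each model on each metric to fill the models x metrics grid
grid = np.empty((len(models), len(metrics)))
for i, (name, model) in enumerate(models):
    y_pred = model.fit(X_train, y_train).predict(X_test)
    for j, (_, metric) in enumerate(metrics):
        grid[i, j] = metric(y_test, y_pred)

fig, ax = plt.subplots()
im = ax.imshow(grid, cmap="YlOrRd", vmin=0, vmax=1)
ax.set_xticks(range(len(metrics)))
ax.set_xticklabels([m for m, _ in metrics])
ax.set_yticks(range(len(models)))
ax.set_yticklabels([n for n, _ in models])
for i in range(len(models)):
    for j in range(len(metrics)):
        ax.text(j, i, f"{grid[i, j]:.2f}", ha="center", va="center")
fig.colorbar(im)
plt.show()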

I'll keep plugging away at it (feel free to assign to me)!

@bbengfort (Member, Author)

@navarretedaniel nice!

@taylorplumer commented May 25, 2020

For anyone comfortable with or interested in Dash/Plotly:

I was inspired by this issue and incorporated some of the functionality described above (thanks @bbengfort and @navarretedaniel for posting!) within a Dash web application (Github repo).

When hovering over a sklearn model's row in the heatmap, the corresponding Yellowbrick classification visualizers are displayed, e.g.:

[Image: screenshot of the classifier Dash app]

I know this is outside the scope of the Yellowbrick project itself, given the use of Dash/Plotly instead of Matplotlib, but I thought I would share in case anyone is interested in applying the above to binary classification problems in the interim.
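For the curious, the hover wiring boils down to a Plotly heatmap plus a Dash callback on hoverData. A minimal sketch, assuming dash >= 2; the score values and image paths below are placeholders, not taken from the actual app:

from dash import Dash, dcc, html, Input, Output
import plotly.graph_objects as go

app = Dash(__name__)

models = ["GaussianNB", "LogisticRegression", "RandomForestClassifier"]
metrics = ["precision", "recall", "f1"]
scores = [[0.81, 0.79, 0.80], [0.88, 0.86, 0.87], [0.90, 0.91, 0.90]]  # placeholders

app.layout = html.Div([
    dcc.Graph(id="heatmap", figure=go.Figure(go.Heatmap(z=scores, x=metrics, y=models))),
    html.Div(id="detail"),
])

@app.callback(Output("detail", "children"), Input("heatmap", "hoverData"))
def show_detail(hover):
    # hoverData identifies the row (model) under the cursor; swap in the
    # pre-rendered Yellowbrick visualizer image for that model
    if not hover:
        return "Hover over a row to see its visualizers."
    model = hover["points"][0]["y"]
    return html.Img(src=f"/assets/{model}_classification_report.png")

if __name__ == "__main__":
    app.run(debug=True)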

@bbengfort (Member, Author)

@taylorplumer your dashboard looks amazing! We've often discussed an application form of Yellowbrick that could do multi-visualizer displays and it seems like Dash/Plotly might be a good approach. I think this would be a great blog post if you were interested; alternatively, we'd be happy to post a description of how to create this dashboard in our documentation if you were interested in writing one up.

@taylorplumer

@bbengfort thanks for the feedback! I was planning on creating a "how to" blog post for the dashboard this weekend, so I think that works. If there is anything in particular that would be beneficial to note for the Yellowbrick community, I'd be happy to incorporate it.

@bbengfort (Member, Author)

@taylorplumer that would be great, looking forward to reading your post! It would be great if you could DM us on Twitter or here on GitHub so we can share the link to your post. In terms of notes, the party line here is that visual diagnostics are critical to effective machine learning and I think your dashboard will definitely communicate that!

@kalkite commented Nov 1, 2021

@navarretedaniel @taylorplumer

Hello, could you please give an example for RFECV as well? I would like to plot RFECV curves for multiple models.

[Image: RFECV example plot]
https://www.scikit-yb.org/en/latest/api/model_selection/rfecv.html

@rebeccabilbro (Member)

Hi @rajeshkalakoti -- apologies for the slow response, but I've just responded to your related feature request in #1203. To summarize, recursive feature elimination is already performing multimodel fitting in a sense (by removing a feature and refitting the Estimator each time). That means it's already slow for some use cases, especially with datasets containing many features, though this is something we are working on!

Could you help us understand better your use case for multimodel RFECV?

@kalkite commented Nov 2, 2021

I mean I would like to plot multiple classifiers on the same graph, as shown in the picture below. Would that be possible?

[Image: accuracy vs. number of features for kNN, Decision Tree, and Random Forest classifiers]

I plotted this graph manually using cross_val_score accuracy.
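(For reference, a manual version of this kind of plot can be built directly on scikit-learn's RFECV. A minimal sketch, assuming sklearn >= 1.0 for cv_results_ and using synthetic placeholder data; note that RFECV requires estimators exposing coef_ or feature_importances_, so kNN would need a workaround such as permutation importance.)

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# synthetic stand-in for the real dataset
X, y = make_classification(n_samples=500, n_features=25, random_state=42)

models = [
    ("Logistic Regression", LogisticRegression(max_iter=1000)),
    ("Decision Tree", DecisionTreeClassifier(random_state=42)),
    ("Random Forest", RandomForestClassifier(random_state=42)),
]

for name, model in models:
    # RFECV drops one feature per step and refits, scoring each subset by CV
    rfecv = RFECV(model, step=1, cv=5, scoring="accuracy").fit(X, y)
    scores = rfecv.cv_results_["mean_test_score"]  # sklearn >= 1.0
    plt.plot(range(1, len(scores) + 1), scores, marker="o", label=name)

plt.xlabel("Number of features selected")
plt.ylabel("Mean CV accuracy")
plt.legend()
plt.show()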

@rebeccabilbro (Member)

Thank you for providing an illustration @rajeshkalakoti! What stage of the model selection process do you tend to use these plots for? In other words, do you use the above plot to select kNN as the best model, since it achieves >.98 accuracy with fewer features than the Decision Tree and Random Forest models? Or to rule out kNN, given its extreme sensitivity to certain n_feature counts? Or is the plot instead designed to steer the feature selection process, e.g. by demonstrating that it should be possible to dramatically reduce the original feature space of the data regardless of the algorithm that will be used?

As I mentioned previously, this kind of plot is not currently possible in Yellowbrick. While it would be difficult to implement (due to the limitations of long-running feature elimination sequences), it might be feasible if we could parallelize the model fitting of each classifier (and even within a single classifier) with joblib and an n_jobs argument. What kind of latency do you feel would be tolerable for users for producing these kinds of plots? Would this be a feature you would be interested in working on as a contributor to Yellowbrick?
