
Sort metrics table by combined performance score (CPS) with radar chart to dynamically adjust metric weights #223

Merged 5 commits into main on Mar 16, 2025

Conversation

@janosh (Owner) commented on Mar 16, 2025

Since its beginning, Matbench Discovery has used the F1 discovery score as the default metric by which models are ranked. Users could click other metric columns in the metrics table to sort by them, but on initial page load the table was always sorted by F1.

This PR defines a new combined performance score $\text{CPS} \in [0, 1]$, currently the weighted average $50\% \ \text{F1} + 40\% \ \kappa_\text{SRME} + 10\% \ \text{RMSD}$, to be extended to additional metrics in the future. As this leaderboard evolves, the CPS is meant to combine the highest-signal metrics into a single number that best reflects a model's overall utility across different simulation tasks.
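Spelled out with the default weights, and writing $\widetilde{\kappa}_\text{SRME}$ and $\widetilde{\text{RMSD}}$ for the normalized, higher-is-better versions of those metrics defined in the next section:

$$\text{CPS} = 0.5 \cdot \text{F1} + 0.4 \cdot \widetilde{\kappa}_\text{SRME} + 0.1 \cdot \widetilde{\text{RMSD}}$$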
Mindful that user needs and opinions on what matters in a good model differ widely across the community, we made it easy to dynamically adjust the weighting between different metrics with a single click. This lets the leaderboard prioritize the metrics that best align with whatever simulation task a user has in mind.

Metric Combination Logic

$\kappa_\text{SRME}$ can take values in $[0, 2]$ where lower is better. It is normalized to the range $[0, 1]$ with higher = better before entering the weighting function above.
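A minimal sketch of this rescaling, assuming a simple linear map (the helper name is hypothetical, not necessarily what `metrics.ts` exports):

```ts
// map κ_SRME ∈ [0, 2] (lower = better) onto [0, 1] (higher = better)
const normalize_kappa_srme = (kappa: number): number => 1 - kappa / 2
```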

RMSD can take values in $[0, \infty)$. The current worst model on this metric gets $\text{RMSD} = 0.0227$. Models will hopefully get better, not worse, on this metric, so we pick $[0, 0.03]$ as the range from which to normalize RMSD values to $[0, 1]$. That is, a perfect model achieving $\text{RMSD} = 0$ maps to 1 before entering the CPS weighting function.
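Under the same assumptions (hypothetical helper name, linear rescaling against the 0.03 baseline from above, with anything at or beyond the baseline clamped to 0):

```ts
// map RMSD ∈ [0, ∞) (lower = better) onto [0, 1] (higher = better)
const RMSD_BASELINE = 0.03
const normalize_rmsd = (rmsd: number): number =>
  1 - Math.min(rmsd, RMSD_BASELINE) / RMSD_BASELINE
```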

F1 score already lies in the range $[0, 1]$ with higher = better and requires no normalization. It enters CPS directly.

For models where any of F1, $\kappa_\text{SRME}$ or RMSD is NaN, the CPS is also set to NaN, which gets sorted to the bottom of the ranking by default. Energy-only models, which don't offer the geometry optimization and force predictions needed for phonon modeling, fall in this category, but they were already ranked at the bottom under the prior F1 ranking, so little changes for them. MLFFs that can deliver all 3 metrics but haven't yet submitted them, namely GNoME and MatterSim v1 5M, are invited to do so.
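Putting the pieces together, a self-contained sketch of the combination logic (all names here are illustrative, not the actual exports of `metrics.ts`):

```ts
type MetricWeights = { f1: number; kappa_srme: number; rmsd: number }

// default weights from this PR: 50% F1, 40% κ_SRME, 10% RMSD
const default_weights: MetricWeights = { f1: 0.5, kappa_srme: 0.4, rmsd: 0.1 }

function combined_perf_score(
  f1: number, // already in [0, 1], higher = better
  kappa_srme: number, // in [0, 2], lower = better
  rmsd: number, // in [0, ∞), lower = better
  weights: MetricWeights = default_weights,
): number {
  // a single missing metric makes the whole score undefined,
  // which sorts the model to the bottom of the table by default
  if ([f1, kappa_srme, rmsd].some(Number.isNaN)) return NaN

  const kappa_norm = 1 - kappa_srme / 2 // rescale to [0, 1], higher = better
  const rmsd_norm = 1 - Math.min(rmsd, 0.03) / 0.03 // rescale to [0, 1], higher = better

  return weights.f1 * f1 + weights.kappa_srme * kappa_norm + weights.rmsd * rmsd_norm
}
```

Adjusting the radar chart amounts to passing a different `weights` object (re-normalized to sum to 1) into a function like this and re-sorting the table.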

The current ranking for all models now looks like this:

[Screenshot: metrics table sorted by CPS, 2025-03-16]

Major changes:

  • Add RadarChart.svelte for visualizing metric weights and drag/drop knob to adjust weights
  • Add TableControls.svelte component for managing filters and column visibility
  • Add metrics.ts with new geometry optimization metrics and scaling functions for combining disparate metrics into a single score
  • Add vitest unit tests for all new components and new behavior in existing components

janosh added the analysis (new model analysis) and site (website related) labels on Mar 16, 2025
janosh added 3 commits on March 16, 2025:

- Refine wording in the pull request template for clarity on model prediction file naming
- Simplify the display property for the RMSD metric in metrics.ts
- Add corresponding analysis files and URLs for both symmetry precision settings
janosh merged commit a48c373 into main on Mar 16, 2025
7 of 8 checks passed
janosh deleted the combined-perf-score branch on March 16, 2025
janosh changed the title from "Show new combined performance score (CPS) on landing page with radar chart to dynamically adjust metric weights" to "Sort metrics table by combined performance score (CPS) with radar chart to dynamically adjust metric weights" on Mar 16, 2025