Sort metrics table by combined performance score (CPS) with radar chart to dynamically adjust metric weights #223
Since its beginning, Matbench Discovery has used the F1 discovery score as the default metric for ranking models. Users could click other metric columns in the metrics table to sort by them, but the initial page load was always ranked by F1.
This PR defines a new combined performance score $\text{CPS} \in [0, 1]$, which is currently the weighted average $50\% \ \text{F1} + 40\% \ \kappa_\text{SRME} + 10\% \ \text{RMSD}$ but will be extended to additional metrics in the future. Over the course of this leaderboard's evolution, the CPS metric is meant to combine the highest-signal metrics into a single number that best reflects a model's overall utility across different simulation tasks.
Since user needs and opinions on what makes a good model differ widely across the community, we made it easy to dynamically adjust the weighting between different metrics with a single click. This lets the leaderboard prioritize the metrics that best align with whatever simulation task a user has in mind.
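As a rough illustration of how such a user-weighted combination can work, here is a minimal TypeScript sketch. The names (`MetricScores`, `combineScores`, `defaultWeights`) are hypothetical, not the actual `metrics.ts` implementation, and each score is assumed to already be normalized to $[0, 1]$ with higher = better:

```ts
// Minimal sketch of a user-weighted metric combination (hypothetical names,
// not the actual metrics.ts implementation).
type MetricScores = { F1: number; kappa_SRME: number; RMSD: number }

const defaultWeights: MetricScores = { F1: 0.5, kappa_SRME: 0.4, RMSD: 0.1 }

function combineScores(scores: MetricScores, weights = defaultWeights): number {
  // Re-normalize the weights so user-adjusted values always sum to 1,
  // keeping CPS in [0, 1] no matter how the radar chart knobs are dragged.
  const total = weights.F1 + weights.kappa_SRME + weights.RMSD
  return (
    (weights.F1 * scores.F1 +
      weights.kappa_SRME * scores.kappa_SRME +
      weights.RMSD * scores.RMSD) /
    total
  )
}
```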
Metric Combination Logic
RMSD can take values in $[0, \infty)$. The current worst model on this metric has $\text{RMSD} = 0.0227$. Models will hopefully get better, not worse, on this metric, so we pick $[0, 0.03]$ as the range from which RMSD values are normalized into $[0, 1]$. That is, a perfect model achieving $\text{RMSD} = 0$ maps to 1 before entering the CPS weighting function.

The F1 score is already a normalized metric in the range $[0, 1]$ with higher = better and requires no further normalization. It enters CPS directly.

For models where any of F1, $\kappa_\text{SRME}$ or RMSD are `NaN`, the CPS is also set to `NaN`, which gets sorted to the bottom of the ranking by default. Energy-only models which don't offer geometry optimization and force predictions for phonon modeling fall into this category, but they were already ranked at the bottom under the previous F1 ranking, so little changes for them. MLFFs that could deliver all 3 metrics but haven't yet submitted them are invited to do so; those are GNoME and MatterSim v1 5M.
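A minimal sketch of the RMSD normalization and `NaN` handling described above. The helper names are assumptions rather than the actual `metrics.ts` code, and the $\kappa_\text{SRME}$ argument is assumed to already be mapped onto $[0, 1]$ (its normalization is not spelled out here):

```ts
// Map RMSD from the assumed [0, 0.03] range onto [0, 1], inverted so a
// perfect RMSD of 0 maps to 1 and anything at or above 0.03 maps to 0.
function normalizeRmsd(rmsd: number, worst = 0.03): number {
  return Math.max(0, 1 - rmsd / worst)
}

// F1 is already in [0, 1] with higher = better and passes through unchanged.
// kappaScore is assumed to be κ_SRME already mapped onto [0, 1].
function cpsOrNaN(f1: number, kappaScore: number, rmsd: number): number {
  const scores = [f1, kappaScore, normalizeRmsd(rmsd)]
  // Any missing metric makes the combined score NaN, which is sorted to
  // the bottom of the default ranking.
  if (scores.some(Number.isNaN)) return NaN
  const weights = [0.5, 0.4, 0.1] // F1, κ_SRME, RMSD
  return scores.reduce((sum, score, idx) => sum + weights[idx] * score, 0)
}
```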
The current ranking for all models now looks like this:
Major changes:

- `RadarChart.svelte` for visualizing metric weights, with drag/drop knobs to adjust them
- `TableControls.svelte` component for managing filters and column visibility
- `metrics.ts` with new geometry optimization metrics and scaling functions for combining disparate metrics into a single score