Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New preprocessor: Distance metrics dataset vs. reference #2266

Closed
schlunma opened this issue Nov 30, 2023 · 8 comments · Fixed by #2299
Closed

New preprocessor: Distance metrics dataset vs. reference #2266

schlunma opened this issue Nov 30, 2023 · 8 comments · Fixed by #2299
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@schlunma
Copy link
Contributor

schlunma commented Nov 30, 2023

Is your feature request related to a problem? Please describe.

To facilitate the comparison of different models against each other or against obersvations, it would be really helpful to have a preprocessor that calculates distance metrics of two datasets. Example for distance metrics can be the RMSE or the Pearson correlation coefficient.

I am thinking of the following signature for the new preprocessor:

def distance_metric(
    products: set[PreprocessorFile] | Iterable[Cube],
    metric: MetricType,
    reference: Optional[Cube] = None,
    coords: Iterable[Coord] | Iterable[str] | None = None,
    keep_reference_dataset: bool = True,
    **kwargs,
) -> set[PreprocessorFile] | CubeList:

When used within a recipe, reference datasets should be marked with reference_for_metric: true. This signature also allows an easy usage of the preprocessor outside of ESMValTool

Would you be able to help out?

Yes!

@axel-lauer @hb326 @LisaBock

@schlunma schlunma added the enhancement New feature or request label Nov 30, 2023
@schlunma schlunma changed the title New preprocessor. New preprocessor: Distance metrics dataset vs. reference Nov 30, 2023
@schlunma schlunma added this to the v2.11.0 milestone Nov 30, 2023
@schlunma schlunma self-assigned this Nov 30, 2023
@valeriupredoi
Copy link
Contributor

yes I think @bouweandela plopped a Pearson test in an example recipe once. And I remember I barked at the poor guy that we should probably look at a Kolmogorov-Smirnoff distance plot better than a Pearson one 😁

@schlunma
Copy link
Contributor Author

Draft PR is available here: #2299

@bouweandela
Copy link
Member

Would reference instead of ref_cube work as a name for the reference cube/dataset? Avoiding abbreviations makes things easier to understand for newcomers (who, e.g. when using just a recipe, may not even know what a cube is).

@schlunma
Copy link
Contributor Author

schlunma commented Feb 5, 2024

Would reference instead of ref_cube work as a name for the reference cube/dataset? Avoiding abbreviations makes things easier to understand for newcomers (who, e.g. when using just a recipe, may not even know what a cube is).

I actually like having the expected type of the variable in its name (here: cube). Also, this argument is not necessary in the recipe, but only if you want to use that function outside of ESMValTool in a Python script or notebook. Users that do that should be (more) experienced Python users, so I would expect them to be capable of reading API docs, which explain the variable in detail. Thus, would reference_cube instead of ref_cube work for you?

@bouweandela
Copy link
Member

But what if at some point in the future we would like to make it possible that the reference could be something else too? Using the type in the name was quite popular at some point (just Google Hungarian notation), not it's not considered good practice anymore, mostly because it makes it difficult to change code over time.

@schlunma
Copy link
Contributor Author

Well, then it seems that almost all of our preprocessor functions use that bad practice (most use the argument cube). In general, I agree that Hungarian notation is problematic, but calling a variable ref_cube can IMHO hardly be considered as following the Hungarian notation principles.

Anyway, will change this.

@bouweandela
Copy link
Member

Well, then it seems that almost all of our preprocessor functions use that bad practice (most use the argument cube)

I know, I couldn't think of anything better at the time.

@schlunma
Copy link
Contributor Author

Renamed in 58dd3c5. I also renamed the corresponding argument of the bias preprocessor in b36af01. Since this hasn't been released yet, we don't need a deprecation cycle.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants