Add preprocessors distance_metrics and histogram #2299

Merged: 68 commits, May 8, 2024
Commits
5d50924
Added distance_metrics preprocessor (RMSE)
schlunma Jan 12, 2024
7b402f2
Make distance_metric usable in recipe
schlunma Jan 12, 2024
a8e13ea
Merge remote-tracking branch 'origin/main' into distance_metric_preproc
schlunma Jan 23, 2024
9609655
Renamed _bias.py -> _compare_with_refs.py
schlunma Jan 24, 2024
a8d1ac0
Added weighted distance metrics
schlunma Jan 24, 2024
b60963c
flake8
schlunma Jan 24, 2024
45b5d77
Removed prints
schlunma Jan 25, 2024
ab7da9c
Make sure that dtype is preserved and added tests for masked data
schlunma Jan 25, 2024
6f37ccb
Added doc and allowed arbitrary kwargs for distance_metric
schlunma Jan 25, 2024
9a8481b
Added pearson r
schlunma Jan 25, 2024
012d308
Remove print
schlunma Jan 25, 2024
c8138ce
Fixed doc build
schlunma Jan 25, 2024
52bb0c3
Merge remote-tracking branch 'origin/main' into distance_metric_preproc
schlunma Feb 6, 2024
30a576a
Simplify doc
schlunma Feb 6, 2024
e260846
Added first working version of EMD calculation
schlunma Feb 6, 2024
73c51ad
Implemented lazy EMD
schlunma Feb 7, 2024
2a0bf29
Fixed test
schlunma Feb 7, 2024
55ffb12
Add detailed descriptions of all metrics
schlunma Feb 7, 2024
17019b0
More detailed description of EMD
schlunma Feb 8, 2024
d60ba86
Fixed bug in EMD calculation for masked input
schlunma Feb 8, 2024
e3828bd
Merge remote-tracking branch 'origin/main' into distance_metric_preproc
schlunma Feb 8, 2024
494f328
Added tests for EMD calculation
schlunma Feb 8, 2024
3713a7b
Test distance_metric settings early when running recipes
schlunma Feb 8, 2024
2b2b43e
Fix mypy
schlunma Feb 8, 2024
1b9f9d7
Added tests for fully masked data
schlunma Feb 8, 2024
a6989d6
Optimize formula for EMD
schlunma Feb 15, 2024
58dd3c5
ref_cube -> reference in distance_metrics
schlunma Feb 15, 2024
b36af01
ref_cube -> reference in bias preproc
schlunma Feb 15, 2024
65535e4
Merge remote-tracking branch 'origin/main' into distance_metric_preproc
schlunma Feb 15, 2024
0d9f4b2
Fix typo in doc
schlunma Feb 15, 2024
c67d66b
Merge remote-tracking branch 'origin/main' into distance_metric_preproc
schlunma Feb 23, 2024
60834e7
Merge remote-tracking branch 'origin/main' into distance_metric_preproc
schlunma Mar 1, 2024
b825426
Avoid potential memory leak
schlunma Mar 1, 2024
a27c116
Merge remote-tracking branch 'origin/main' into distance_metric_preproc
schlunma Mar 27, 2024
5823b1d
Fixed units of EMD
schlunma Mar 27, 2024
8a22f5a
Make rechunk_cube work with any coords
schlunma Mar 27, 2024
6f0e2a0
Added histogram preprocessor
schlunma Mar 27, 2024
60597b7
Fix flake8
schlunma Mar 27, 2024
cc7982f
Merge remote-tracking branch 'origin/main' into distance_metric_preproc
schlunma Apr 3, 2024
478ae11
Use correct metadata for histogram cube and coordinates
schlunma Apr 3, 2024
f052bc4
Add test for fully masked data
schlunma Apr 3, 2024
8360274
Add test for coords=time
schlunma Apr 3, 2024
d6f6ac1
Moved get_weights to _other module
schlunma Apr 3, 2024
3ff677c
Support weighted histograms
schlunma Apr 3, 2024
ca15464
Add doc
schlunma Apr 3, 2024
b5edc66
Merge remote-tracking branch 'origin/main' into distance_metric_preproc
schlunma Apr 4, 2024
6968e80
Nicer way of setting up histogram cube
schlunma Apr 5, 2024
07807c1
Fix tests
schlunma Apr 5, 2024
be33379
Fix codacy and codecov
schlunma Apr 5, 2024
ae70e53
Use histogram() in EMD calculation
schlunma Apr 8, 2024
1f8c927
Added weighted EMD
schlunma Apr 8, 2024
85652fb
Added histogram to list of preproc
schlunma Apr 8, 2024
ffb4480
Merge remote-tracking branch 'origin/main' into distance_metric_preproc
schlunma Apr 9, 2024
17a7d1c
Merge remote-tracking branch 'origin/main' into distance_metric_preproc
schlunma Apr 16, 2024
2f58f8f
Fix doc build
schlunma Apr 16, 2024
4d6ae63
Merge branch 'main' into distance_metric_preproc
schlunma Apr 16, 2024
8cf2cf6
Use common_mask=True for pearsonr
schlunma Apr 17, 2024
9d2c206
Fixed typo
schlunma Apr 17, 2024
49df2ba
Merge remote-tracking branch 'origin/main' into distance_metric_preproc
schlunma Apr 26, 2024
9a2a2e2
Moved shared preprocessor functions to _shared module
schlunma Apr 26, 2024
10e99a0
Merge remote-tracking branch 'origin/main' into distance_metric_preproc
schlunma Apr 26, 2024
14c7fb4
100% coverage
schlunma Apr 26, 2024
37f06f8
Proper dtype handling
schlunma Apr 29, 2024
28004fb
Merge branch 'main' into distance_metric_preproc
schlunma Apr 30, 2024
c1d645b
Merge branch 'main' into distance_metric_preproc
schlunma May 2, 2024
3c3dad2
Update doc/recipe/preprocessor.rst
schlunma May 3, 2024
cbb6680
Apply suggestions from code review
schlunma May 3, 2024
5bdb89b
Merge remote-tracking branch 'origin/main' into distance_metric_preproc
schlunma May 8, 2024
298 changes: 274 additions & 24 deletions doc/recipe/preprocessor.rst

Large diffs are not rendered by default.

74 changes: 57 additions & 17 deletions esmvalcore/_recipe/check.py
@@ -5,6 +5,7 @@
 import logging
 import os
 import subprocess
+from functools import partial
 from pprint import pformat
 from shutil import which
 from typing import Any, Iterable
@@ -395,47 +396,86 @@ def differing_timeranges(timeranges, required_vars):
             "Set `timerange` to a common value.")


-def bias_type(settings: dict) -> None:
-    """Check that bias_type for bias preprocessor is valid."""
-    if 'bias' not in settings:
+def _check_literal(
+    settings: dict,
+    *,
+    step: str,
+    option: str,
+    allowed_values: tuple[str],
+) -> None:
+    """Check that an option for a preprocessor has a valid value."""
+    if step not in settings:
         return
-    valid_options = ('absolute', 'relative')
-    user_bias_type = settings['bias'].get('bias_type', 'absolute')
-    if user_bias_type not in valid_options:
+    user_value = settings[step].get(option, allowed_values[0])
+    if user_value not in allowed_values:
         raise RecipeError(
-            f"Expected one of {valid_options} for `bias_type`, got "
-            f"'{user_bias_type}'"
+            f"Expected one of {allowed_values} for `{option}`, got "
+            f"'{user_value}'"
         )


-def reference_for_bias_preproc(products):
-    """Check that exactly one reference dataset for bias preproc is given."""
-    step = 'bias'
+bias_type = partial(
+    _check_literal,
+    step='bias',
+    option='bias_type',
+    allowed_values=('absolute', 'relative'),
+)
+
+
+metric_type = partial(
+    _check_literal,
+    step='distance_metric',
+    option='metric',
+    allowed_values=(
+        'rmse',
+        'weighted_rmse',
+        'pearsonr',
+        'weighted_pearsonr',
+        'emd',
+        'weighted_emd',
+    ),
+)
+
+
+def _check_ref_attributes(products: set, *, step: str, attr_name: str) -> None:
+    """Check that exactly one reference dataset is given."""
     products = {p for p in products if step in p.settings}
     if not products:
         return

-    # Check that exactly one dataset contains the facet ``reference_for_bias:
-    # true``
+    # Check that exactly one dataset contains the specified facet
     reference_products = []
     for product in products:
-        if product.attributes.get('reference_for_bias', False):
+        if product.attributes.get(attr_name, False):
             reference_products.append(product)
     if len(reference_products) != 1:
         products_str = [p.filename for p in products]
         if not reference_products:
             ref_products_str = ". "
         else:
-            ref_products_str = [p.filename for p in reference_products]
-            ref_products_str = f":\n{pformat(ref_products_str)}.\n"
+            ref_products_str = (
+                f":\n{pformat([p.filename for p in reference_products])}.\n"
+            )
         raise RecipeError(
-            f"Expected exactly 1 dataset with 'reference_for_bias: true' in "
+            f"Expected exactly 1 dataset with '{attr_name}: true' in "
             f"products\n{pformat(products_str)},\nfound "
             f"{len(reference_products):d}{ref_products_str}Please also "
             f"ensure that the reference dataset is not excluded with the "
             f"'exclude' option")


+reference_for_bias_preproc = partial(
+    _check_ref_attributes, step='bias', attr_name='reference_for_bias'
+)
+
+
+reference_for_distance_metric_preproc = partial(
+    _check_ref_attributes,
+    step='distance_metric',
+    attr_name='reference_for_metric',
+)
+
+
 def statistics_preprocessors(settings: dict) -> None:
     """Check options of statistics preprocessors."""
     mm_stats = (
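
The check.py change above replaces a hard-coded `bias_type` validator with a generic one that is specialised through `functools.partial`. The sketch below reproduces that pattern in isolation. It is not the ESMValCore module itself: `RecipeError` is a local stand-in class, `check_literal` is an illustrative name, and the list of allowed metrics is shortened.

```python
from functools import partial


class RecipeError(Exception):
    """Local stand-in for the recipe error type (assumption for this sketch)."""


def check_literal(settings, *, step, option, allowed_values):
    """Raise RecipeError if `option` of `step` has a value that is not allowed."""
    if step not in settings:
        return  # the step is not used, so there is nothing to validate
    # The first allowed value doubles as the default when the option is unset
    user_value = settings[step].get(option, allowed_values[0])
    if user_value not in allowed_values:
        raise RecipeError(
            f"Expected one of {allowed_values} for `{option}`, got "
            f"'{user_value}'"
        )


# Bind the step-specific arguments once; callers only pass `settings`
metric_type = partial(
    check_literal,
    step='distance_metric',
    option='metric',
    allowed_values=('rmse', 'weighted_rmse', 'pearsonr'),  # shortened list
)

metric_type({'distance_metric': {'metric': 'rmse'}})  # valid value
metric_type({'regrid': {}})  # step absent, check is skipped

try:
    metric_type({'distance_metric': {'metric': 'mae'}})
except RecipeError as exc:
    message = str(exc)  # names the offending option and value
```

One consequence of this design is that the default is implicit: whichever value comes first in `allowed_values` is treated as the value of an unset option.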
4 changes: 3 additions & 1 deletion esmvalcore/_recipe/recipe.py
@@ -37,13 +37,13 @@
 )
 from esmvalcore.preprocessor._area import _update_shapefile_path
 from esmvalcore.preprocessor._multimodel import _get_stat_identifier
-from esmvalcore.preprocessor._other import _group_products
 from esmvalcore.preprocessor._regrid import (
     _spec_to_latlonvals,
     get_cmor_levels,
     get_reference_levels,
     parse_cell_spec,
 )
+from esmvalcore.preprocessor._shared import _group_products

 from . import check
 from .from_datasets import datasets_to_recipe
@@ -555,6 +555,7 @@ def _get_preprocessor_products(
             f'{separator.join(sorted(missing_vars))}')

     check.reference_for_bias_preproc(products)
+    check.reference_for_distance_metric_preproc(products)

     _configure_multi_product_preprocessor(
         products=products,
@@ -656,6 +657,7 @@ def _update_preproc_functions(settings, dataset, datasets, missing_vars):
     check.statistics_preprocessors(settings)
     check.regridding_schemes(settings)
     check.bias_type(settings)
+    check.metric_type(settings)


 def _get_preprocessor_task(datasets, profiles, task_name):
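
The call to `check.reference_for_distance_metric_preproc(products)` added above enforces that exactly one dataset using the `distance_metric` step carries `reference_for_metric: true`. A minimal sketch of that rule follows. It assumes products are simple objects with `filename`, `settings`, and `attributes`; `SimpleNamespace` and `ValueError` stand in for the real product and error classes, and `check_single_reference` is an illustrative name.

```python
from types import SimpleNamespace


def check_single_reference(products, *, step, attr_name):
    """Require exactly one reference among the products that use `step`."""
    relevant = [p for p in products if step in p.settings]
    if not relevant:
        return  # no product uses this step, nothing to check
    references = [p for p in relevant if p.attributes.get(attr_name, False)]
    if len(references) != 1:
        raise ValueError(
            f"Expected exactly 1 dataset with '{attr_name}: true', "
            f"found {len(references)}"
        )


def make_product(filename, is_reference):
    """Build a hypothetical product that uses the distance_metric step."""
    return SimpleNamespace(
        filename=filename,
        settings={'distance_metric': {'metric': 'rmse'}},
        attributes={'reference_for_metric': is_reference},
    )


model = make_product('model.nc', False)
obs = make_product('obs.nc', True)

# One reference among the products: the check passes
check_single_reference(
    [model, obs], step='distance_metric', attr_name='reference_for_metric'
)

# No reference at all: the check fails
try:
    check_single_reference(
        [model], step='distance_metric', attr_name='reference_for_metric'
    )
    failed = False
except ValueError:
    failed = True
```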
12 changes: 3 additions & 9 deletions esmvalcore/iris_helpers.py
@@ -236,7 +236,7 @@ def rechunk_cube(
         Input cube.
     complete_coords:
         (Names of) coordinates along which the output cubes should not be
-        chunked. The given coordinates must span exactly 1 dimension.
+        chunked.
     remaining_dims:
         Chunksize of the remaining dimensions.

@@ -248,17 +248,11 @@
     """
     cube = cube.copy()  # do not modify input cube

-    # Make sure that complete_coords span exactly 1 dimension
     complete_dims = []
     for coord in complete_coords:
         coord = cube.coord(coord)
-        dims = cube.coord_dims(coord)
-        if len(dims) != 1:
-            raise CoordinateMultiDimError(
-                f"Complete coordinates must be 1D coordinates, got "
-                f"{len(dims):d}D coordinate '{coord.name()}'"
-            )
-        complete_dims.append(dims[0])
+        complete_dims.extend(cube.coord_dims(coord))
+    complete_dims = list(set(complete_dims))

     # Rechunk data
     if cube.has_lazy_data():
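
The `rechunk_cube` change above drops the 1D restriction: every dimension spanned by a requested coordinate is collected and then de-duplicated. The sketch below shows that logic without Iris. The `coord_dims` dictionary is a hypothetical stand-in for `cube.coord_dims(...)`, and the result is sorted only to make the output deterministic (the diff uses `list(set(...))`).

```python
def get_complete_dims(coord_dims, complete_coords):
    """Collect the unique dimensions spanned by the given coordinates."""
    complete_dims = []
    for name in complete_coords:
        # A multi-dimensional coordinate contributes all of its dimensions
        complete_dims.extend(coord_dims[name])
    # Coordinates that share dimensions must not be counted twice
    return sorted(set(complete_dims))


# Hypothetical cube layout: 1D time plus 2D latitude/longitude on dims (1, 2)
coord_dims = {
    'time': (0,),
    'latitude': (1, 2),
    'longitude': (1, 2),
}

dims = get_complete_dims(coord_dims, ['latitude', 'longitude'])
```

Here `latitude` and `longitude` both span dimensions 1 and 2, so only those two dimensions are kept unchunked.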
10 changes: 7 additions & 3 deletions esmvalcore/preprocessor/__init__.py
@@ -22,7 +22,7 @@
     meridional_statistics,
     zonal_statistics,
 )
-from ._bias import bias
+from ._compare_with_refs import bias, distance_metric
 from ._cycles import amplitude
 from ._derive import derive
 from ._detrend import detrend
@@ -46,7 +46,7 @@
     mask_outside_range,
 )
 from ._multimodel import ensemble_statistics, multi_model_statistics
-from ._other import clip
+from ._other import clip, histogram
 from ._regrid import (
     extract_coordinate_points,
     extract_levels,
@@ -175,12 +175,15 @@
     'linear_trend_stderr',
     # Convert units
     'convert_units',
+    # Histograms
+    'histogram',
     # Ensemble statistics
     'ensemble_statistics',
     # Multi model statistics
     'multi_model_statistics',
-    # Bias calculation
+    # Comparison with reference datasets
     'bias',
+    'distance_metric',
     # Remove supplementary variables from cube
     'remove_supplementary_variables',
     # Save to file
@@ -215,6 +218,7 @@

 MULTI_MODEL_FUNCTIONS = {
     'bias',
+    'distance_metric',
     'ensemble_statistics',
     'multi_model_statistics',
     'mask_multimodel',
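
For orientation, two of the metric names accepted by the new `distance_metric` step, `rmse` and `pearsonr`, can be sketched in plain Python. This uses the textbook definitions only. It is not the preprocessor's implementation, which operates on Iris cubes and handles masking, laziness, and coordinate-based weights. The function names and the flat-list inputs are assumptions made for this sketch.

```python
import math


def rmse(values, reference, weights=None):
    """Root mean square error, optionally weighted."""
    if weights is None:
        weights = [1.0] * len(values)  # unweighted case: equal weights
    squared = sum(
        w * (x - r) ** 2 for w, x, r in zip(weights, values, reference)
    )
    return math.sqrt(squared / sum(weights))


def pearsonr(values, reference):
    """Unweighted Pearson correlation coefficient."""
    n = len(values)
    mean_x = sum(values) / n
    mean_r = sum(reference) / n
    cov = sum((x - mean_x) * (r - mean_r) for x, r in zip(values, reference))
    var_x = sum((x - mean_x) ** 2 for x in values)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    return cov / math.sqrt(var_x * var_r)


model = [1.0, 2.0, 3.0]
obs = [2.0, 4.0, 6.0]

error = rmse(model, obs)           # distance between model and reference
correlation = pearsonr(model, obs) # linear relationship, here perfect
```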