Dataset problem: Automatic fixing for errors in AERmon cmor data? #2613

Closed
MartineMichou opened this issue Dec 10, 2024 · 24 comments · Fixed by #2634
Labels
fix for dataset Related to dataset-specific fix files

Comments

@MartineMichou

MartineMichou commented Dec 10, 2024

Hello,
I work with YAML recipes. When I run the recipe copied below, it
terminates with an error for each of the CMIP6 AERmon datasets
listed in it:

  • CESM2: problem with air_pressure
    - {dataset: CESM2, grid: gn, institute: NCAR, mip: AERmon, ensemble: r1i1p1f1, exp: amip, project: CMIP6, alias: CESM2, start_year: 1979, end_year: 2014}

  • CESM2-WACCM: problem with air_pressure
    - {dataset: CESM2-WACCM, grid: gn, institute: NCAR, mip: AERmon, ensemble: r1i1p1f1, exp: amip, project: CMIP6, alias: CESM2-WACCM, start_year: 1979, end_year: 2014}

  • MPI-ESM-1-2-HAM: problem, found 2 air_pressure coordinates
    - {dataset: MPI-ESM-1-2-HAM, grid: gn, institute: HAMMOZ-Consortium, mip: AERmon, ensemble: r1i1p1f1, exp: amip, project: CMIP6, alias: MPI-ESM, start_year: 1979, end_year: 1998}

  • EC-Earth-Consortium (EC-Earth3-AerChem): problem, found 2 air_pressure coordinates
    - {dataset: EC-Earth3-AerChem, grid: gn, institute: EC-Earth-Consortium, mip: AERmon, ensemble: r1i1p1f1, exp: amip, project: CMIP6, alias: EC-Earth, start_year: 1979, end_year: 1979}

I have the following questions:

  • I assume that if ESMValCore "CMOR fixing scripts" existed for the
    above datasets, they would be applied automatically. If not, which
    lines do I need to add to the YAML recipe to apply them?

  • If "CMOR fixing scripts" do not yet exist for the above datasets,
    I would be grateful for developments that let the community work
    with these AERmon datasets. For a number of chemical fields, CMIP6
    model outputs exist only in this AERmon table.

All the best,
Martine Michou

-----
# ESMValTool
---
documentation:
  title: Ozone test diags
  description: >
    Test recipe for simple ozone diagnostics..
  authors:
    - lauer_axel
  maintainer:
    - lauer_axel

# The years below are those available for the oh variable; not great...
datasets:
  # CESM2: problem with air_pressure
  # - {dataset: CESM2, grid: gn, institute: NCAR, mip: AERmon, ensemble: r1i1p1f1, exp: amip, project: CMIP6, alias: CESM2, start_year: 1979, end_year: 2014}
  # CESM2-WACCM: problem with air_pressure
  # - {dataset: CESM2-WACCM, grid: gn, institute: NCAR, mip: AERmon, ensemble: r1i1p1f1, exp: amip, project: CMIP6, alias: CESM2-WACCM, start_year: 1979, end_year: 2014}
  # MPI-ESM-1-2-HAM: problem, found 2 air_pressure coordinates
  # - {dataset: MPI-ESM-1-2-HAM, grid: gn, institute: HAMMOZ-Consortium, mip: AERmon, ensemble: r1i1p1f1, exp: amip, project: CMIP6, alias: MPI-ESM, start_year: 1979, end_year: 1998}
  # EC-Earth-Consortium: problem, found 2 air_pressure coordinates
  - {dataset: EC-Earth3-AerChem, grid: gn, institute: EC-Earth-Consortium, mip: AERmon, ensemble: r1i1p1f1, exp: amip, project: CMIP6, alias: EC-Earth, start_year: 1979, end_year: 1979}


preprocessors:
  pp_timeseries1000:
    custom_order: true  # makes preprocessor much faster since input for extract_levels is smaller
    regrid:
      target_grid: 2x2
      scheme: linear
    extract_levels:
      levels: 100000
      scheme: linear
      coordinate: air_pressure
    extract_region:
      start_latitude: -90
      end_latitude: 90
      start_longitude: 0
      end_longitude: 360      
    area_statistics:
      operator: mean   



diagnostics:

  oh_timeseries1000:
    description: Plot time series of oh at 1000 hPa
    variables:
      oh:
        preprocessor: pp_timeseries1000
    scripts:
      plot:
        script: /home/michou/ESMValTool_2.10cnrm/Recipes_tested/From_ALauer/multi_datasets.py
        plot_folder: '{plot_dir}'
        plot_filename: '{plot_type}_{real_name}_{dataset}_{mip}'
        plots:
          timeseries:
            

@valeriupredoi valeriupredoi added the fix for dataset Related to dataset-specific fix files label Dec 12, 2024
@valeriupredoi
Contributor

@MartineMichou could you please post the full Traceback/error? Cheers 🍺

@MartineMichou
Author

MartineMichou commented Dec 13, 2024 via email

@valeriupredoi
Contributor

thanks a lot @MartineMichou 🍺

As I see it: we have 2x air_pressure coords, and DU can't change to units m. Will have a closer look, most prob in the new year 😁 🎄

AP-twice_main_log_debug.txt
DU_meters_main_log_debug.txt

@schlunma
Contributor

The conversion of DU to m and vice versa will be available in v2.12.0, see #2509 and #2560.
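For readers unfamiliar with the unit, the arithmetic behind such a conversion is small. The sketch below is only an illustration and not the ESMValCore implementation from the linked PRs; the function names are made up here. It relies on the definition of one Dobson unit as a 10 micrometre layer of pure gas at standard temperature and pressure, i.e. 1e-5 m.

```python
# Illustrative arithmetic only; NOT the ESMValCore implementation.
# 1 Dobson unit (DU) corresponds to a 10 micrometre thick layer of pure gas
# at standard temperature and pressure, i.e. 1e-5 m.
METRES_PER_DU = 1.0e-5


def du_to_m(column_du: float) -> float:
    """Convert a total column from Dobson units to metres (hypothetical helper)."""
    return column_du * METRES_PER_DU


def m_to_du(column_m: float) -> float:
    """Convert a total column from metres to Dobson units (hypothetical helper)."""
    return column_m / METRES_PER_DU


if __name__ == "__main__":
    # A total ozone column of roughly 300 DU is about 3 mm of pure ozone.
    print(du_to_m(300.0))
```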

@valeriupredoi
Contributor

indeed, that sounded familiar 😁 @MartineMichou afraid today is my last day until January 6, so I can't do much about these fixes, but we'll get onto them next year 😁 🎄

@MartineMichou
Author

MartineMichou commented Dec 20, 2024

Hello Valeriu,

Thanks for the update. Have a nice holiday season!

Dealing with AERmon CMIP6 data for the four models CESM2,
CESM2-WACCM, MPI-ESM-1-2-HAM and EC-Earth3-AerChem results in other
errors when I run the recipe copied below. I run this recipe
with one dataset at a time.

I attach the error messages in case they help.

We'll get in touch in January.
All the best,
Martine

# ESMValTool
---
documentation:
  title: Ozone test diags
  description: >
    Test recipe for simple ozone diagnostics..
  authors:
    - lauer_axel
  maintainer:
    - lauer_axel

datasets:
  # - {dataset: CESM2, grid: gn, institute: NCAR, start_year: 1980, end_year: 1980}
  # - {dataset: CESM2-WACCM, grid: gn, institute: NCAR, start_year: 1980, end_year: 1980}
  # - {dataset: MPI-ESM-1-2-HAM, grid: gn, institute: HAMMOZ-Consortium, start_year: 1980, end_year: 1980}
  - {dataset: EC-Earth3-AerChem, grid: gn, institute: EC-Earth-Consortium, start_year: 1979, end_year: 1979}

preprocessors:
  zonal_mean:
    custom_order: true  # makes preprocessor much faster since input for extract_levels is smaller
    climate_statistics:
      period: full
    regrid:
      target_grid: 2x2
      scheme: linear
    extract_levels:
      levels: {cmor_table: CMIP6, coordinate: plev19}
      scheme: linear
      coordinate: air_pressure
    zonal_statistics:
      operator: mean


diagnostics:

  oh_zonalmean:
    description: Plot ozone zonal mean profiles.
    variables:
      oh:
        mip: AERmon
        ensemble: r1i1p1f1
        exp: amip
        project: CMIP6       
        preprocessor: zonal_mean
    scripts:
      plot:
        script: /home/michou/ESMValTool_2.10cnrm/Recipes_tested/From_ALauer/multi_datasets.py
        plot_folder: '{plot_dir}'
        plot_filename: '{plot_type}_{real_name}_{dataset}_{mip}'
        plots:
          zonal_mean_profile:
            common_cbar: true

main_log_debug1.txt
main_log_debug2.txt
main_log_debug3.txt
main_log_debug4.txt

@valeriupredoi
Contributor

thanks a lot @MartineMichou - and sorry for the delay!

I am able to reproduce at least one of the errors you see (I am sure I can get the others too). The "multiple coordinates with standard name air_pressure" error arises as follows: the oh cube has two air_pressure coordinates:

    Auxiliary coordinates:
        air_pressure                                                                              x                                                -             x              x
            history='2020-08-31T13:07:56Z altered by CMOR: Reordered dimensions, original...
        vertical coordinate formula term: ap                                                      -                                                x             -              -
        vertical coordinate formula term: b                                                       -                                                x             -              -
    Derived coordinates:
        air_pressure                                                                              x                                                x             x              x

-> an air pressure, and a derived air pressure obtained with formula terms, and they both have air_pressure as standard name. I am unfortunately not sure how to tackle this - my take would be to remove the initial aux coord and keep the (CMOR-corrected) derived coord as more precise, @schlunma @sloosvel any advice here would be very welcome 🍺

@schlunma
Contributor

my take would be to remove the initial aux coord and keep the (CMOR-corrected) derived coord as more precise, @schlunma @sloosvel any advice here would be very welcome 🍺

Sounds good!

@valeriupredoi
Contributor

thanks Manu! I'll pop a fix in tomorrow then, unless @sloosvel tells me not to - I vaguely remember Karl Taylor of CMOR saying we should not care about these corrections, but they don't harm the data if they are already in

@bouweandela
Member

From V's debugging, it looks like one of the derived coordinate parameters may have the wrong standard name. The correct standard names are listed here: https://cfconventions.org/cf-conventions/cf-conventions.html#parametric-v-coord. I have not been able to find a single file with correct parametric vertical coordinates in CMIP6 so far, somehow these are very difficult to get right. See #2454 for some related work.

@bouweandela
Member

@MartineMichou We have instructions available on how to fix issues with the input data here: https://docs.esmvaltool.org/projects/ESMValCore/en/latest/develop/fixing_data.html. That would require installing ESMValCore from source, as described here: https://docs.esmvaltool.org/en/latest/quickstart/installation.html#using-the-development-version-of-the-esmvalcore-package. Would you be willing to have a go at fixing the issue yourself?

@MartineMichou
Author

@valeriupredoi
Dear Valeriu, thanks for proposing to develop a fix for the problem you identified. Looking forward to hearing from you. Best Martine

@valeriupredoi
Contributor

valeriupredoi commented Jan 16, 2025

From V's debugging, it looks like one of the derived coordinate parameters may have the wrong standard name. The correct standard names are listed here: https://cfconventions.org/cf-conventions/cf-conventions.html#parametric-v-coord. I have not been able to find a single file with correct parametric vertical coordinates in CMIP6 so far, somehow these are very difficult to get right. See #2454 for some related work.

we have these buggers:

  • aux_coord standard name: 'air_pressure', long name: 'Surface Air Pressure', name: air_pressure, shape(12, 90, 120) -> this is an Aux Coord, and has been corrected for by CMOR
  • aux_coord standard name: 'air_pressure', long name: None, name: air_pressure, shape(12, 34, 90, 120) -> this is a Derived Coord, and has been derived with the formula term

Now I am getting even more confused - which one to keep, which one to rename?

Full coords here, as tuples of (s.standard_name, s.long_name, s.name(), s) where s is the coord:

[
('time', 'time', 'time', <DimCoord: time / (days since 1960-01-01 00:00:00)  [...]+bounds  shape(12,)>),
('atmosphere_hybrid_sigma_pressure_coordinate', 'hybrid sigma pressure coordinate', 'atmosphere_hybrid_sigma_pressure_coordinate', <DimCoord: atmosphere_hybrid_sigma_pressure_coordinate / (1)  [0.997, ...]+bounds  shape(34,)>),
('latitude', 'Latitude', 'latitude', <DimCoord: latitude / (degrees)  [-89., -87., ..., 87., 89.]+bounds  shape(90,)>),
('longitude', 'Longitude', 'longitude', <DimCoord: longitude / (degrees)  [ 1.5, 4.5, ..., 355.5, 358.5]+bounds  shape(120,)>),
('air_pressure', 'Surface Air Pressure', 'air_pressure', <AuxCoord: air_pressure / (Pa)  <lazy>  shape(12, 90, 120)>),
(None, 'vertical coordinate formula term: ap', 'vertical coordinate formula term: ap', <AuxCoord: vertical coordinate formula term: ap / (Pa)  [3.288, ...]  shape(34,)>),
(None, 'vertical coordinate formula term: b', 'vertical coordinate formula term: b', <AuxCoord: vertical coordinate formula term: b / (1)  [0.997, ...]  shape(34,)>),
('air_pressure', None, 'air_pressure', <AuxCoord: air_pressure / (Pa)  <lazy>  shape(12, 34, 90, 120)>)
]
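The clash in the listing above can be made explicit with a few lines of plain Python. This is only a sketch: it copies the (standard_name, long_name) pairs from the output by hand rather than reading them from an iris cube, and the helper name is invented for illustration.

```python
from collections import Counter

# (standard_name, long_name) pairs copied by hand from the listing above.
COORDS = [
    ("time", "time"),
    ("atmosphere_hybrid_sigma_pressure_coordinate", "hybrid sigma pressure coordinate"),
    ("latitude", "Latitude"),
    ("longitude", "Longitude"),
    ("air_pressure", "Surface Air Pressure"),
    (None, "vertical coordinate formula term: ap"),
    (None, "vertical coordinate formula term: b"),
    ("air_pressure", None),
]


def duplicated_standard_names(coords):
    """Return the standard names carried by more than one coordinate."""
    counts = Counter(std for std, _ in coords if std is not None)
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    print(duplicated_standard_names(COORDS))  # ['air_pressure']
```

Any lookup by standard name alone therefore matches both the 3D surface pressure and the 4D derived pressure, which is what the preprocessor trips over.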

@schlunma
Contributor

schlunma commented Jan 16, 2025

Just set standard_name=surface_air_pressure (edited; I originally wrote standard_name=None) for the Surface Air Pressure coordinate (the aux coordinate). This is used as input to derive the 4D air pressure (this is the one listed as "derived coordinate"; see here for details).

@valeriupredoi
Contributor

just had a chat with our local CMIP data guru @davidhassell - this file needs two fixes:

  • changing the standard name for Surface Air Pressure
  • nuking the Derived coord (the one that gets computed with the formula terms - that should not be there)

So Bouwe was right about the standard name, and my wanting to remove that Derived coord is also safe. Manu, your suggestion is good, but let's do things even better 😁

@schlunma
Contributor

schlunma commented Jan 16, 2025

Sorry, I have to disagree:

  • Surface Air Pressure is certainly not a valid standard name.
  • The CMOR table of this variable specifies the vertical coordinate alevel (see table), which always corresponds to a derived variable (see formula terms, e.g., here). From the ncdump above, this should be a hybrid sigma pressure coordinate. So in short: this derived coordinate is expected and should be present!

@valeriupredoi
Contributor

valeriupredoi commented Jan 16, 2025

ah good points, Manu! But the file I have ie CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3-AerChem/amip/r1i1p1f1/AERmon/oh/gn/v20200910/oh_AERmon_EC-Earth3-AerChem_amip_r1i1p1f1_gn_197901-197912.nc does not have the Derived coordinate when I do a ncdump -h on it - confusion reinstates itself 😁

EDIT: all the output above is obtained by loading the file into iris
EDIT2: Surface Air Pressure is long_name, not standard_name, the standard_name for that is the generic air_pressure

@schlunma
Contributor

But the file I have ie CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3-AerChem/amip/r1i1p1f1/AERmon/oh/gn/v20200910/oh_AERmon_EC-Earth3-AerChem_amip_r1i1p1f1_gn_197901-197912.nc does not have the Derived coordinate when I do a ncdump -h on it - confusion reinstates itself 😁

Yes, that's ok. The derived coordinate is added by iris when loading this file from the variables ap, b, and ps using the HybridPressureFactory. Note the word factory.
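For reference, the CF conventions define the hybrid sigma pressure coordinate as p(n,k,j,i) = ap(k) + b(k) * ps(n,j,i). The sketch below spells that out with plain nested lists, purely to show why a (time, lat, lon) surface pressure plus per-level ap and b yields a (time, lev, lat, lon) pressure. It is not how iris implements the factory. The first ap and b values are the ones visible in the output above; the second level and the surface pressures are made-up numbers.

```python
def hybrid_sigma_pressure(ap, b, ps):
    """Compute p[n][k][j][i] = ap[k] + b[k] * ps[n][j][i].

    ap, b: per-level lists of equal length (n_lev)
    ps:    surface pressure as nested lists with shape (time, lat, lon)
    Returns nested lists with shape (time, lev, lat, lon).
    """
    if len(ap) != len(b):
        raise ValueError("ap and b must have one value per model level")
    return [
        [
            [[ap_k + b_k * ps_value for ps_value in row] for row in ps_t]
            for ap_k, b_k in zip(ap, b)
        ]
        for ps_t in ps
    ]


if __name__ == "__main__":
    ap = [3.288, 5000.0]  # Pa; first value from the issue, second is hypothetical
    b = [0.997, 0.5]      # dimensionless; first value from the issue, second is hypothetical
    ps = [[[101325.0, 100000.0]]]  # hypothetical surface pressure, shape (1, 1, 2)
    p = hybrid_sigma_pressure(ap, b, ps)
    print(len(p), len(p[0]), len(p[0][0]), len(p[0][0][0]))  # 1 2 1 2
```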

I checked other files, and it seems that they just don't use a standard name for ps, e.g. CMIP/MPI-M/MPI-ESM1-2-LR/historical/r1i1p1f1/Amon/cl/gn/v20190710/cl_Amon_MPI-ESM1-2-LR_historical_r1i1p1f1_gn_185001-186912.nc on Levante (I cannot find the one that you linked).

@schlunma
Contributor

There's also extensive text about this in the CF conventions: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.12/cf-conventions.html#parametric-vertical-coordinate.

@valeriupredoi
Contributor

OK so what's the battle plan for the file fix then? It's odd to me that iris is operating on a file, then goes around complaining about something it did 😆

@schlunma
Contributor

schlunma commented Jan 16, 2025

I am 99% sure that setting cube.coord(long_name="Surface Air Pressure").standard_name = "surface_air_pressure" in a fix will solve this (edited; I originally suggested standard_name = None). The error appears in the extract_levels preprocessor (way after the fixes).

@valeriupredoi
Contributor

I am 99% sure that setting cube.coord(long_name="Surface Air Pressure").standard_name = None in a fix will solve this. The error appears in the extract levels preprocessor (way after the fixes).

perfect! Lemme plug that in a PR then 🍻

@valeriupredoi
Contributor

@MartineMichou here is the fix for oh in EC-Earth3-AerChem #2634
-> have a look at the way I added functional code and tests, if you find any other variables that need fixing, you can just follow that practical example and implement the fixes yourself, and we'll review and approve 😊

@MartineMichou
Author

Dear all, many thanks for your time and for fixing the issue. Martine

6 participants