Commit

Fix multiprocessing and consolidate QC (#68)
* move multiprocessing out of for loop

* add qc_config and job array to qc sbatch

* add print statement to track file names/times

* use actual variable count in sbatch params

* Combine qc script and notebook and simplify code

* drop refs to visual qc for runner script

* make qc scripts and notebook consistent

* small fixes for regridding qc

* remove unused args in qc runner

* pull subsampling code into qc module

* checkpoint for script to combine regridded data for rasdaman

* finalize script to combine regridded files for rasdaman

* fix regridding batch files script to handle MPI-M institution ID

* add empty variables if missing

* remove rasdaman preprocessing script for monthly common cmip6

* remove unused dict from regridding config

* print job_id for prefect ssh to parse

* try command in place of conda_init_script

* print job IDs for regrid runner

* print list of job ids as space-separated string

* drop crop from target dataset

* ensure lon dim is 1D when sorting

* disable tryexcept for regrid call

* check for latlon dims before fixing

* add interp method as top level parameter

* fix interp_method top level parameter

* add missing kwarg

* fix positional arg

* fix script arg

* fix script arg

* print regrid qc slurm job id

* fix regrid qc sbatch script

* drop ref to error file

* regridding qc overhaul for generic target grid

* clean up regrid qc nb

* drop bnds variables first in rasdafy

* fix longitude shift for 0-360 src files

* Add fixed frequency variables to transfers pipeline (#70)

* stop skipping fx and Ofx frequencies when generating batch files

* generate new batch files with fx and Ofx frequencies / new fixed frequency variables

* use "1950" placeholder start year and end year values in grid dict if time dimension is missing from dataset

* explicitly skip fx, Ofx, orog variables in regridding time correction functions

* look for sftlf, sftof var names instead of freqs in filename

* + documentation

* transfer E3SM fixed frequency files

* re-run e3sm holdings, fix messaging in generate_manifest.py and update the manifest; start to add specific additional files to config

* add one-off files to config and generate new manifest; add "piControl" experiment to transfer path in batch file generation

* generate batch files

* import missing argparse module

* import missing sys module

* import missing os and upath modules

* rm upath module actually, not available in cmip6-utils env

* remove unused code

---------

Co-authored-by: kyleredilla <[email protected]>

* add soil temp variable

* add check for lat var before transposing

* add soil temperature to transfers pipeline

* str tweak for fixed freq vars in regrid script

* add rsus, rlus, and mlotst to transfers config

* add landsea mask functionality to regrid script

* drop unneeded regrid script arg

* add sftlf arg to regrid slurm script

* add no files error to regrid batch gen script

* add x and y as dims in regrid batch file generation

* add trycatch for sftlf lookup in regrid job generation

* increase tolerance for native nanmask

* tweak qc script for land/sea variables

* transpose plots and add error handling for file opening for regrid qc

* handle use of full lat/lon names in coordinate vars by some models

* fix duplicate nanmin in regrid batch file gen

* add conversion of snw variable

* expand nan fraction thresholds for using native file nanmask

* add ignore_degenerate arg to regridder

* make regridding qc subsetting more robust for nonstandard grids

* update regridding config

* remove Omon prsn from transfers

* add script to remove old versions of duplicate raw cmip6 files

* updates to improve regridding qc

* add function to drop empty directories in cleanout script

* remove prsn as landsea variable 🤦

* use nan funcs for regrid qc plot comparisons

* clean up regrid qc code and fix up docs

* clean up regridding pipeline including docs

* typo fix

* improve time dim encoding in regrid files

* fix units and dims in plotting in regrid qc

* update readme

* bump regridding QC time limit to 2 hours

* tweaks to fix regridding for 360 day calendars

* add missing code for rasdafy arg

* de-duplicate function name

* fix regrid bounding box fetch for qc and add buffer for file min max

* fix regrid to source bbox conversion in qc

* improve regrid qc plotting

* add missing arg in regrid qc function

* remove holdings output text files

---------

Co-authored-by: Joshdpaul <[email protected]>
Co-authored-by: Josh Paul <[email protected]>
3 people authored Feb 19, 2025
1 parent 4b93923 commit 63dad3d
Showing 85 changed files with 10,485 additions and 50,342 deletions.
4 changes: 3 additions & 1 deletion regridding/README.md
@@ -1,6 +1,8 @@
# Regrid CMIP6 on ACDN

This directory is used for regridding the CMIP6 data mirrored on the ACDN to a common grid. The target grid is the NCAR CESM2 grid, which is also shared by a number of the other chosen models.
This directory is used for regridding the CMIP6 data mirrored on the Arctic Climate Data Node. This pipeline will regrid the specified set of models, scenarios, variables, and frequencies (e.g. temporal resolutions) to the grid of some specified file (referred to as the target grid file). This includes cropping the extent of regridded outputs to match the extent of the target file if it is of a larger extent.

Note - this pipeline was previously used for a single fixed grid - that used by the NCAR CESM2 model, which is also shared by a few of the other chosen models (TaiESM1, NorESM2-MM).

This pipeline also crops these datasets to a pan-arctic domain of 50N - 90N.

54 changes: 30 additions & 24 deletions regridding/config.py
@@ -1,6 +1,7 @@
"""Lookup tables for CMIP6 regridding.
Since we are providing our config via Prefect, these were copied from the transfers/config.py file
to avoid using system environment variables."""
to avoid using system environment variables.
"""

# batch file naming template
batch_tmp_fn = "batch_{model}_{scenario}_{frequency}_{var_id}_{grid_name}_{count}.txt"
@@ -14,23 +15,6 @@
"ssp585",
]

# institution model strings (<institution>_<model>, from mirrored data) that we will be regridding
inst_models = [
"NOAA-GFDL_GFDL-ESM4",
"NIMS-KMA_KACE-1-0-G",
"CNRM-CERFACS_CNRM-CM6-1-HR",
"NCC_NorESM2-MM",
"AS-RCEC_TaiESM1",
"MOHC_HadGEM3-GC31-MM",
"MOHC_HadGEM3-GC31-LL",
"MIROC_MIROC6",
"EC-Earth-Consortium_EC-Earth3-Veg",
"NCAR_CESM2",
"MPI-M_MPI-ESM1-2-HR",
"DKRZ_MPI-ESM1-2-HR",
"MRI_MRI-ESM2-0",
]

model_inst_lu = {
"ACCESS-CM2": "CSIRO-ARCCSS",
"CESM2": "NCAR",
@@ -46,11 +30,8 @@
"TaiESM1": "AS-RCEC",
"CESM2-WACCM": "NCAR",
# Another oddity - MPI-ESM1-2-* models have different representation among the institutions, or "Institution ID".
# the -HR version is apparently mostly available under "DKRZ". The -LR version is mostly available under "MPI-M".
# There is apparently mixing, too, as the -HR version has historical data under "MPI-M", and the -LR version has
# data available under "DKRZ". We will just go with the institution which has the majority for each, for now.
# the -HR version was run by "DKRZ" for ScenarioMIP data and MPI-M for CMIP experiment
"MPI-ESM1-2-HR": "DKRZ",
"MPI-ESM1-2-LR": "MPI-M",
}

variables = {
@@ -93,8 +74,8 @@
},
"prsn": {
"name": "snowfall_flux",
"table_ids": ["Amon", "Omon", "day"],
}, # some models use Omon for table ID
"table_ids": ["Amon", "day"],
},
"snd": {"name": "surface_snow_thickness", "table_ids": ["LImon", "Eday"]},
"snw": {"name": "surface_snow_amount", "table_ids": ["LImon", "day"]},
"rlds": {
@@ -134,3 +115,28 @@
"table_ids": ["Amon", "day", "Eday"],
},
}

landsea_variables = {
"mrro": "land",
"mrsos": "land",
"mrsol": "land",
"snd": "land",
"snw": "land",
"siconc": "sea",
}

# lookup for the sftlf file paths for each model, hardcoded paths for now
model_sftlf_lu = {
"GFDL-ESM4": "/beegfs/CMIP6/arctic-cmip6/CMIP6/ScenarioMIP/NOAA-GFDL/GFDL-ESM4/ssp370/r1i1p1f1/fx/sftlf/gr1/v20180701/sftlf_fx_GFDL-ESM4_ssp370_r1i1p1f1_gr1.nc",
"CNRM-CM6-1-HR": "/beegfs/CMIP6/arctic-cmip6/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1-HR/historical/r1i1p1f2/fx/sftlf/gr/v20191021/sftlf_fx_CNRM-CM6-1-HR_historical_r1i1p1f2_gr.nc",
"NorESM2-MM": "/beegfs/CMIP6/arctic-cmip6/CMIP6/CMIP/NCC/NorESM2-MM/historical/r1i1p1f1/fx/sftlf/gn/v20191108/sftlf_fx_NorESM2-MM_historical_r1i1p1f1_gn.nc",
"TaiESM1": "/beegfs/CMIP6/arctic-cmip6/CMIP6/CMIP/AS-RCEC/TaiESM1/historical/r1i1p1f1/fx/sftlf/gn/v20200624/sftlf_fx_TaiESM1_historical_r1i1p1f1_gn.nc",
"HadGEM3-GC31-MM": "/beegfs/CMIP6/arctic-cmip6/CMIP6/CMIP/MOHC/HadGEM3-GC31-MM/piControl/r1i1p1f1/fx/sftlf/gn/v20200108/sftlf_fx_HadGEM3-GC31-MM_piControl_r1i1p1f1_gn.nc",
"HadGEM3-GC31-LL": "/beegfs/CMIP6/arctic-cmip6/CMIP6/CMIP/MOHC/HadGEM3-GC31-LL/piControl/r1i1p1f1/fx/sftlf/gn/v20190709/sftlf_fx_HadGEM3-GC31-LL_piControl_r1i1p1f1_gn.nc",
"MIROC6": "/beegfs/CMIP6/arctic-cmip6/CMIP6/CMIP/MIROC/MIROC6/historical/r1i1p1f1/fx/sftlf/gn/v20190311/sftlf_fx_MIROC6_historical_r1i1p1f1_gn.nc",
"EC-Earth3-Veg": "/beegfs/CMIP6/arctic-cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3-Veg/historical/r1i1p1f1/fx/sftlf/gr/v20211207/sftlf_fx_EC-Earth3-Veg_historical_r1i1p1f1_gr.nc",
"CESM2": "/beegfs/CMIP6/arctic-cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r11i1p1f1/fx/sftlf/gn/v20190514/sftlf_fx_CESM2_historical_r11i1p1f1_gn.nc",
"MPI-ESM1-2-HR": "/beegfs/CMIP6/arctic-cmip6/CMIP6/CMIP/MPI-M/MPI-ESM1-2-HR/historical/r1i1p1f1/fx/sftlf/gn/v20190710/sftlf_fx_MPI-ESM1-2-HR_historical_r1i1p1f1_gn.nc",
"MRI-ESM2-0": "/beegfs/CMIP6/arctic-cmip6/CMIP6/CMIP/MRI/MRI-ESM2-0/historical/r1i1p1f1/fx/sftlf/gn/v20190603/sftlf_fx_MRI-ESM2-0_historical_r1i1p1f1_gn.nc",
# no sftlf files for E3SM models or KACE-1-0-G
}
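
The two lookups added above (`landsea_variables` and `model_sftlf_lu`) can be combined to decide whether a variable needs a land/sea mask and which `sftlf` file to use for a given model. The following is a minimal sketch of that idea, not the code from this commit: the helper name `get_mask_info` is hypothetical, the dictionaries are abbreviated copies of the entries shown in the diff, and how the regrid script actually consumes these lookups is not visible here.

```python
from typing import Optional, Tuple

# Abbreviated copies of the lookups added to regridding/config.py in this commit.
landsea_variables = {
    "mrro": "land",
    "snw": "land",
    "siconc": "sea",
}

model_sftlf_lu = {
    "MIROC6": "/beegfs/CMIP6/arctic-cmip6/CMIP6/CMIP/MIROC/MIROC6/historical/r1i1p1f1/fx/sftlf/gn/v20190311/sftlf_fx_MIROC6_historical_r1i1p1f1_gn.nc",
}


def get_mask_info(model: str, var_id: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (surface_type, sftlf_path) for a model and
    variable, or None when no land/sea masking applies or can be applied."""
    surface_type = landsea_variables.get(var_id)
    if surface_type is None:
        # Variable is not listed as land-only or sea-only.
        return None
    sftlf_path = model_sftlf_lu.get(model)
    if sftlf_path is None:
        # Some models have no sftlf file (the config notes E3SM and KACE-1-0-G).
        return None
    return surface_type, sftlf_path
```

Returning `None` for both the "not a land/sea variable" and "no sftlf file" cases keeps the caller simple in this sketch; a real pipeline might prefer to distinguish the two, for example by logging the missing-file case.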