Fix multiprocessing and consolidate QC #68

Merged on Feb 19, 2025 (80 commits)
Commits
82eaee6
move multiprocessing out of for loop
Joshdpaul Aug 30, 2024
f879224
add qc_config and job array to qc sbatch
Joshdpaul Aug 31, 2024
f92c051
add print statement to track file names/times
Joshdpaul Aug 31, 2024
0c1a6b7
use actual variable count in sbatch params
Joshdpaul Sep 3, 2024
7bf8b11
Combine qc script and notebook and simplify code
kyleredilla Sep 9, 2024
5896faa
drop refs to visual qc for runner script
kyleredilla Sep 9, 2024
67f9f9f
make qc scripts and notebook consistent
kyleredilla Sep 9, 2024
db5ce3d
small fixes for regridding qc
kyleredilla Sep 12, 2024
8a80efe
remove unused args in qc runner
kyleredilla Sep 12, 2024
bdc7ef3
pull subsampling code into qc module
kyleredilla Sep 13, 2024
60dc1b4
checkpoint for script to combine regridded data for rasdaman
kyleredilla Sep 14, 2024
1aadf51
finalize script to combine regridded files for rasdaman
kyleredilla Sep 18, 2024
f09963a
fix regridding batch files script to handle MPI-M institution ID
kyleredilla Sep 20, 2024
6001ac5
add empty variables if missing
kyleredilla Sep 20, 2024
9b90485
remove rasdaman preprocessing script for monthly common cmip6
kyleredilla Oct 1, 2024
0cce56a
remove unused dict from regridding config
kyleredilla Oct 2, 2024
aa91734
print job_id for prefect ssh to parse
kyleredilla Dec 10, 2024
7172b07
try command in place of conda_init_script
kyleredilla Dec 10, 2024
89c2de7
print job IDs for regrid runner
kyleredilla Dec 10, 2024
6bbf6a0
print list of job ids as space-separated string
kyleredilla Dec 11, 2024
369dfe6
drop crop from target dataset
kyleredilla Dec 11, 2024
7cdfe9d
ensure lon dim is 1D when sorting
kyleredilla Dec 11, 2024
e635d47
disable tryexcept for regrid call
kyleredilla Dec 11, 2024
22f4675
check for latlon dims before fixing
kyleredilla Dec 11, 2024
6725514
add interp method as top level parameter
kyleredilla Dec 12, 2024
c06a64b
fix interp_method top level parameter
kyleredilla Dec 12, 2024
81ef5d1
add missing kwarg
kyleredilla Dec 12, 2024
31593b2
fix positional arg
kyleredilla Dec 12, 2024
407ab7e
fix script arg
kyleredilla Dec 12, 2024
16a28b0
fix script arg
kyleredilla Dec 12, 2024
78adf16
print regrid qc slurm job id
kyleredilla Dec 12, 2024
3ca121f
fix regrid qc sbatch script
kyleredilla Dec 12, 2024
33b8b13
drop ref to error file
kyleredilla Dec 12, 2024
7564311
regridding qc overhaul for generic target grid
kyleredilla Dec 12, 2024
b94848b
clean up regrid qc nb
kyleredilla Dec 13, 2024
be821c2
drop bnds variables first in rasdafy
kyleredilla Dec 13, 2024
fc14494
fix longitude shift for 0-360 src files
kyleredilla Dec 15, 2024
7d30584
Add fixed frequency variables to transfers pipeline (#70)
Joshdpaul Dec 16, 2024
e87b367
add soil temp variable
kyleredilla Dec 20, 2024
89b9b0c
add check for lat var before trnaposing
kyleredilla Dec 21, 2024
e879b4a
add soil temperature to transfers pipeline
kyleredilla Dec 18, 2024
442a2e7
str tweak for fixed freq vars in regrid script
kyleredilla Dec 23, 2024
ecbddca
add rsus, rlus, and mlotst to transfers config
Joshdpaul Jan 6, 2025
e8eb51d
add landsea mask functionality to regrid script
kyleredilla Jan 7, 2025
0292d0e
drop unneeded regrid script arg
kyleredilla Jan 7, 2025
447a055
add sftlf arg to regrid slurm script
kyleredilla Jan 7, 2025
0ee21b5
add no files error to regrid batch gen script
kyleredilla Jan 7, 2025
dbcbe7c
add x and y as dims in regrid batch file generation
kyleredilla Jan 7, 2025
05974b3
add trycatch for sftlf lookup in regrid job generation
kyleredilla Jan 7, 2025
212c587
increase tolerance for native nanmask
kyleredilla Jan 8, 2025
4fbe4c0
tweak qc script for land/sea variables
kyleredilla Jan 8, 2025
8f82f8c
transpose plots and add error handling for file opening for regrid qc
kyleredilla Jan 8, 2025
e3bf4b5
handle use of full lat/lon names in coordinate vars by some models
kyleredilla Jan 8, 2025
899a6cc
fix duplicate nanmin in regrid batch file gen
kyleredilla Jan 8, 2025
700e073
add conversion of snw variable
kyleredilla Jan 13, 2025
0d0d8e4
expand nan fraction thresholds for using native file nanmask
kyleredilla Jan 13, 2025
ad3cab0
add ignore_degenerate arg to regridder
kyleredilla Jan 13, 2025
1f03ca0
makde regridding qc subsetting more robust for nonstandard grids
kyleredilla Jan 18, 2025
048a32c
update regridding config
kyleredilla Jan 23, 2025
9383a9f
remove Omon prsn from transfers
kyleredilla Jan 23, 2025
158b2a2
add script to remove old versions of duplicate raw cmip6 files
kyleredilla Jan 24, 2025
dd2e99a
updates to improve regridding qc
kyleredilla Jan 24, 2025
3de63dd
add function to drop empty directories in cleanout script
kyleredilla Jan 24, 2025
be339d5
remove prsn as landsea variable :facepalm:
kyleredilla Jan 25, 2025
61663d8
use nan funcs for regrid qc plot comparisons
kyleredilla Jan 25, 2025
64afc34
clean up regrid qc code and fix up docs
kyleredilla Jan 25, 2025
56394bd
clean up regridding pipeline including docs
kyleredilla Jan 28, 2025
4694994
typo fix
kyleredilla Jan 29, 2025
049eac8
improve time dim encoding in regrid files
kyleredilla Jan 29, 2025
a20dc6a
fix units and dims in plotting in regrid qc
kyleredilla Jan 31, 2025
928fa29
update readme
kyleredilla Jan 31, 2025
4c64df9
bump regridding QC time limit to 2 hours
kyleredilla Jan 31, 2025
1c9e33c
tweaks to fix regridding for 360 day calendars
kyleredilla Jan 31, 2025
9ed7e84
add missing code for rasdafy arg
kyleredilla Feb 10, 2025
a2e477e
de-duplicate function name
kyleredilla Feb 10, 2025
41ae4ac
fix regrid bounding box fetch for qc and add buffer for file min max
kyleredilla Feb 11, 2025
4c3ec4b
fix regrid to source bbox conversion in qc
kyleredilla Feb 12, 2025
cc4d9d9
improve regrid qc plotting
kyleredilla Feb 13, 2025
4129940
add missing arg in regrid qc function
kyleredilla Feb 13, 2025
1e45c1c
remove holdings output text files
kyleredilla Feb 13, 2025
4 changes: 3 additions & 1 deletion regridding/README.md
@@ -1,6 +1,8 @@
 # Regrid CMIP6 on ACDN
 
-This directory is used for regridding the CMIP6 data mirrored on the ACDN to a common grid. The target grid is the NCAR CESM2 grid, which is also shared by a number of the other chosen models.
+This directory is used for regridding the CMIP6 data mirrored on the Arctic Climate Data Node. This pipeline will regrid the specified set of models, scenarios, variables, and frequencies (e.g. temporal resolutions) to the grid of some specified file (referred to as the target grid file). This includes cropping the extent of regridded outputs to match the extent of the target file if it is of a larger extent.
+
+Note - this pipeline was previously used for a single fixed grid - that used by the NCAR CESM2 model, which is also shared by a few of the other chosen models (TaiESM1, NorESM2-MM).
 
 This pipeline also crops these datasets to a pan-arctic domain of 50N - 90N.
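
The pan-arctic crop mentioned in the README can be illustrated with a minimal sketch. It assumes a 1D list of latitudes with one row of data per latitude. The function name `crop_to_lat_band` and the sample values are hypothetical and not taken from the pipeline, which presumably works on gridded NetCDF data rather than plain lists.

```python
def crop_to_lat_band(lats, rows, lat_min=50.0, lat_max=90.0):
    """Keep only the rows whose latitude falls inside [lat_min, lat_max].

    Hypothetical helper for illustration; bounds default to the 50N - 90N
    domain named in the README.
    """
    if len(lats) != len(rows):
        raise ValueError("lats and rows must have the same length")
    # Pair each latitude with its row so both are filtered together.
    kept = [(lat, row) for lat, row in zip(lats, rows) if lat_min <= lat <= lat_max]
    cropped_lats = [lat for lat, _ in kept]
    cropped_rows = [row for _, row in kept]
    return cropped_lats, cropped_rows


# Made-up sample grid: four latitudes, two values per latitude.
lats = [30.0, 50.0, 70.0, 90.0]
rows = [[1, 2], [3, 4], [5, 6], [7, 8]]
cropped_lats, cropped_rows = crop_to_lat_band(lats, rows)
```

Both bounds are inclusive here, so cells exactly at 50N and 90N are retained; whether the pipeline treats the edges the same way is not stated in the diff.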

54 changes: 30 additions & 24 deletions regridding/config.py
@@ -1,6 +1,7 @@
 """Lookup tables for CMIP6 regridding.
 Since we are providing our config via Prefect, these were copied from the transfers/config.py file
-to avoid using system environment variables."""
+to avoid using system environment variables.
+"""
 
 # batch file naming template
 batch_tmp_fn = "batch_{model}_{scenario}_{frequency}_{var_id}_{grid_name}_{count}.txt"
@@ -14,23 +15,6 @@
     "ssp585",
 ]
 
-# institution model strings (<institution>_<model>, from mirrored data) that we will be regridding
-inst_models = [
-    "NOAA-GFDL_GFDL-ESM4",
-    "NIMS-KMA_KACE-1-0-G",
-    "CNRM-CERFACS_CNRM-CM6-1-HR",
-    "NCC_NorESM2-MM",
-    "AS-RCEC_TaiESM1",
-    "MOHC_HadGEM3-GC31-MM",
-    "MOHC_HadGEM3-GC31-LL",
-    "MIROC_MIROC6",
-    "EC-Earth-Consortium_EC-Earth3-Veg",
-    "NCAR_CESM2",
-    "MPI-M_MPI-ESM1-2-HR",
-    "DKRZ_MPI-ESM1-2-HR",
-    "MRI_MRI-ESM2-0",
-]
-
 model_inst_lu = {
     "ACCESS-CM2": "CSIRO-ARCCSS",
     "CESM2": "NCAR",
@@ -46,11 +30,8 @@
     "TaiESM1": "AS-RCEC",
     "CESM2-WACCM": "NCAR",
     # Another oddity - MPI-ESM1-2-* models have different representation among the institutions, or "Institution ID".
-    # the -HR version is apparently mostly available under "DKRZ". The -LR version is mostly available under "MPI-M".
-    # There is apparently mixing, too, as the -HR version has historical data under "MPI-M", and the -LR version has
-    # data available under "DKRZ". We will just go with the institution which has the majority for each, for now.
+    # the -HR version was run by "DKRZ" for ScenarioMIP data and MPI-M for CMIP experiment
     "MPI-ESM1-2-HR": "DKRZ",
-    "MPI-ESM1-2-LR": "MPI-M",
 }
 
 variables = {
@@ -93,8 +74,8 @@
     },
     "prsn": {
         "name": "snowfall_flux",
-        "table_ids": ["Amon", "Omon", "day"],
-    }, # some models use Omon for table ID
+        "table_ids": ["Amon", "day"],
+    },
     "snd": {"name": "surface_snow_thickness", "table_ids": ["LImon", "Eday"]},
     "snw": {"name": "surface_snow_amount", "table_ids": ["LImon", "day"]},
     "rlds": {
@@ -134,3 +115,28 @@
         "table_ids": ["Amon", "day", "Eday"],
     },
 }
+
+landsea_variables = {
+    "mrro": "land",
+    "mrsos": "land",
+    "mrsol": "land",
+    "snd": "land",
+    "snw": "land",
+    "siconc": "sea",
+}
+
+# lookup for the sftlf file paths for each model, hardcoded paths for now
+model_sftlf_lu = {
+    "GFDL-ESM4": "/beegfs/CMIP6/arctic-cmip6/CMIP6/ScenarioMIP/NOAA-GFDL/GFDL-ESM4/ssp370/r1i1p1f1/fx/sftlf/gr1/v20180701/sftlf_fx_GFDL-ESM4_ssp370_r1i1p1f1_gr1.nc",
+    "CNRM-CM6-1-HR": "/beegfs/CMIP6/arctic-cmip6/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1-HR/historical/r1i1p1f2/fx/sftlf/gr/v20191021/sftlf_fx_CNRM-CM6-1-HR_historical_r1i1p1f2_gr.nc",
+    "NorESM2-MM": "/beegfs/CMIP6/arctic-cmip6/CMIP6/CMIP/NCC/NorESM2-MM/historical/r1i1p1f1/fx/sftlf/gn/v20191108/sftlf_fx_NorESM2-MM_historical_r1i1p1f1_gn.nc",
+    "TaiESM1": "/beegfs/CMIP6/arctic-cmip6/CMIP6/CMIP/AS-RCEC/TaiESM1/historical/r1i1p1f1/fx/sftlf/gn/v20200624/sftlf_fx_TaiESM1_historical_r1i1p1f1_gn.nc",
+    "HadGEM3-GC31-MM": "/beegfs/CMIP6/arctic-cmip6/CMIP6/CMIP/MOHC/HadGEM3-GC31-MM/piControl/r1i1p1f1/fx/sftlf/gn/v20200108/sftlf_fx_HadGEM3-GC31-MM_piControl_r1i1p1f1_gn.nc",
+    "HadGEM3-GC31-LL": "/beegfs/CMIP6/arctic-cmip6/CMIP6/CMIP/MOHC/HadGEM3-GC31-LL/piControl/r1i1p1f1/fx/sftlf/gn/v20190709/sftlf_fx_HadGEM3-GC31-LL_piControl_r1i1p1f1_gn.nc",
+    "MIROC6": "/beegfs/CMIP6/arctic-cmip6/CMIP6/CMIP/MIROC/MIROC6/historical/r1i1p1f1/fx/sftlf/gn/v20190311/sftlf_fx_MIROC6_historical_r1i1p1f1_gn.nc",
+    "EC-Earth3-Veg": "/beegfs/CMIP6/arctic-cmip6/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3-Veg/historical/r1i1p1f1/fx/sftlf/gr/v20211207/sftlf_fx_EC-Earth3-Veg_historical_r1i1p1f1_gr.nc",
+    "CESM2": "/beegfs/CMIP6/arctic-cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r11i1p1f1/fx/sftlf/gn/v20190514/sftlf_fx_CESM2_historical_r11i1p1f1_gn.nc",
+    "MPI-ESM1-2-HR": "/beegfs/CMIP6/arctic-cmip6/CMIP6/CMIP/MPI-M/MPI-ESM1-2-HR/historical/r1i1p1f1/fx/sftlf/gn/v20190710/sftlf_fx_MPI-ESM1-2-HR_historical_r1i1p1f1_gn.nc",
+    "MRI-ESM2-0": "/beegfs/CMIP6/arctic-cmip6/CMIP6/CMIP/MRI/MRI-ESM2-0/historical/r1i1p1f1/fx/sftlf/gr/v20190603/sftlf_fx_MRI-ESM2-0_historical_r1i1p1f1_gn.nc",
+    # no sftlf files for E3SM models or KACE-1-0-G
+}