Fix longitude in regridded files, add CRS info #35
…r on subdaily freqs
…d allows post-error processing of remaining files in batch
…ly data from batch files
…d.py slurm job after jobs complete in the prefect flow
…with in generate_batch_files.py
I have successfully run and tested this branch of the regridding pipeline. I didn't run all model / scenario / variable combinations but rather a random sample, which is sufficient for now as we continue developing and testing this pipeline in other branches.
I made some changes to various scripts since you edited @Joshdpaul so feel free to re-test if you would like. But I approve regardless because we will be reviewing this again with your prefect-ifying work. The one thing I think needs to be fixed is dropping the test.csv file - just want to make sure I'm not missing something there. Feel free to merge when you want after that is addressed.
The notable changes I made are to regrid.py, to include the adjusted lon_bnds variable in the target dataset (not sure if it matters but it's for consistency) and use the period=True option in the regridder; adjusting the tests/test_regridding.py script to test for the new longitude coords; and of course all of the stuff in the explore_regridding.ipynb notebook as we worked through that set of issues.
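For readers unfamiliar with the longitude fix, here is a minimal sketch of the general idea in plain Python. It assumes the goal is to move coordinates from a 0 to 360 convention onto -180 to 180 and then re-sort them; the function names are hypothetical and this is not the code in regrid.py, which works on the actual datasets.

```python
def to_signed_longitude(lon):
    """Map a longitude in degrees from the 0..360 convention onto [-180, 180)."""
    return ((lon + 180.0) % 360.0) - 180.0


def reindex_longitudes(lons):
    """Convert a list of longitudes and sort them in ascending order.

    Returns the sorted longitudes and the index order that produced them,
    so the same order can be applied to the data along the longitude axis.
    """
    converted = [to_signed_longitude(x) for x in lons]
    # Indices that would sort the converted values (hypothetical helper logic).
    order = sorted(range(len(converted)), key=converted.__getitem__)
    return [converted[i] for i in order], order
```

The returned index order matters because the data values have to be rearranged together with the coordinate, otherwise the grid ends up shifted relative to its labels.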
Revised 3/7/24: The standalone longitude correction / CRS script has been rolled into the main regrid.py function instead!
This PR closes #25 and closes #30.
The regrid.py script now includes:
- a reindexing of longitude coordinates as part of the init_regridder() function.
- a new apply_wgs84() function that checks for an existing "spatial_ref" coordinate in the dataset and, if not found, will attempt to write CF-compliant CRS info to the file.
- a new write_retry_batch_file() function which will write any filepaths that were not successfully regridded to a separate text file to be retried in a new slurm job. Combined with the new try/except routine in the main block, this allows batches to try every file in the list regardless of whether or not errors are found in some filepaths.
- an additional query in generate_batch_files.py that will exclude subdaily frequencies (i.e., data transferred specifically for WRF downscaling but not wanted for regridding).
TO TEST:
- … regrid_cmip6.ipynb notebook to submit the slurm jobs.
- … batch_retry.txt file containing the bogus filenames that were not regridded.
- … xarray and check the CRS info using the rioxarray accessor. (You will need an environment that has rioxarray):
Things to note:
The longitude attributes in the regridded .nc files may still reference values 0-360, since this branch does not include the attribute fixes yet. You may also see some warnings if opening the regridded files with xarray.open_dataset(decode_coords='all') that stem from non-standard attributes.
Future work:
Now that we have slurm outputs that are searchable (i.e., have standardized error messaging), we can include them in a QC process similar to the indicators Prefect flow. After the jobs complete in the Prefect flow, we can look for the "retry" file and try any bad files a second time. That's probably also the point in the flow where we can address this issue about stuck jobs, maybe setting a time limit for a batch to complete and adding the files to the "retry" batch if it gets stuck.
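As a rough illustration of the try/except plus retry-file pattern that this follow-up would build on, here is a minimal sketch. Everything in it is an assumption for illustration: run_batch, the regrid_one callable, and the "REGRID_ERROR" message prefix are hypothetical and are not the names or messages used in regrid.py.

```python
from pathlib import Path


def run_batch(filepaths, regrid_one, retry_file):
    """Try every file in the batch and record failures instead of stopping.

    regrid_one is any callable that regrids a single file and raises on error
    (hypothetical stand-in for the real regridding step).
    """
    failed = []
    for fp in filepaths:
        try:
            regrid_one(fp)
        except Exception as exc:
            # A fixed prefix keeps the job output searchable (assumed format).
            print(f"REGRID_ERROR: {fp}: {exc}")
            failed.append(str(fp))
    if failed:
        # One filepath per line, ready to be used as a new batch.
        Path(retry_file).write_text("\n".join(failed) + "\n")
    return failed
```

A later step in the flow could check whether the retry file exists and, if so, submit it as one more batch.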
Exploratory notebooks:
These were updated in this branch and @kyleredilla and I were messing with NaN values, grids, extrapolation, etc. That work is going to be committed here but is really part of a different grid selection problem that will be solved in other branches.