Split out frequencies, models, and scenarios in regridding pipeline #40

Merged: kyleredilla merged 7 commits into main from split_freqs on Aug 15, 2024

@kyleredilla kyleredilla commented Jul 26, 2024

This PR adds parameters for frequency, model, and scenario to the regridding pipeline flow. This enhancement permits the processing of any single combination of model, scenario, variable, and frequency. The default option is "all" for each of these, meaning all available data.
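As a rough illustration of the "all" default described above, here is a minimal sketch of how such parameters might be expanded into individual combinations. The `AVAILABLE` lookup, the function names, the space-separated input format, and the example values are all assumptions made for illustration; they are not taken from the pipeline code.

```python
from itertools import product

# Hypothetical placeholder values; the real pipeline determines what is available.
AVAILABLE = {
    "frequency": ["day", "mon"],
    "model": ["CESM2", "GFDL-ESM4"],
    "scenario": ["historical", "ssp585"],
    "variable": ["tas", "pr"],
}


def resolve(name, value):
    """Expand "all" to every available value, else split a space-separated string."""
    if value == "all":
        return list(AVAILABLE[name])
    requested = value.split()
    unknown = [v for v in requested if v not in AVAILABLE[name]]
    if unknown:
        raise ValueError(f"unknown {name} value(s): {unknown}")
    return requested


def expand_combinations(frequency="all", model="all", scenario="all", variable="all"):
    """Return every (model, scenario, variable, frequency) tuple to process."""
    return list(
        product(
            resolve("model", model),
            resolve("scenario", scenario),
            resolve("variable", variable),
            resolve("frequency", frequency),
        )
    )
```

With this sketch, passing one value per parameter yields a single combination, while leaving everything at the default yields the full cross product of the placeholder values.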

To test, execute the regrid CMIP6 flow like you would any other Prefect flow. Use /beegfs/CMIP6/arctic-cmip6/CMIP6 as the cmip6_directory, and use fix_regrid for the branch_name.

Try testing a few different combinations of things, such as

  1. a single frequency, model, scenario, and variable;
  2. then maybe 2 or more for each,
  3. then the default "all" for each, to run the whole shebang.

You can test with either a personal directory in Chinook, or the snapdata directory.

To verify the regridding, you can look into the QC files in <scratch_directory>/cmip6_regridding/qc/. Check the qc_error.txt file to make sure no errors were written there, and open the rendered HTML QC notebook, saved to visual_qc_out.html, to see visuals of the new data (I copy it locally via SCP, but there might be a way to view it on Chinook?).
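For the qc_error.txt check, a small helper along these lines could be used. This is only a sketch: the function name is hypothetical, and it assumes that an empty (or whitespace-only) file means no errors were logged, which matches the description above but is not confirmed against the pipeline code.

```python
from pathlib import Path


def read_qc_errors(scratch_directory):
    """Return the non-blank lines of qc_error.txt; an empty list means no errors."""
    qc_file = Path(scratch_directory) / "cmip6_regridding" / "qc" / "qc_error.txt"
    if not qc_file.exists():
        # A missing file probably means the QC step did not run at all.
        raise FileNotFoundError(f"QC error file not found: {qc_file}")
    return [line for line in qc_file.read_text().splitlines() if line.strip()]
```

Calling `read_qc_errors` with your scratch directory and getting back an empty list corresponds to the "no errors written there" case.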

Please share any and all suggestions! This is a critical part of the CMIP6 pipeline. While I anticipate running this less and less frequently, I'm open to all ideas for polishing it and trying to improve the quality of the data product.

closes #20 #41

@kyleredilla

Converting to draft, see comment on cmip6-utils PR

@kyleredilla kyleredilla marked this pull request as draft July 30, 2024 18:00
@kyleredilla kyleredilla marked this pull request as ready for review August 8, 2024 16:09
@kyleredilla

Okay, I think this is ready to try testing again - I've made some improvements that I hope will help things be more robust and prevent the hanging I was seeing with multiprocessing in some steps (or maybe only the batch file generation).

Also, see ua-snap/snap-geo#6 - I would appreciate it if you used this PR as an opportunity to try kicking the tires on the new snap-geo conda environment! I have been testing with it and it seems to be working.


@cstephen cstephen left a comment

This all looks good to me! As suggested, I tried running the regridding pipeline through Prefect with:

  • a single var/freq/model/scenario combo
  • two vars/freqs/models/scenarios in a few different combos
  • "all" for each

I inspected the qc_error.txt file (always empty) and visual QC notebook after each run. The random sampling of side-by-side source & regridded images is an excellent idea and they always looked similar & sane.

I ran into the same issue described in #8 a couple times, but no biggie. That's an issue to be tackled separately.

I also tried installing the snap-geo conda environment from the new env_from_history.yml file on my Mac and it worked! I'll add more details in ua-snap/snap-geo#6

Awesome work!!

@charparr

@kyleredilla great work! I've run several flows now with both the cmip6-utils and the snap-geo env and examined the QC results and they look great. I left a few minor comments but feel free to take 'em / leave 'em prior to merging. This PR seems good to go, nice job!

regridding/luts.py (resolved)
regridding/regridding_functions.py (resolved)
regridding/regridding_functions.py (resolved)
@kyleredilla kyleredilla merged commit 7f54131 into main Aug 15, 2024
@kyleredilla kyleredilla deleted the split_freqs branch August 15, 2024 17:04
Development

Successfully merging this pull request may close these issues.

Fix any "all models" features for CMIP6 flows