Fix longitude in regridded files, add CRS info #35

Merged
merged 48 commits into from
Mar 11, 2024

Conversation

Joshdpaul

@Joshdpaul Joshdpaul commented Mar 4, 2024

Revised 3/7/24: The standalone longitude correction / CRS script has been rolled into the main regrid.py script instead!

This PR closes #25 and closes #30

The regrid.py script now includes:

  • a reindexing of longitude coordinates from 0-360 to standard longitudes as part of the init_regridder() function.

  • a new apply_wgs84() function that checks for an existing "spatial_ref" coordinate in the dataset and, if none is found, attempts to write CF-compliant CRS info to the file.

  • a new write_retry_batch_file() function that writes any filepaths that were not successfully regridded to a separate text file, to be retried in a new slurm job. Combined with the new try/except routine in the main block, this lets a batch attempt every file in its list even when some filepaths raise errors.

  • an additional query in generate_batch_files.py that excludes subdaily frequencies (i.e., data transferred specifically for WRF downscaling but not wanted for regridding).
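
For reviewers who want the retry behavior in one place, here is a minimal sketch of the try/except plus retry-file pattern described above. It is not the code in regrid.py: the function signatures, the regrid_file() stand-in, and the "FILE NOT REGRIDDED:" message are all assumptions made for illustration.

```python
from pathlib import Path


def regrid_file(fp):
    """Hypothetical stand-in for the real regridding step in regrid.py."""
    if not Path(fp).exists():
        raise FileNotFoundError(fp)


def write_retry_batch_file(failed, retry_fp):
    """Write one failed filepath per line (illustrative signature only)."""
    Path(retry_fp).write_text("".join(f"{fp}\n" for fp in failed))


def run_batch(batch_fp, retry_fp, regrid=regrid_file):
    """Try every filepath in a batch file, collecting failures instead of stopping."""
    lines = Path(batch_fp).read_text().splitlines()
    filepaths = [line.strip() for line in lines if line.strip()]

    failed = []
    for fp in filepaths:
        try:
            regrid(fp)
        except Exception as exc:
            # Assumed message format; the real script's wording may differ.
            print(f"FILE NOT REGRIDDED: {fp}")
            print(f"  reason: {exc!r}")
            failed.append(fp)

    # Only create the retry file when something actually failed.
    if failed:
        write_retry_batch_file(failed, retry_fp)
    return failed
```

Calling run_batch("batch_0.txt", "batch_retry.txt") on a batch file containing bogus filenames would return those filenames and write them to batch_retry.txt, while the valid files are still processed and the job itself exits normally.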

TO TEST:

  • Start the regridding pipeline as usual by generating the batch files in your scratch directory.
  • Delete most of the batch files, leaving a small subset for testing the actual regridding.
  • In one of these batch files, include some bogus filenames to generate errors.
  • As usual, use the regrid_cmip6.ipynb notebook to submit the slurm jobs.
  • All slurm jobs, even for the batch file with bogus filenames, should complete without a "FAILED" state.
  • Check the slurm job outputs to confirm that the batch with bogus filenames still completed and that error messages were written into the output file. There should be messaging indicating that some files were not regridded.
  • Check the directory with the batch files. You should see a new batch_retry.txt file containing the bogus filenames that were not regridded.
  • Open one of the regridded files in QGIS against a basemap. Check the properties of the layer and confirm that QGIS recognizes the CRS as WGS84 and that the image is rendered in the correct location. Some basemaps do not actually extend to 90° latitude, and our current target grid does not actually extend to the meridian due to weird half-sized pixels. Keep that in mind when viewing against a basemap, and instead check that the features in the interior of the image generally align with land masses, etc.
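  • Open one of the regridded files using xarray and check the CRS info using the rioxarray accessor. (You will need an environment that has rioxarray.)
import xarray as xr
import rioxarray  # importing rioxarray registers the .rio accessor on xarray objects

# Local path to a regridded file; change this to a file on your system.
fp = '/Users/joshpaul/Desktop/SNAP/CMIP6/qgis2/pr_day_CESM2_ssp126_regrid_20700101-20701231.nc'
# decode_coords="all" lets xarray pick up the spatial_ref coordinate that holds the CRS.
ds = xr.open_dataset(fp, decode_coords="all")

# Expect a WGS84 CRS if the CRS info was written correctly.
print(ds.rio.crs)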

Things to note:
The longitude attributes in the regridded .nc files may still reference values 0-360, since this branch does not include the attribute fixes yet. You may also see warnings stemming from non-standard attributes when opening the regridded files with xarray.open_dataset(decode_coords='all').
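
For reference, the arithmetic behind moving from 0-360 to standard longitudes can be sketched in plain Python. This only illustrates the idea and is not taken from init_regridder(); the function names are hypothetical. With xarray, the equivalent is typically an assign_coords() call followed by sortby() on the longitude coordinate, but the branch may do it differently.

```python
def wrap_longitude(lon):
    """Map a longitude in degrees from [0, 360) to [-180, 180)."""
    return ((lon + 180.0) % 360.0) - 180.0


def reindex_longitudes(lons):
    """Return wrapped longitudes sorted west to east, plus each value's source index.

    The index list is what you would use to reorder the data columns so
    they stay matched to their (new) longitude values.
    """
    wrapped = [wrap_longitude(lon) for lon in lons]
    order = sorted(range(len(lons)), key=lambda i: wrapped[i])
    return [wrapped[i] for i in order], order


lons, order = reindex_longitudes([0.0, 90.0, 180.0, 270.0])
print(lons)   # [-180.0, -90.0, 0.0, 90.0]
print(order)  # [2, 3, 0, 1]
```

Note that only the coordinate values change here; attributes describing the valid range are separate metadata, which is why they can still mention 0-360 after the values themselves have been fixed.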

Future work:
Now that we have slurm outputs that are searchable (i.e., have standardized error messaging), we can include them in a QC process similar to the indicators Prefect flow. After the jobs complete in the Prefect flow, we can look for the "retry" file and try any bad files a second time. That's probably also the point in the flow where we can address this issue about stuck jobs, maybe by setting a time limit for a batch to complete and adding the files to the "retry" batch if it gets stuck.
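
As a rough idea of what such a QC step could look like, the sketch below scans slurm output files for a standardized error line and collects the affected filepaths. The "FILE NOT REGRIDDED:" marker, the "*.out" filename pattern, and the function name are assumptions for illustration, not the actual messaging or layout used by the pipeline.

```python
from pathlib import Path

# Hypothetical marker; substitute whatever standardized message the jobs emit.
ERROR_MARKER = "FILE NOT REGRIDDED:"


def collect_failed_files(output_dir, pattern="*.out"):
    """Return filepaths reported as failed across all slurm output files."""
    failed = []
    for out_fp in sorted(Path(output_dir).glob(pattern)):
        for line in out_fp.read_text().splitlines():
            if line.startswith(ERROR_MARKER):
                # Everything after the marker is taken to be the filepath.
                failed.append(line[len(ERROR_MARKER):].strip())
    return failed
```

A flow could compare this list against the contents of the "retry" file as a consistency check before resubmitting anything.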

Exploratory notebooks:
These were updated in this branch while @kyleredilla and I were experimenting with NaN values, grids, extrapolation, etc. That work is going to be committed here but is really part of a different grid selection problem that will be solved in other branches.

@Joshdpaul Joshdpaul requested a review from kyleredilla March 4, 2024 16:58
@Joshdpaul Joshdpaul changed the base branch from indicators to main March 4, 2024 20:24
@kyleredilla kyleredilla left a comment

I have successfully run and tested this branch of the regridding pipeline. I didn't run all model / scenario / variable combinations but rather a random sample, which is sufficient for now as we continue developing and testing this pipeline in other branches.

I made some changes to various scripts after your edits, @Joshdpaul, so feel free to re-test if you would like. I approve regardless, because we will be reviewing this again with your prefect-ifying work. The one thing I think needs to be fixed is dropping the test.csv file; I just want to make sure I'm not missing something there. Feel free to merge whenever you want once that is addressed.

The notable changes I made are: updating regrid.py to include the adjusted lon_bnds variable in the target dataset (not sure if it matters, but it's for consistency) and to use the periodic=True option in the regridder; adjusting the tests/test_regridding.py script to test for the new longitude coords; and, of course, all of the stuff in the explore_regridding.ipynb notebook as we worked through that set of issues.

@Joshdpaul Joshdpaul merged commit 542bf9b into main Mar 11, 2024
@Joshdpaul Joshdpaul deleted the longitude branch March 11, 2024 22:10
Successfully merging this pull request may close these issues.

Replace 0-360 with standard longitude coordinates
Make regridded files GIS compatible