Handle "fx" variables in regridding, or remove them from transfer manifest #4

Open
kyleredilla opened this issue Jan 11, 2024 · 4 comments

Comments

@kyleredilla
Contributor

Temporally fixed variables such as surface altitude (orog) break the regridding, so that code needs to be adapted to handle them. This might be fairly simple, perhaps a try-except for the missing time dimension. Currently they must be manually removed from our filesystem so that regridding can proceed for all other files (e.g. because of the batch file structure).
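A minimal sketch of that try-except idea, assuming each file is opened as an xarray.Dataset (or anything else exposing a `sizes` mapping of dimension name to length). The names `regrid_if_temporal` and `regrid_func` are hypothetical and are not taken from the existing regridding code.

```python
from types import SimpleNamespace


def regrid_if_temporal(ds, regrid_func, time_name="time"):
    """Regrid `ds` only if it has a time dimension; otherwise return None.

    `ds` is expected to behave like an xarray.Dataset, i.e. to expose a
    `sizes` mapping of dimension name -> length. `regrid_func` stands in
    for whatever the pipeline actually uses to regrid one dataset.
    """
    try:
        ds.sizes[time_name]
    except KeyError:
        # Fixed-frequency (fx / Ofx) file: no time axis, so skip it here
        # and leave it to the caller to decide what to do with it.
        return None
    return regrid_func(ds)


# Stand-in objects so the sketch runs without any data files.
temporal = SimpleNamespace(sizes={"time": 12, "lat": 4, "lon": 8})
fixed = SimpleNamespace(sizes={"lat": 4, "lon": 8})

temporal_result = regrid_if_temporal(temporal, lambda d: "regridded")
fixed_result = regrid_if_temporal(fixed, lambda d: "regridded")
```

Returning `None` for time-less files keeps the batch moving, but it only sidesteps the problem: those files still would not end up on the common grid.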

They are still part of the transfer manifest, so they will eventually have to be re-downloaded. We want these files on the common grid, so we do need to adapt the regridding code to handle them. Perhaps this should go hand-in-hand with implementing Prefect for the regridding pipeline, and a possible refactor to allow config-specified processing instead of the current "shotgun" / "regrid everything" approach.

@Joshdpaul
Contributor

In my testing, the time dimension errors from running regridding/generate_batch_files.py were resolved after deleting all files with fx and Ofx variables (see #2).

Consider incorporating a test of the time dimension into the transfers/tests/test_mirror.py script. That script already touches each file with xarray, so perhaps it can also check for a time dimension. The files without a time dimension could be recorded (as CSV?), which would allow them to be separated in the regridding pipeline and processed differently from the majority of the files.
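Roughly what I have in mind, as a sketch only. It assumes `open_func` returns something with a `sizes` mapping (with real files this could be `xarray.open_dataset`); the function names, the single `path` column, and the file names in the demo are all made up for illustration.

```python
import csv
import io
from types import SimpleNamespace


def find_timeless(paths, open_func, time_name="time"):
    """Return the paths whose dataset has no time dimension."""
    timeless = []
    for path in paths:
        ds = open_func(path)
        if time_name not in ds.sizes:
            timeless.append(path)
    return timeless


def write_timeless_csv(timeless, handle):
    """Write one path per row under a 'path' header."""
    writer = csv.writer(handle)
    writer.writerow(["path"])
    for path in timeless:
        writer.writerow([path])


# Fake datasets keyed by hypothetical file names, so no files are needed.
fake = {
    "tas_day.nc": SimpleNamespace(sizes={"time": 365, "lat": 2, "lon": 2}),
    "orog_fx.nc": SimpleNamespace(sizes={"lat": 2, "lon": 2}),
}
timeless = find_timeless(list(fake), fake.__getitem__)

buffer = io.StringIO()
write_timeless_csv(timeless, buffer)
csv_lines = buffer.getvalue().splitlines()
```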

@Joshdpaul
Contributor

Looks like some of these fx-frequency files that lack a time dimension are limited to Antarctic extent, and may also use some kind of Antarctic grid. The regrid/latitude cropping process does actually run on these, but gives strange results. Our cropping process relies on the latitude being in geographic coordinates, but it looks like these Antarctica files do not use a standard geographic coordinate system.

The time_dim_test branch has some revisions, including a standalone test script that finds files without time dimensions. A section of the regrid.py script has also been revised with a try/except routine and a reference to the list of time-less files, which isolates these files so they don't break the regridding for the rest of the dataset.
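For anyone reading without the branch checked out, the isolation step amounts to something like the sketch below. This is not the code on the branch: the CSV layout (a single `path` column), the helper names, and the file names are assumptions made for illustration.

```python
import csv
import io


def read_timeless_paths(handle):
    """Read the stored list of time-less files into a set of paths."""
    reader = csv.DictReader(handle)
    return {row["path"] for row in reader}


def partition_files(paths, timeless):
    """Split `paths` into (files to regrid, files set aside), keeping order."""
    temporal, fixed = [], []
    for path in paths:
        if path in timeless:
            fixed.append(path)
        else:
            temporal.append(path)
    return temporal, fixed


# In-memory stand-in for the stored list of time-less files.
listing = io.StringIO("path\norog_fx.nc\n")
timeless = read_timeless_paths(listing)

temporal, fixed = partition_files(
    ["tas_day.nc", "orog_fx.nc", "pr_day.nc"], timeless
)
```

Only the `temporal` list would go on to the regular regridding step; the `fixed` list is kept so those files can be handled separately later.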

[Two attached images]

@Joshdpaul
Contributor

Discussed with @kyleredilla and @BobTorgerson 2/26/24. Kyle has excluded these problematic "fixed" variables from the transfer batch files via bf7e8c9. This issue can be shelved for now, and we will determine later whether we really need the fixed variables.

@Joshdpaul
Contributor

While investigating some other issues related to grid types, I came upon this CMIP6 common vocabulary JSON that I thought could be used to exclude these Antarctic datasets.

The JSON suggests that "*a" or "*g" grid id suffixes might be used to exclude these files. But looking back at the code in this issue, this particular file uses the standard native grid id "gn" even though its extent is Antarctic only.
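To make the gap concrete, here is a sketch of the suffix filter I was considering. The function name is hypothetical, and the example labels are ones I believe appear in the grid label vocabulary.

```python
def looks_regional(grid_label):
    """True if the grid label ends in 'a' or 'g' (the suffix idea above)."""
    return grid_label.endswith(("a", "g"))


labels = ["gn", "gr", "gna", "gng"]
flags = {label: looks_regional(label) for label in labels}
```

A file labelled "gn" is not flagged, so an Antarctic-only file that uses the plain native label would pass straight through this filter.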

So if/when we revisit this issue, don't assume we can solve it by excluding grid types! We need another way to identify whether the XY coords of the grid are lon/lat or are some Antarctic/Greenland/whatever grid type.
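One possible starting point, offered as an untested heuristic rather than a solution: check the latitude coordinate's units and value range directly. The function name is hypothetical; the units strings are the latitude spellings I understand the CF conventions to accept. It only looks at values passed in as a plain sequence, so 2D coordinate arrays would need flattening first.

```python
CF_LAT_UNITS = {
    "degrees_north", "degree_north", "degree_N",
    "degrees_N", "degreeN", "degreesN",
}


def latitude_looks_geographic(values, units=None):
    """Heuristic: are these plausibly geographic latitudes?

    Rejects coordinates whose units are given but are not a CF latitude
    unit, and coordinates with any value outside [-90, 90] (for example
    projected y coordinates in metres).
    """
    if units is not None and units not in CF_LAT_UNITS:
        return False
    values = list(values)
    if not values:
        return False
    return all(-90.0 <= v <= 90.0 for v in values)


geographic = latitude_looks_geographic([-89.5, 0.0, 89.5], "degrees_north")
projected = latitude_looks_geographic([-3.0e6, 0.0, 3.0e6], "m")
no_units = latitude_looks_geographic([-3.0e6, 0.0, 3.0e6])
```

This would not catch every case (a rotated grid can still report degrees within range), so it is a first filter at best.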
