Document Suggested Workflow #40

Open
ChrisBarker-NOAA opened this issue Jun 26, 2024 · 0 comments

This code is focused on a specific part of the workflow folks may need to do -- but we also provide tools and utilities for other bits. So I think it's helpful to document the suggested workflow, and that will also help us determine where to put code.

My first draft:

Goal:

Starting Point:

User has a set of data that can be loaded into xarray: could be files on disk, or files on AMS, or Kerchunked zarr dataset, or ....

User needs a subset of that data:

  • Restricted to:
    • a polygon in space
    • particular time frame
    • either a single vertical layer or all vertical layers (proper vertical subsetting can wait ...)
    • only the variables they need.

Outcome:

An xarray Dataset ready to save to netCDF, or .....

That Dataset contains only what the user wants -- and is as similar to the original as possible: e.g. the same names for all variables, maybe some additional metadata.

Workflow:

Step 1:

User does any pre-processing required to get their data into a single, conforming dataset.

In many cases, there's nothing to be done, but in some cases there may be work required:

  1. The grid and data variables are in multiple files; they need to be combined into one dataset.
  2. There are "troublesome" variables -- e.g. time coordinates that aren't correct, etc.

As a rule, this will be model specific, maybe even implementation-of-model specific.

This package can't provide all of that, but it can (and should) provide a few examples for common cases.

e.g. SCHISM (STOFS), maybe FVCOM fixing the time variable (some use single-precision float days :-()
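A minimal sketch of this pre-processing step, using toy stand-ins for the model files (real code would open the grid and output files with `xr.open_dataset`; the epoch and variable names here are illustrative):

```python
import numpy as np
import xarray as xr

# Toy stand-ins for a separate grid file and data file, so the
# sketch is self-contained. In practice:
#   grid = xr.open_dataset("model_grid.nc")
#   data = xr.open_dataset("model_output.nc")
grid = xr.Dataset(
    {"lon": ("node", np.linspace(-125.0, -124.0, 4)),
     "lat": ("node", np.linspace(45.0, 46.0, 4))}
)
data = xr.Dataset(
    {"temp": (("time", "node"), np.ones((3, 4)))},
    coords={"time": ("time", np.array([0.0, 0.25, 0.5]))},  # float days
)

# 1. Combine grid and data variables into a single Dataset.
ds = xr.merge([grid, data])

# 2. Fix a "troublesome" time coordinate: float days since an
#    (assumed) epoch -> proper datetime64 values.
epoch = np.datetime64("2024-06-01", "ns")
ds = ds.assign_coords(
    time=epoch + (ds["time"].values * 86400 * 1e9).astype("timedelta64[ns]")
)
```

The result is one conforming Dataset with both the grid and the data variables, and a usable time coordinate.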

Step 2:

The user processes the Dataset to make it CF compliant (or compliant enough that the subsetting code can work).

This package will contain utilities to do that, e.g.

ugrid.assign_ugrid_topology()
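For illustration, here is a hand-rolled sketch of the kind of thing such a helper does -- adding the UGRID "mesh_topology" dummy variable that CF/UGRID-aware tools look for. The exact signature of `assign_ugrid_topology()` is the package's; this just shows the convention it targets:

```python
import numpy as np
import xarray as xr

# Toy unstructured-grid Dataset: node coordinates plus a
# face-node connectivity array (two triangles).
ds = xr.Dataset(
    {
        "temp": ("node", np.zeros(4)),
        "face_node_connectivity": (
            ("face", "vertex"), np.array([[0, 1, 2], [1, 2, 3]])
        ),
    },
    coords={
        "lon": ("node", np.array([-125.0, -124.8, -124.6, -124.4])),
        "lat": ("node", np.array([45.0, 45.2, 45.4, 45.6])),
    },
)

# The UGRID conventions use a dummy scalar variable whose
# attributes describe the mesh topology.
ds["mesh"] = xr.DataArray(
    0,
    attrs={
        "cf_role": "mesh_topology",
        "topology_dimension": 2,
        "node_coordinates": "lon lat",
        "face_node_connectivity": "face_node_connectivity",
    },
)
ds["temp"].attrs["mesh"] = "mesh"  # point the data variable at the mesh
```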

Step 3:

The Dataset can be queried by the user to find out what they need to know in order to specify a subset:

  • what variables are in the dataset
  • what timespan is covered
  • what region is covered (maybe?)
  • whether it's 2D or 3D?
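Each of these queries can be answered with plain xarray; a sketch against a toy Dataset (the coordinate and dimension names, e.g. "depth" for the vertical, are assumptions):

```python
import numpy as np
import xarray as xr

# Toy stand-in for the user's conforming Dataset.
ds = xr.Dataset(
    {"temp": (("time", "node"), np.zeros((3, 4))),
     "salt": (("time", "node"), np.zeros((3, 4)))},
    coords={
        "time": np.array(
            ["2024-06-01", "2024-06-02", "2024-06-03"], dtype="datetime64[ns]"
        ),
        "lon": ("node", np.array([-125.0, -124.8, -124.6, -124.4])),
        "lat": ("node", np.array([45.0, 45.2, 45.4, 45.6])),
    },
)

# What variables are in the dataset?
variables = list(ds.data_vars)

# What timespan is covered?
tmin, tmax = ds["time"].values.min(), ds["time"].values.max()

# What region is covered? (bounding box of the node coordinates)
bbox = (float(ds["lon"].min()), float(ds["lat"].min()),
        float(ds["lon"].max()), float(ds["lat"].max()))

# 2D or 3D? (assumption: a vertical dimension would be named "depth")
is_3d = "depth" in ds.dims
```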

Step 4:

The user makes a request for a subset.

Result -- a subset Dataset.
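The subsetting machinery itself is what this package provides; purely as a sketch of the shape of the request and result, here is a pure-xarray version using a bounding box as a stand-in for real polygon-in-space subsetting (all names are illustrative, not the package's API):

```python
import numpy as np
import xarray as xr

# Toy stand-in for a conforming Dataset (see Step 1).
ds = xr.Dataset(
    {"temp": (("time", "node"), np.arange(12.0).reshape(3, 4)),
     "salt": (("time", "node"), np.zeros((3, 4)))},
    coords={
        "time": np.array(
            ["2024-06-01T00", "2024-06-01T06", "2024-06-01T12"],
            dtype="datetime64[ns]",
        ),
        "lon": ("node", np.array([-125.0, -124.8, -124.6, -124.4])),
        "lat": ("node", np.array([45.0, 45.2, 45.4, 45.6])),
    },
)

# The request: only "temp", a particular time frame, and only the
# nodes inside a region (bounding box here; a real implementation
# would test nodes against a polygon via the grid topology).
subset = ds[["temp"]].sel(time=slice("2024-06-01T00", "2024-06-01T06"))
in_box = ((ds.lon >= -125.0) & (ds.lon <= -124.7)
          & (ds.lat >= 45.0) & (ds.lat <= 45.3))
subset = subset.isel(node=np.nonzero(in_box.values)[0])
```

The result is a subset Dataset with the same variable names as the original, ready to save or keep working with.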
