open_virtual_dataset returns some coordinates as data variables #189

Open
ayushnag opened this issue Jul 15, 2024 · 1 comment · Fixed by #191 · May be fixed by #224
Labels: bug, CF conventions

Comments

ayushnag (Contributor) commented Jul 15, 2024

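The `xr.Dataset` constructed by `open_virtual_dataset` doesn't seem to correctly identify coordinates that are not dimension coordinates, e.g. the 2D `lon_rho` and `lat_rho`. Comparing the two outputs below, the 2D `h`, the 1D `Cs_r` and the scalars `hc` and `Vtransform` are also returned as data variables; only the dimension coordinates `s_rho` and `ocean_time` are detected. The bug seems to be in `separate_coords` on this line. A correct approach could be to read the `coordinates` attribute in each variable's `.zattrs` and maintain a set of all coordinate names.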
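The proposal above (read the CF `coordinates` attribute from each variable's `.zattrs` and keep a running set of coordinate names) could look roughly like this. It is a standalone sketch on plain dicts, not VirtualiZarr's actual `separate_coords`; `split_coord_names` and the example attributes are hypothetical and only loosely modelled on the ROMS file.

```python
from collections.abc import Mapping, Sequence


def split_coord_names(
    dims: Mapping[str, Sequence[str]],
    attrs: Mapping[str, Mapping[str, object]],
) -> tuple[set[str], set[str]]:
    """Hypothetical helper: split variable names into coordinates and data variables.

    dims  maps each variable name to its dimension names.
    attrs maps each variable name to its attributes (e.g. the parsed .zattrs).
    """
    coord_names: set[str] = set()
    for var_attrs in attrs.values():
        # CF convention: "coordinates" is a space-separated list of variable names.
        listed = var_attrs.get("coordinates", "")
        if isinstance(listed, str):
            coord_names.update(listed.split())
    # A 1-D variable named after its own dimension is a dimension coordinate.
    for name, var_dims in dims.items():
        if tuple(var_dims) == (name,):
            coord_names.add(name)
    # Ignore names that do not exist as variables in this dataset.
    coord_names &= set(dims)
    data_var_names = set(dims) - coord_names
    return coord_names, data_var_names


# Made-up attributes for illustration only (not read from ROMS_example.nc).
example_dims = {
    "salt": ("ocean_time", "s_rho", "eta_rho", "xi_rho"),
    "lon_rho": ("eta_rho", "xi_rho"),
    "lat_rho": ("eta_rho", "xi_rho"),
    "s_rho": ("s_rho",),
    "ocean_time": ("ocean_time",),
    "hc": (),
}
example_attrs = {
    "salt": {"coordinates": "lon_rho lat_rho s_rho ocean_time"},
    "lon_rho": {},
    "lat_rho": {},
    "s_rho": {},
    "ocean_time": {},
    "hc": {},
}
coords, data_vars = split_coord_names(example_dims, example_attrs)
print(sorted(coords))     # ['lat_rho', 'lon_rho', 'ocean_time', 's_rho']
print(sorted(data_vars))  # ['hc', 'salt']
```

In this made-up example `hc` stays a data variable because no `coordinates` attribute lists it, so the approach only finds coordinates that some variable actually declares.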

Here is a reproducible example:

```python
>>> import xarray as xr
>>> xr.tutorial.open_dataset("ROMS_example.nc")
<xarray.Dataset> Size: 19MB
Dimensions:     (ocean_time: 2, s_rho: 30, eta_rho: 191, xi_rho: 371)
Coordinates:
    Cs_r        (s_rho) float64 240B ...
    lon_rho     (eta_rho, xi_rho) float64 567kB ...
    hc          float64 8B ...
    h           (eta_rho, xi_rho) float64 567kB ...
    lat_rho     (eta_rho, xi_rho) float64 567kB ...
    Vtransform  int32 4B ...
  * ocean_time  (ocean_time) datetime64[ns] 16B 2001-08-01 2001-08-08
  * s_rho       (s_rho) float64 240B -0.9833 -0.95 -0.9167 ... -0.05 -0.01667
Dimensions without coordinates: eta_rho, xi_rho
Data variables:
    salt        (ocean_time, s_rho, eta_rho, xi_rho) float32 17MB ...
    zeta        (ocean_time, eta_rho, xi_rho) float32 567kB ...
Attributes: (12/34)
    file:              ../output_20yr_obc/2001/ocean_his_0015.nc
    format:            netCDF-4/HDF5 file
    Conventions:       CF-1.4
    type:              ROMS/TOMS history file
    title:             TXLA ROMS hindcast run with dyes and oxygen
    rst_file:          ../output_20yr_obc/2001/ocean_rst.nc
    ...                ...
    compiler_flags:    -heap-arrays -fp-model fast -mt_mpi -ip -O3 -msse2 -free
    tiling:            010x012
    history:           Tue Jul 24 11:04:43 2018: /opt/nco/ncks -D 4 -t 8 /cop...
    ana_file:          /home/d.kobashi/TXLA_ROMS_reana/Functionals/ana_btflux...
    CPP_options:       TXLA2, ANA_BPFLUX, ANA_BSFLUX, ANA_BTFLUX, ANA_NUDGCOE...
    NCO:               netCDF Operators version 4.7.6-alpha04 (Homepage = htt...
```
To open the same file with VirtualiZarr, first download it locally:

```shell
% wget https://github.com/pydata/xarray-data/raw/master/ROMS_example.nc
```
```python
>>> from virtualizarr import open_virtual_dataset
>>> vds = open_virtual_dataset('ROMS_example.nc', indexes={})
>>> vds
<xarray.Dataset> Size: 19MB
Dimensions:     (ocean_time: 2, eta_rho: 191, xi_rho: 371, s_rho: 30)
Coordinates:
    s_rho       (s_rho) float64 240B ManifestArray<shape=(30,), dtype=float64...
    ocean_time  (ocean_time) float64 16B ManifestArray<shape=(2,), dtype=floa...
Dimensions without coordinates: eta_rho, xi_rho
Data variables:
    zeta        (ocean_time, eta_rho, xi_rho) float32 567kB ManifestArray<sha...
    lon_rho     (eta_rho, xi_rho) float64 567kB ManifestArray<shape=(191, 371...
    Vtransform  int32 4B ManifestArray<shape=(), dtype=int32, chunks=()>
    Cs_r        (s_rho) float64 240B ManifestArray<shape=(30,), dtype=float64...
    hc          float64 8B ManifestArray<shape=(), dtype=float64, chunks=()>
    lat_rho     (eta_rho, xi_rho) float64 567kB ManifestArray<shape=(191, 371...
    h           (eta_rho, xi_rho) float64 567kB ManifestArray<shape=(191, 371...
    salt        (ocean_time, s_rho, eta_rho, xi_rho) float32 17MB ManifestArr...
Attributes: (12/34)
    CPP_options:       TXLA2, ANA_BPFLUX, ANA_BSFLUX, ANA_BTFLUX, ANA_NUDGCOE...
    Conventions:       CF-1.4
    NCO:               netCDF Operators version 4.7.6-alpha04 (Homepage = htt...
    NLM_LBC:           \nEDGE:    WEST   SOUTH  EAST   NORTH  \nzeta:    Che ...
    ana_file:          /home/d.kobashi/TXLA_ROMS_reana/Functionals/ana_btflux...
    avg_base:          ../output_20yr_obc/2001/ocean_avg
    ...                ...
    sta_file:          ocean_sta.nc
    svn_rev:
    svn_url:           https:://myroms.org/svn/src
    tiling:            010x012
    title:             TXLA ROMS hindcast run with dyes and oxygen
    type:              ROMS/TOMS history file
```

Note that the underlying kerchunk JSON does contain this coordinate information: if you write the virtual dataset out as kerchunk references and open them with xarray (materializing the data), the coordinates are identified correctly:

```python
>>> refs = vds.virtualize.to_kerchunk(filepath=None, format="dict")
>>> xr.open_dataset("reference://", engine="zarr", chunks={}, backend_kwargs={"consolidated": False, "storage_options": {"fo": refs}})
<xarray.Dataset> Size: 19MB
Dimensions:     (s_rho: 30, eta_rho: 191, xi_rho: 371, ocean_time: 2)
Coordinates:
    Cs_r        (s_rho) float64 240B dask.array<chunksize=(30,), meta=np.ndarray>
    Vtransform  float64 8B ...
    h           (eta_rho, xi_rho) float64 567kB dask.array<chunksize=(191, 371), meta=np.ndarray>
    hc          float64 8B ...
    lat_rho     (eta_rho, xi_rho) float64 567kB dask.array<chunksize=(191, 371), meta=np.ndarray>
    lon_rho     (eta_rho, xi_rho) float64 567kB dask.array<chunksize=(191, 371), meta=np.ndarray>
  * ocean_time  (ocean_time) datetime64[ns] 16B 2001-08-01 2001-08-08
  * s_rho       (s_rho) float64 240B -0.9833 -0.95 -0.9167 ... -0.05 -0.01667
Dimensions without coordinates: eta_rho, xi_rho
Data variables:
    salt        (ocean_time, s_rho, eta_rho, xi_rho) float32 17MB dask.array<chunksize=(1, 15, 96, 186), meta=np.ndarray>
    zeta        (ocean_time, eta_rho, xi_rho) float32 567kB dask.array<chunksize=(1, 191, 371), meta=np.ndarray>
Attributes: (12/34)
    CPP_options:       TXLA2, ANA_BPFLUX, ANA_BSFLUX, ANA_BTFLUX, ANA_NUDGCOE...
    Conventions:       CF-1.4
    NCO:               netCDF Operators version 4.7.6-alpha04 (Homepage = htt...
    NLM_LBC:           \nEDGE:    WEST   SOUTH  EAST   NORTH  \nzeta:    Che ...
    ana_file:          /home/d.kobashi/TXLA_ROMS_reana/Functionals/ana_btflux...
    avg_base:          ../output_20yr_obc/2001/ocean_avg
    ...                ...
    sta_file:          ocean_sta.nc
    svn_rev:
    svn_url:           https:://myroms.org/svn/src
    tiling:            010x012
    title:             TXLA ROMS hindcast run with dyes and oxygen
    type:              ROMS/TOMS history file
```
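To see where that information lives, one could scan the reference dict for `coordinates` attributes directly. This is a minimal sketch, assuming the references follow the kerchunk layout (an optional `"version": 1` wrapper around a `"refs"` mapping, with each array's `.zattrs` stored as JSON text) and that each `.zattrs` carries xarray's `_ARRAY_DIMENSIONS` key. `coord_names_from_refs` and `example_refs` are hypothetical and made up for illustration; they are not taken from the ROMS file.

```python
import json
from typing import Any


def coord_names_from_refs(refs: dict[str, Any]) -> set[str]:
    """Hypothetical helper: collect CF coordinate names from kerchunk-style references."""
    # Version-1 reference dicts wrap the keys in a "refs" entry; version 0 does not.
    entries = refs.get("refs", refs)
    coord_names: set[str] = set()
    var_names: set[str] = set()
    for key, value in entries.items():
        # Only per-array attribute files ("<var>/.zattrs"), not the group-level ".zattrs".
        if not key.endswith("/.zattrs"):
            continue
        var_name = key[: -len("/.zattrs")]
        var_names.add(var_name)
        # Metadata is usually stored as JSON text, but accept already-parsed dicts too.
        zattrs = json.loads(value) if isinstance(value, (str, bytes)) else value
        listed = zattrs.get("coordinates", "")
        if isinstance(listed, str):
            coord_names.update(listed.split())
        # xarray records each array's dimension names under "_ARRAY_DIMENSIONS".
        if zattrs.get("_ARRAY_DIMENSIONS") == [var_name]:
            coord_names.add(var_name)
    # Keep only names that correspond to arrays present in the references.
    return coord_names & var_names


# Made-up references for illustration only.
example_refs = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "salt/.zattrs": json.dumps(
            {"_ARRAY_DIMENSIONS": ["ocean_time", "s_rho", "eta_rho", "xi_rho"],
             "coordinates": "lon_rho lat_rho"}
        ),
        "lon_rho/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": ["eta_rho", "xi_rho"]}),
        "lat_rho/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": ["eta_rho", "xi_rho"]}),
        "s_rho/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": ["s_rho"]}),
    },
}
print(sorted(coord_names_from_refs(example_refs)))  # ['lat_rho', 'lon_rho', 's_rho']
```

Scanning the serialized references like this only reports what the metadata declares; it does not reproduce xarray's full CF decoding.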
TomNicholas added the bug label on Aug 17, 2024
TomNicholas changed the title from "Missing xr.Coordinates from virtualized datasets" to "open_virtual_dataset returns some coordinates as data variables" on Sep 16, 2024
TomNicholas (Member) commented:
I wouldn't say this issue is fully closed yet. See #281 (comment) for an explanation. #191 fixes an important part of it, but #224 is also required.

TomNicholas reopened this on Nov 5, 2024