Open Kerchunk refs as Virtual Dataset #119
Update: @TomNicholas and I dug a bit deeper on the
    vds = dataset_from_kerchunk_refs(refs_dict)
    return vds
elif kerchunk_storage_ftype == ".parquet":
Parquet files are not required to have this suffix; for instance, ".parq" is also very common. Not sure if there is a better way to tell the type of file, though.
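One possible alternative to suffix matching, sketched here only as an illustration: a Parquet file begins and ends with the 4-byte magic `PAR1`, so the content can be checked instead of the name. The helper name `looks_like_parquet` is hypothetical and not part of this PR. It only handles a single local file; a reference set stored as a directory of parquet files would need a different check.

```python
from pathlib import Path

# Parquet files start and end with this 4-byte magic number.
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """Return True if `path` is a local file framed by the Parquet magic bytes.

    Hypothetical helper for illustration; directories and tiny files return False.
    """
    p = Path(path)
    if not p.is_file() or p.stat().st_size < 2 * len(PARQUET_MAGIC):
        return False
    with p.open("rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        # Seek relative to the end of the file to read the trailing magic.
        f.seek(-len(PARQUET_MAGIC), 2)
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

With a check like this, both `refs.parquet` and `refs.parq` would be recognised, and a JSON file that happens to carry the wrong suffix would not.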
# Question: How should we read the parquet files
# into a dict to pass into dataset_from_kerchunk_refs?
# pandas, pyarrow table, duckdb?
I feel like pandas would be a fine way to get things working and then you can always switch it out.
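A rough sketch of the dict-building step, independent of which reader is used. It assumes each row has already been loaded as a plain mapping (for example via `DataFrame.to_dict(orient="records")` in pandas) and that rows carry `key`, `path`, `offset`, `size`, and `raw` fields. That column layout and the helper name `records_to_refs` are assumptions for illustration, not the confirmed on-disk schema, so treat this as a starting point only.

```python
def records_to_refs(records):
    """Build a kerchunk-style references dict from row records.

    Assumed (hypothetical) row layout: `key` is the reference key,
    `raw` holds inline content (or None), and `path`/`offset`/`size`
    describe a byte range in a remote file.
    """
    refs = {}
    for rec in records:
        raw = rec.get("raw")
        if raw is not None:
            # Inline content, e.g. the JSON text of a .zarray entry.
            refs[rec["key"]] = raw.decode() if isinstance(raw, bytes) else raw
        else:
            # Byte-range reference: [url, offset, length].
            refs[rec["key"]] = [rec["path"], int(rec["offset"]), int(rec["size"])]
    return {"version": 1, "refs": refs}
```

Keeping the conversion separate from the reader would make it easy to swap pandas for pyarrow or duckdb later, since only the code producing `records` would change.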
fpath = fsspec.filesystem(protocol, **storage_options).open(filepath)
fpath = fsspec.filesystem(protocol, **storage_options)
if universal_filepath.is_file():
    fpath = fpath.open(filepath)
I'm having trouble figuring out what motivated these changes.
def test_kerchunk_to_virtual_dataset(netcdf4_file, tmpdir, format):
    vds = open_virtual_dataset(netcdf4_file, indexes={})

    # QUESTION: should these live in a fixture? ex. kerchunk_ref_fpath_json, kerchunk_ref_fpath_parquet
Eh, you kind of want the original vds as well as the kerchunk refs, so I think it is fine as is.
changelog.md
api.rst
Start of PR to address #118.
Lots of open questions!
- .parquet files into KerchunkStoreRefs to pass into dataset_from_kerchunk_refs
- _ARRAY_DIMENSIONS
Would love some feedback @jsignell!