Allow disabling filling of missing chunks #489
base: support/v2
Thanks for the PR @willirath!
As is, Array._fill_missing_chunk only exists if Array.set_options() has been called (giving rise to the CI test failures). What do you think about adding
    # initialize options
    self.set_options()
to the bottom of Array.__init__ to initialize the default option values?
Also, it'd be great if there was a test to ensure the fill_missing_chunk= parameter results in the expected behavior.
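For illustration, here is a minimal sketch of the pattern suggested above. ToyArray, its dict-backed store, and get_chunk are hypothetical names invented for this example and are not zarr's actual implementation; only set_options, _fill_missing_chunk, and fill_missing_chunk come from this thread. Calling set_options() at the end of __init__ guarantees the attribute exists before any read.

```python
class ToyArray:
    """Toy stand-in for an array class (hypothetical, not zarr's code)."""

    def __init__(self, store, fill_value=0, chunk_len=2):
        self.store = store            # mapping: chunk key -> chunk data
        self.fill_value = fill_value
        self.chunk_len = chunk_len
        # initialize options so that _fill_missing_chunk always exists
        self.set_options()

    def set_options(self, fill_missing_chunk=True):
        self._fill_missing_chunk = fill_missing_chunk

    def get_chunk(self, key):
        try:
            return self.store[key]
        except KeyError:
            if self._fill_missing_chunk:
                # default behaviour: substitute the fill value
                return [self.fill_value] * self.chunk_len
            # option disabled: surface the missing chunk to the caller
            raise


arr = ToyArray(store={"0": [1, 2]})
print(arr.get_chunk("1"))  # missing chunk is filled: [0, 0]

arr.set_options(fill_missing_chunk=False)
try:
    arr.get_chunk("1")
except KeyError as err:
    print("missing chunk:", err)
```

Without the set_options() call in __init__, the first get_chunk on a missing key would fail with an AttributeError instead of following either branch, which matches the kind of failure described above.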
I've added the initialization of the Regarding tests: Where should I add it? I'd go for:
Regarding the structure: I'm not at all familiar with the internal design of zarr-python. Is this decentralized conditional raising really the way to go, or should this be abstracted away?
Going to re-open to try to get travis green. Coveralls will stay red until a test is added.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##           support/2.x    #489   +/-  ##
============================================
  Coverage       99.94%   99.94%
============================================
  Files              32       33    +1
  Lines           11256    11285   +29
============================================
+ Hits            11250    11279   +29
  Misses              6        6
Hi @willirath, I've updated this branch and all existing tests are passing. Are you still interested in taking it forward?
Thanks for pinging me. I'm still interested. It'll take a few days, though.
@willirath just checking if you've had any time to work on this lately. This functionality would be super helpful and thanks for filing the PR! I'm happy to try to work on these tests if you'd like someone else to push it forward.
Hello @willirath! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2022-03-07 22:22:58 UTC
@bolliger32 I've found some time to continue with this today. (:+1: for pinging me again. Adding some urgency usually helps finishing this kind of work.) @joshmoore and @jrbourbeau I've added two tests relying on the From my PoV, this is ready for review.
@willirath amazing! Many thanks! 🙏
Pinging for review again.
Thanks for the ping, @willirath, definitely time. I personally don't foresee too much review capacity during the holidays, but will leave the tab open. Obviously, if anyone else could jump in, that'd be wonderful! 🎆 Just in case you missed them, #853 and the preceding #738 from @d-v-b might be of interest.
Ah, nice. With a clearer 2022 head, I notice just how complementary this is to @d-v-b's #753 and @jni's #853. In Zarr 2.11, the default will be to not serialize empty (i.e. fill_value_only) chunks. With this PR, the user can prevent empty chunks from being deserialized. 👍 I'm going to update the branch in order to trigger another round of tests. (I slightly wonder if there's not a need to unify settings/options/arguments but that's likely out of scope)
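A rough sketch of how the two behaviours relate, under stated assumptions: write_chunks, read_chunk, and the plain dict store are hypothetical helpers written for this example, not zarr functions. The flag names mirror fill_missing_chunk from this PR and zarr's write_empty_chunks option, but the logic is a toy model only.

```python
def write_chunks(store, chunks, fill_value, write_empty_chunks=False):
    """Write side (toy model): skip chunks that hold only the fill value."""
    for key, chunk in chunks.items():
        is_empty = all(value == fill_value for value in chunk)
        if is_empty and not write_empty_chunks:
            store.pop(key, None)  # make sure no stale copy stays behind
            continue
        store[key] = list(chunk)


def read_chunk(store, key, chunk_len, fill_value, fill_missing_chunk=True):
    """Read side (toy model): fill a missing chunk, or raise if disabled."""
    if key in store:
        return store[key]
    if fill_missing_chunk:
        return [fill_value] * chunk_len
    raise KeyError(key)


store = {}
write_chunks(store, {"0": [1, 2], "1": [0, 0]}, fill_value=0)
print(sorted(store))  # only the non-empty chunk was stored: ['0']
print(read_chunk(store, "1", chunk_len=2, fill_value=0))  # [0, 0]
```

In this toy model, a chunk that was deliberately skipped on write looks the same on read as a chunk that was lost, so reading with fill_missing_chunk=False raises KeyError for both cases.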
Still green after an update to Probably the biggest question from my side is what else should fall into the |
The more I come back to this, @willirath, the more I feel that either values like (This is orthogonal to whether these values should actually be .zarray metadata, and in that case, they might should be |
I have a need for this feature (see also #486 (comment)). It seems that adding |
Agreed with setting this parameter in |
Bumping this if still relevant. We are having issues with the "missing rectangles" with S3, similar to pangeo-data/pangeo#691. This seems to persist with using It would be nice to fix upstream in Zarr if I'm understanding the issue correctly.
EDIT: We were able to fix this issue on our AWS Sagemaker instances a while back by using this in the header:
import boto3
import s3fs
session = boto3.Session()
credentials = session.get_credentials()
fs = s3fs.S3FileSystem(
    key=credentials.access_key,
    secret=credentials.secret_key,
    token=credentials.token,
)
I just fixed the most recent manifestation of this issue that was showing up on our AWS Batch jobs through
session = boto3.Session()
credentials = session.get_credentials()
fs = s3fs.S3FileSystem(
    key=credentials.access_key,
    secret=credentials.secret_key,
    token=credentials.token,
    # This seems to help with writing
    # https://github.com/pydata/xarray/issues/3831#issuecomment-1768393788
    skip_instance_cache=True,
)
The above code replaces the following setup we were using. I would assume
I've moved this PR to the |
This is a first stab at solving #486 by overriding filling of missing chunks.
TODO:
Not sure about the following todo's:
tox -e docs
)