Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] ENH: Add context manager for zarr store cleanup #113

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

bzah
Copy link

@bzah bzah commented Mar 29, 2022

I made a small context manager in my project to wrap rechunker.rechunk so I thought it could be nice to share it here.
I did that because I was tired of deleting the stores after each unsuccessful test.

The context manager rechunk_cm adds:

  • simple fsspec filesystem handling
  • a check if a target_store exist and raising of a FileExistsError if true (move that to rechunk() ?)
  • cleanup of temp_store using fsspec filesystem
  • the possibility of removing target_store as well (controlled by keep_target_store flag)

The last point might be useful when within the context, the target_store is opened, a computation is done and only the
resulting computation must be saved. In that case, rechunker is only used to pre-optimize the computation through a better but temporary chunking on disk.


Notes:

  1. I'm very new to fsspec so I'm not 100% if I'm using it correctly.
  2. I also tried creating a unit test using the MemoryFileSystem of fsspec but without success.
  3. The name rechunk_cm could probably be improved.

I don't mind discarding this PR if you don't think this feature should be in rechunker.
Thanks for this awesome project anyway.

The context manager adds:
- simple fsspec filesystem handling
- checking of existing `target_store` and raises a FileExistsError if needed
- cleanup of `temp_store` using fsspec filesystem
- the possibility of removing target_store as well
Copy link
Member

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @bzah - thanks so much for this contribution and so sorry that no one has reviewed your changes so far.

Your PR looks great to me. I would gently ping @martindurant on the fsspec-related issues, but I don't see any problems. This seems very useful.

My only question is on the API. Maybe we just want to make this the new default?

rechunker/api.py Outdated


@contextlib.contextmanager
def rechunk_cm(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice if we could just call rechunk directly as a context manager and get the right behavior, rather than introducing a new function to the API.

Copy link
Author

@bzah bzah Jul 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, I renamed rechunk to _unsafe_rechunk and rechunk_cm to rechunk. Only the later is exposed in init

rechunker/api.py Outdated
) -> Rechunked:
try:
if isinstance(filesystem, str):
filesystem = fsspec.filesystem(filesystem)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should take the same options as whatever filesystem we are going to operate on (target_options or temp_options?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Martin if you can turn this into a suggested change, I will merge it in. We let @bzah's PR sit for a long time, so I'm not expecting him to be super responsive right now.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused here why there are temp_options and target_options, but only one filesystem - that's stopping me from knowing which to use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. In general we support using different filesystems for temp vs. target. This new function assumes that they are both paths within the same filesystem.

This comment was marked as off-topic.

Copy link
Author

@bzah bzah Jul 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are now 2 filesystems, one for each store.

@bzah
Copy link
Author

bzah commented Jul 20, 2022

I tried to add a unit test using a MemoryFileSystem but zarr fails to create the group saying zarr.errors.ReadOnlyError: object is read-only.
Do you have an idea how to fix that ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants