Efficiency in dask applications. #213

Open
bnlawrence opened this issue Jul 23, 2024 · 0 comments
@bnlawrence
Collaborator

It is clear that we want to avoid pyactivestorage opening the file inside each dask chunk if every such open requires a remote index read.

A quick hack to fix this (in the pyfive branch) would be to avoid keeping the File instance open (the optimal_kerchunk branch already does this). With that one change, users could at least use active storage instances many times without worrying about the file open count.

A better long-term solution may involve lifting the internal s3fs outside, so that we can take advantage of the s3fs caching.
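
To make the cost concrete, here is a minimal sketch of the underlying idea: read the index once and reuse it across chunks. It is not pyactivestorage code. `load_index`, `process_chunk`, the URI, and the index layout are all hypothetical, and `functools.lru_cache` stands in for whatever caching s3fs would provide.

```python
from functools import lru_cache

# Counts how often the (pretend) remote index read actually happens.
index_reads = {"count": 0}


@lru_cache(maxsize=8)
def load_index(uri):
    """Hypothetical stand-in for fetching a file's chunk index remotely."""
    index_reads["count"] += 1
    # chunk id -> (byte offset, size); made-up values for illustration only
    return {i: (i * 100, 100) for i in range(4)}


def process_chunk(uri, chunk_id):
    """Per-chunk task: finds the chunk via the cached index, not a fresh open."""
    offset, size = load_index(uri)[chunk_id]
    return offset + size


# Four chunk tasks against the same (hypothetical) object trigger one index read.
results = [process_chunk("s3://bucket/file.nc", i) for i in range(4)]
```

Note that `lru_cache` is per process, so with dask workers running in separate processes each worker would still perform its own first read.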

@bnlawrence bnlawrence added the enhancement New feature or request label Jul 23, 2024
@bnlawrence bnlawrence self-assigned this Jul 23, 2024