A new API and a new internal structure #231
Comments
David and I had a lengthy conversation and came up with the following potential new API. It depends on some new components which we will pretend exist in the following snippets (to be clear, none of the imports from PyActive exist; these are proposals).

First we start with the vanilla file opening.

    from h5netcdf.legacyapi import Dataset

    filename = 'fred.nc'
    f = Dataset(filename)
    v = f['temp']

v would then be a normal h5netcdf variable. We want to provide access to the storage chunks of v.

We first show our thinking for how we might handle the active API. It starts with getting a modified version of v:

    from PyActive import active

    av = active(v)
Example 1: Normal access to slices of v

    result1 = v[56:60]
    result2 = av[56:60]
    # Compare element-wise; a bare == between arrays cannot be used in an assert
    assert (result1 == result2).all()
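To make the intended behaviour concrete, here is a minimal sketch of a wrapper that passes plain indexing straight through and also supports the `av.mean[...]` spelling used in the later examples. Everything in it is an assumption for illustration: `ActiveSketch` and `_Reducer` are hypothetical names, a plain Python list stands in for the netCDF variable, and the reduction runs locally rather than on a storage server.

```python
from statistics import fmean


class _Reducer:
    """Hypothetical helper that makes ``av.mean[56:60]`` indexable."""

    def __init__(self, data, func):
        self._data = data
        self._func = func

    def __getitem__(self, key):
        # A real client would ask the storage server to reduce each chunk
        # touched by ``key``; this sketch simply reduces the selection locally.
        return self._func(self._data[key])


class ActiveSketch:
    """Hypothetical stand-in for whatever ``active(v)`` would return."""

    def __init__(self, variable):
        self._variable = variable

    def __getitem__(self, key):
        # Plain indexing is delegated unchanged to the wrapped variable.
        return self._variable[key]

    @property
    def mean(self):
        return _Reducer(self._variable, fmean)


def active(variable):
    """Hypothetical factory mirroring the proposed ``active(v)`` call."""
    return ActiveSketch(variable)


v = [float(i) for i in range(100)]  # stand-in for a 1-D variable
av = active(v)

slice_plain = v[56:60]
slice_active = av[56:60]
mean_subset = av.mean[56:60]
mean_all = av.mean[:]
```

The point of the design is that `av` behaves like `v` for ordinary slicing, so existing code keeps working, while each reduction is exposed as an indexable attribute.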
Example 2: Doing an active operation on the entire array

    import numpy as np

    result1 = np.mean(v)
    result2 = av.mean[:]
    # The two means are computed by different routes, so allow for rounding
    assert np.isclose(result1, result2)

In this case result2 is calculated from chunk means calculated in the storage server, so result2 has far less network traffic involved than result1, where the mean is calculated in this Python code itself. (All the methods supported by the active storage need to be supported as methods on the ActiveDataset instance, which has implications for how we add new reductions into the client as well as the server, but we expect this to be a rare thing.)

Example 3: Doing an active operation on a subset of the array

    result1 = np.mean(v[56:60])
    result2 = av.mean[56:60]
    assert np.isclose(result1, result2)

If the slice [56:60] intersected two storage chunks, then the active mean would be calculated on both chunks storage side, and the two results returned and combined into a single mean. Note that the active version would work using missing data masks by default.

Example 4: Using Dask to do normal operations

    import dask.array as da

    y = da.from_array(av)

Example 5: Using dask with active

It was tempting to think that the extension from that would be to do things like this:

    y = da.from_array(av.mean)

but we rapidly realised all sorts of odd things would happen. Instead:

    import PyActive as pa

    y = pa.mean.from_array(av)

In this case, again, all the reduction methods need to be supported in the active library, but they all have a common pattern which involves some manipulation of the dask instances (David to add details).
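The chunk-wise mean described under Example 3 can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, missing values are represented by `None`, and each chunk is assumed to return a (sum, count) pair rather than a bare mean. Returning the count matters because a plain average of per-chunk means only matches the overall mean when every chunk contributes the same number of valid elements.

```python
def chunk_pieces(start, stop, chunk_size):
    """Yield (lo, hi) index ranges, one per storage chunk the slice touches."""
    lo = start
    while lo < stop:
        # End of the chunk containing ``lo``, clipped to the end of the slice.
        hi = min(stop, (lo // chunk_size + 1) * chunk_size)
        yield lo, hi
        lo = hi


def chunk_partial(values):
    """What one chunk might send back: (sum, count) of its non-missing values."""
    valid = [x for x in values if x is not None]
    return sum(valid), len(valid)


def combined_mean(data, start, stop, chunk_size):
    """Combine per-chunk partial results into one mean on the client."""
    partials = [
        chunk_partial(data[lo:hi])
        for lo, hi in chunk_pieces(start, stop, chunk_size)
    ]
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count if count else None


# Eight values stored in chunks of four, with one missing value.
data = [1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0, 8.0]
pieces = list(chunk_pieces(2, 7, chunk_size=4))
result = combined_mean(data, 2, 7, chunk_size=4)
```

Here the slice `[2:7]` touches both chunks. The first contributes one valid value and the second contributes three, so weighting by count gives 5.5, whereas averaging the two chunk means (4.0 and 6.0) would give 5.0.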
(It is important for everyone to realise that in the event the storage is not active, this client will handle the operations client side anyway, so the code will work with and without active storage.)
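A minimal sketch of that fallback, assuming the client learns that the storage is not active through an exception. The exception class and both storage functions are hypothetical stand-ins, not part of any existing library.

```python
from statistics import fmean


class StorageNotActive(Exception):
    """Hypothetical signal that the storage cannot run the reduction itself."""


def plain_storage_mean(values):
    # Stand-in for a storage back end with no reduction service.
    raise StorageNotActive("this storage has no reduction service")


def active_storage_mean(values):
    # Stand-in for a reduction that would run next to the data.
    return fmean(values)


def mean_with_fallback(values, storage_mean):
    """Try the storage-side reduction; otherwise compute the mean locally."""
    try:
        return storage_mean(values)
    except StorageNotActive:
        return fmean(values)


values = [1.0, 2.0, 3.0, 4.0]
from_active = mean_with_fallback(values, active_storage_mean)
from_plain = mean_with_fallback(values, plain_storage_mean)
```

Both calls give the same answer, which is the property the comment above asks for: user code does not need to know whether the storage is active.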
Hi Bryan - Great write up, thanks. It still seems to make sense. One API thing we touched on briefly was renaming the class. I'll think a bit more on the "Using dask with active" case, as you say.
@davidhassell I am assuming the dask issue will bite us here as well as in cf-python?
Hi @bnlawrence - the good news is that I don't think there'll be any problem here. The dask issue is only about what goes on in cf-python before it gets to the stage of instantiating any
this is cool! Finally had a chance to go through it! The main questions I'd ask here are:
Let me start with a set of vanilla refactoring ops for now, then we'll decide how best to juggle what we have 🍺
I think we would want it to be usable by civilians as well as higher level libraries, so in that sense I do think we need Active ...
sounds about what I was thinking too
PyActiveStorage is currently a "research activity" and we need to transition it to a library with a clean API and clear internal functionality - in partnership, for the moment, with the Reductionist Library.
There are three things we need to do:
This issue is mainly about the first of these objectives. We'll spin off issues for the other two when that is done.