(fix): extension array indexers #9671
base: main
Conversation
…ore/variable.py to use any-precision datetime/timedelta with automatic inference of resolution
…ocessing, raise now early
…t resolution, fix code and tests to allow this
for more information, see https://pre-commit.ci
… more carefully, for now using pd.Series to convert `OMm` type datetimes/timedeltas (will result in ns precision)
…rray` series creating an extension array when `.array` is accessed
@dcherian @benbovy @Illviljan anything left here?
Just left a few comments and suggestions. I didn't test it, though.
xarray/core/indexing.py
Outdated
    ) -> np.ndarray:
        if dtype is None:
            dtype = self.dtype
        if pd.api.types.is_extension_array_dtype(dtype):
Maybe this would be cleaner?
if dtype is None and is_valid_numpy_dtype(self.dtype):
    dtype = self.dtype
This will just let numpy set the appropriate dtype when coercing the pandas.Index.
Just a quick check that default output dtypes make sense for pd.CategoricalIndex and pd.PeriodIndex:
>>> import numpy as np
>>> import pandas as pd
>>> np.__version__
'2.0.2'
>>> pd.__version__
'2.2.3'
>>> cidx = pd.CategoricalIndex(["a"])
>>> np.asarray(cidx.values, dtype=None).dtype
dtype('O')
>>> cidx2 = pd.CategoricalIndex([1])
>>> np.asarray(cidx2.values, dtype=None).dtype
dtype('int64')
>>> pidx = pd.PeriodIndex([2022], freq="Y")
>>> np.asarray(pidx.values, dtype=None).dtype
dtype('O')
Co-authored-by: Benoit Bovy <[email protected]>
@@ -1118,7 +1118,8 @@ def test_groupby_math_nD_group() -> None:
     expected = da.isel(x=slice(30)) - expanded_mean
     expected["labels"] = expected.labels.broadcast_like(expected.labels2d)
     expected["num"] = expected.num.broadcast_like(expected.num2d)
-    expected["num2d_bins"] = (("x", "y"), mean.num2d_bins.data[idxr])
+    # mean.num2d_bins.data is a pandas IntervalArray so needs to be put in `numpy` to allow indexing
+    expected["num2d_bins"] = (("x", "y"), mean.num2d_bins.data.to_numpy()[idxr])
This is technically backwards-incompatible, but an improvement IMO. Just noting in case someone looks this up in the future.
Before:

ipdb> mean.num2d_bins
<xarray.DataArray 'num2d_bins' (num2d_bins: 2)> Size: 16B
array([Interval(0, 4, closed='right'), Interval(4, 6, closed='right')],
      dtype=object)
Coordinates:
  * num2d_bins  (num2d_bins) object 16B (0, 4] (4, 6]

After:

ipdb> mean.num2d_bins
<xarray.DataArray 'num2d_bins' (num2d_bins: 2)> Size: 16B
array([Interval(0, 4, closed='right'), Interval(4, 6, closed='right')],
      dtype=object)
Coordinates:
  * num2d_bins  (num2d_bins) interval[int64, right] 16B (0, 4] (4, 6]
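A small illustration (with assumed data, not taken from the test suite) of why the test above converts the IntervalArray to numpy before applying the 2D indexer: pandas extension arrays are strictly one-dimensional, while to_numpy() yields an object ndarray that supports arbitrary fancy indexing.

import numpy as np
import pandas as pd

intervals = pd.arrays.IntervalArray.from_breaks([0, 4, 6])  # (0, 4], (4, 6]
idxr = np.array([[0, 1], [1, 0]])  # 2D integer indexer

try:
    intervals[idxr]  # a 1D extension array cannot return a 2D result
except Exception as err:  # the exact exception depends on the pandas version
    print(type(err).__name__)

result = intervals.to_numpy()[idxr]  # 2D object array of Interval objects
print(result.shape)  # (2, 2)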
@@ -834,6 +834,7 @@ def chunk(
         if chunkmanager.is_chunked_array(data_old):
             data_chunked = chunkmanager.rechunk(data_old, chunks)  # type: ignore[arg-type]
         else:
+            ndata: duckarray[Any, Any]
I removed the pandas-specific code. I'm not sure we should do that; we might as well just ask the user to cast.
* main:
  Vendor pandas to xarray conversion tests (pydata#10187)
  Fix: Correct axis labelling with units for FacetGrid plots (pydata#10185)
  Use explicit repo name in upstream wheels (pydata#10181)
  DOC: Update docstring to reflect renamed section (pydata#10180)
@@ -104,17 +104,11 @@ def index_flat(request):
    index fixture, but excluding MultiIndex cases.
    """
    key = request.param
    if key in ["bool-object", "bool-dtype", "nullable_bool", "repeats"]:
There seems to be some weird broadcasting behaviour here.
Sorry, this is a total mess. Apparently IndexVariable and Variable now behave differently, and I'm not sure why.
@@ -945,7 +944,7 @@ def load(self, **kwargs):
        --------
        dask.array.compute
        """
-       self._data = to_duck_array(self._data, **kwargs)
+       self._data = _maybe_wrap_data(to_duck_array(self._data, **kwargs))
Maybe we should just return the PandasExtensionArray wrapper class, but I'm wary of exposing that to users.
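Roughly, the wrapper idea under discussion looks like the sketch below. This is not xarray's actual PandasExtensionArray class, just an assumed, simplified illustration of the concept: keep the pandas extension array intact while exposing enough of a numpy-like interface (dtype, shape, indexing, __array__) for duck-array code paths to work with it.

from __future__ import annotations

import numpy as np
import pandas as pd


class ExtensionArrayWrapper:  # hypothetical name, not the real class
    """Minimal numpy-like facade around a 1D pandas extension array."""

    def __init__(self, array: pd.api.extensions.ExtensionArray):
        self.array = array

    @property
    def dtype(self):
        return self.array.dtype  # keeps the extension dtype visible

    @property
    def shape(self) -> tuple[int, ...]:
        return (len(self.array),)

    def __getitem__(self, key):
        result = self.array[key]
        if isinstance(result, pd.api.extensions.ExtensionArray):
            return type(self)(result)
        return result  # scalar element

    def __array__(self, dtype=None, copy=None):
        # Materialize as a numpy array only when explicitly asked for.
        return np.asarray(self.array, dtype=dtype)


wrapped = ExtensionArrayWrapper(pd.array(["a", "b", "a"], dtype="category"))
print(wrapped.shape, wrapped.dtype)  # (3,) category
print(np.asarray(wrapped))           # object ndarray: ['a' 'b' 'a']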
Identical to kmuehlbauer#1 - probably not very helpful in terms of changes since https://github.com/kmuehlbauer/xarray/tree/any-time-resolution-2 contains most of it....
whats-new.rst
api.rst