
(fix): extension array indexers #9671

Open
wants to merge 222 commits into base: main
Conversation

ilan-gold
Contributor

Identical to kmuehlbauer#1. Probably not very helpful in terms of changes, since https://github.com/kmuehlbauer/xarray/tree/any-time-resolution-2 contains most of it.

kmuehlbauer and others added 30 commits October 18, 2024 07:31
…ore/variable.py to use any-precision datetime/timedelta with automatic inferring of resolution
…t resolution, fix code and tests to allow this
… more carefully, for now using pd.Series to convert `OMm` type datetimes/timedeltas (will result in ns precision)
…rray` series creating an extension array when `.array` is accessed
@ilan-gold
Contributor Author

@dcherian @benbovy @Illviljan anything left here?

Member

@benbovy benbovy left a comment


Just left a few comments and suggestions. I didn't test it, though.

) -> np.ndarray:
if dtype is None:
dtype = self.dtype
if pd.api.types.is_extension_array_dtype(dtype):

Maybe this would be cleaner?

        if dtype is None and is_valid_numpy_dtype(self.dtype):
            dtype = self.dtype

This will just let numpy set the appropriate dtype when coercing the pandas.Index.

Just a quick check that default output dtypes make sense for pd.CategoricalIndex and pd.PeriodIndex:

>>> import numpy as np
>>> import pandas as pd
>>> np.__version__
'2.0.2'
>>> pd.__version__
'2.2.3'
>>> cidx = pd.CategoricalIndex(["a"])
>>> np.asarray(cidx.values, dtype=None).dtype
dtype('O')
>>> cidx2 = pd.CategoricalIndex([1])
>>> np.asarray(cidx2.values, dtype=None).dtype
dtype('int64')
>>> pidx = pd.PeriodIndex([2022], freq="Y")
>>> np.asarray(pidx.values, dtype=None).dtype
dtype('O')
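The suggestion above can be sketched end to end. `is_valid_numpy_dtype` here is a minimal stand-in for xarray's internal helper (an assumption about its behavior, not its actual implementation), and `coerce_index` is a hypothetical name for the method being discussed:

```python
import numpy as np
import pandas as pd


def is_valid_numpy_dtype(dtype) -> bool:
    # Minimal stand-in (assumption): a dtype is "valid" if numpy accepts it.
    try:
        np.dtype(dtype)
    except (TypeError, ValueError):
        return False
    return True


def coerce_index(index: pd.Index, dtype=None) -> np.ndarray:
    # Only forward the index's dtype when it is a real numpy dtype;
    # for extension dtypes (categorical, period, ...) leave dtype=None
    # and let numpy pick the appropriate one during coercion.
    if dtype is None and is_valid_numpy_dtype(index.dtype):
        dtype = index.dtype
    return np.asarray(index.values, dtype=dtype)


cidx = pd.CategoricalIndex(["a"])
print(coerce_index(cidx).dtype)  # object
print(coerce_index(pd.Index([1, 2, 3])).dtype)  # int64
```

This mirrors the REPL check above: extension dtypes fall through to numpy's inference, while plain numpy dtypes pass through unchanged.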

@@ -1118,7 +1118,8 @@ def test_groupby_math_nD_group() -> None:
expected = da.isel(x=slice(30)) - expanded_mean
expected["labels"] = expected.labels.broadcast_like(expected.labels2d)
expected["num"] = expected.num.broadcast_like(expected.num2d)
expected["num2d_bins"] = (("x", "y"), mean.num2d_bins.data[idxr])
# mean.num2d_bins.data is a pandas IntervalArray so needs to be put in `numpy` to allow indexing
expected["num2d_bins"] = (("x", "y"), mean.num2d_bins.data.to_numpy()[idxr])

This is technically backwards-incompatible, but an improvement IMO. Just noting in case someone looks this up in the future.

Before:

ipdb> mean.num2d_bins
<xarray.DataArray 'num2d_bins' (num2d_bins: 2)> Size: 16B
array([Interval(0, 4, closed='right'), Interval(4, 6, closed='right')],
      dtype=object)
Coordinates:
  * num2d_bins  (num2d_bins) object 16B (0, 4] (4, 6]

After:

ipdb> mean.num2d_bins
<xarray.DataArray 'num2d_bins' (num2d_bins: 2)> Size: 16B
array([Interval(0, 4, closed='right'), Interval(4, 6, closed='right')],
      dtype=object)
Coordinates:
  * num2d_bins  (num2d_bins) interval[int64, right] 16B (0, 4] (4, 6]
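The reason for the `.to_numpy()` in the test diff can be reproduced outside xarray: pandas extension arrays are one-dimensional, so a 2-D integer indexer generally isn't accepted directly, while the numpy (object-dtype) conversion supports it. A minimal sketch with hypothetical data shaped like the test's:

```python
import numpy as np
import pandas as pd

# An IntervalArray like the one backing mean.num2d_bins.data in the test.
arr = pd.arrays.IntervalArray.from_breaks([0, 4, 6])  # (0, 4], (4, 6]

# A 2-D integer indexer, standing in for the idxr built by the test.
idxr = np.array([[0, 1], [1, 0]])

# Extension arrays are 1-D, so index through numpy instead:
result = arr.to_numpy()[idxr]
print(result.shape)  # (2, 2)
```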

@@ -834,6 +834,7 @@ def chunk(
if chunkmanager.is_chunked_array(data_old):
data_chunked = chunkmanager.rechunk(data_old, chunks) # type: ignore[arg-type]
else:
ndata: duckarray[Any, Any]

I removed the pandas-specific code. I'm not sure we should do that; we might as well just ask the user to cast.
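"Asking the user to cast" would look something like the following sketch (not the PR's API, just plain pandas): convert the extension array to a numpy-backed one before handing it to a chunk manager.

```python
import numpy as np
import pandas as pd

# A nullable-integer extension array, which chunk managers such as dask
# cannot rechunk directly.
ea = pd.array([1, 2, None], dtype="Int64")

# Explicit user-side cast to a plain numpy array before chunking
# (here to float64, so the missing value becomes NaN):
ndata = ea.to_numpy(dtype="float64", na_value=np.nan)
print(ndata)  # [ 1.  2. nan]
```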

* main:
  Vendor pandas to xarray conversion tests (pydata#10187)
  Fix: Correct axis labelling with units for FacetGrid plots (pydata#10185)
  Use explicit repo name in upstream wheels (pydata#10181)
  DOC: Update docstring to reflect renamed section (pydata#10180)
@@ -104,17 +104,11 @@ def index_flat(request):
index fixture, but excluding MultiIndex cases.
"""
key = request.param
if key in ["bool-object", "bool-dtype", "nullable_bool", "repeats"]:

There seems to be some weird broadcasting behaviour here.

@dcherian
Contributor

Sorry, this is a total mess. Apparently IndexVariable and Variable now behave differently, and I'm not sure why.

@@ -945,7 +944,7 @@ def load(self, **kwargs):
--------
dask.array.compute
"""
self._data = to_duck_array(self._data, **kwargs)
self._data = _maybe_wrap_data(to_duck_array(self._data, **kwargs))

Maybe we should just return the PandasExtensionArray wrapper class, but I'm wary of exposing that to users.
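For context, a wrapper of that kind could look roughly like this. This is a hypothetical, heavily simplified sketch, not xarray's actual `PandasExtensionArray` class: it holds the extension array and forwards the duck-array attributes that `Variable` relies on, re-wrapping array-valued indexing results.

```python
import numpy as np
import pandas as pd


class ExtensionArrayWrapper:
    # Hypothetical, simplified sketch; not xarray's actual wrapper.
    def __init__(self, array: pd.api.extensions.ExtensionArray):
        self.array = array

    @property
    def dtype(self):
        return self.array.dtype

    @property
    def shape(self):
        # Extension arrays are always one-dimensional.
        return (len(self.array),)

    def __getitem__(self, key):
        result = self.array[key]
        if isinstance(result, type(self.array)):
            return ExtensionArrayWrapper(result)
        return result  # a scalar

    def __array__(self, dtype=None, copy=None):
        # Fallback numpy coercion for consumers that need a real ndarray.
        return np.asarray(self.array, dtype=dtype)


wrapped = ExtensionArrayWrapper(pd.array([1, 2, None], dtype="Int64"))
print(wrapped.shape)  # (3,)
```

Returning such a wrapper from `load()` would keep the extension dtype intact, at the cost of users seeing a non-pandas, non-numpy object, which is presumably the concern above.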

8 participants