API / COW: ensure every new Series/DataFrame also has new (shallow copy) index #53699
Issue #53529 was about the index still sharing mutable state when getting a Series out of a DataFrame (even with CoW turned on), because the Series' index is the very same object as the DataFrame's index. As a result, if you modify for example the index name of the Series, the DataFrame gets updated as well. This PR ensures that an identity check between the two indexes returns False.
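Here is a minimal sketch of why sharing the same `Index` object is a problem. It assumes a reasonably recent pandas, and the variable names are illustrative only. Two names bound to one `Index` see each other's attribute changes. A shallow copy from `Index.copy()` (default `deep=False`) is a new object that reuses the same underlying values, so renaming it stays local.

```python
import numpy as np
import pandas as pd

idx = pd.Index([10, 20, 30], name="key")

# Two references to the *same* Index object: a name change made through one
# is visible through the other (the situation between a DataFrame and a
# Series taken out of it, as described in the issue).
alias = idx
alias.name = "renamed"
print(idx.name)  # renamed

# A shallow copy: a new Index object (deep=False is the default) that reuses
# the same underlying values instead of duplicating them.
shallow = idx.copy()
shallow.name = "only_on_copy"

print(shallow is idx)  # False
print(idx.name)  # renamed (unchanged by renaming the copy)
print(np.shares_memory(idx.to_numpy(), shallow.to_numpy()))  # True
```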
I think that, under the CoW rules, it makes sense to also ensure that mutations of such index attributes don't propagate, similar to mutating values. We can do this by using a shallow copy of the Index whenever a new DataFrame/Series object is returned from some operation or method.
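The proposed rule can be illustrated from the user side with a small helper. `with_shallow_axes` is hypothetical: it is not pandas API and not how pandas implements this internally. It is a sketch assuming only the public `copy(deep=False)` and `Index.copy()` methods. The values stay shared with the input (shallow copy), but the returned object gets its own `Index` objects.

```python
import pandas as pd


def with_shallow_axes(obj):
    """Hypothetical helper (not pandas API).

    Returns a shallow copy of `obj` (values shared) whose row index, and for a
    DataFrame also its columns, are new Index objects created with
    `Index.copy()`, so changing e.g. their `name` cannot leak back into `obj`.
    """
    result = obj.copy(deep=False)
    result.index = obj.index.copy()
    if isinstance(obj, pd.DataFrame):
        result.columns = obj.columns.copy()
    return result


df = pd.DataFrame({"a": [1, 2, 3]}, index=pd.Index(["x", "y", "z"], name="key"))
out = with_shallow_axes(df)
out.index.name = "changed_on_result"
print(df.index.name)  # key
```

Only the `Index` wrappers are new, so this is cheap. No labels or values are duplicated, only the objects that carry mutable attributes such as `name`.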
This goes quite a bit further than just the typical indexing operation above. To start, methods that return a shallow copy under CoW should also do this, beginning with the `copy()` method itself. But even new objects that don't share any data, even with CoW, currently share the index / columns with the object they were derived from.
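You can check which operations hand back the same `Index` object on a given pandas build with a small identity probe. `shared_axes` is a hypothetical helper. The printed results depend on the pandas version and on whether CoW is enabled, which is exactly the behavior under discussion, so the sketch only reports them and does not assert them. A deep copy is included as a reference point, since it is expected to build new axis objects.

```python
import pandas as pd


def shared_axes(a, b):
    """Hypothetical helper: names of the axes that are the *same* Index object in a and b."""
    both_frames = isinstance(a, pd.DataFrame) and isinstance(b, pd.DataFrame)
    names = ["index", "columns"] if both_frames else ["index"]
    return [name for name in names if getattr(a, name) is getattr(b, name)]


df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# Results vary by pandas version / CoW mode; this only reports them.
candidates = {
    "df.copy(deep=False)": df.copy(deep=False),
    "df.copy(deep=True)": df.copy(deep=True),
    "df + 1 (new values)": df + 1,
    'df["a"] (vs df)': df["a"],
}
for label, result in candidates.items():
    print(f"{label:<22} shares: {shared_axes(df, result)}")
```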
I think that, in the CoW spirit that every new DataFrame/Series object should be independent, none of those cases should ever share the index; they should always use shallow copies. Essentially, `df1.index is df2.index` can only be true if `df1 is df2`, i.e. if we have identical objects.
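The proposed invariant can also be written down as a check. This is a hypothetical helper, not part of pandas or its test suite. It only relies on each object having an `.index` attribute, so the demo uses `types.SimpleNamespace` stand-ins instead of real DataFrames. Every pair of distinct objects that share one index object is reported as a violation.

```python
from itertools import combinations
from types import SimpleNamespace


def index_invariant_violations(objs):
    """Hypothetical checker for the rule: `a.index is b.index` only if `a is b`.

    `objs` is any sequence of objects with an `.index` attribute (for example
    pandas Series/DataFrames). Returns the position pairs (i, j) that break it.
    """
    violations = []
    for (i, a), (j, b) in combinations(enumerate(objs), 2):
        # The same object listed twice is allowed to share its own index.
        if a is not b and a.index is b.index:
            violations.append((i, j))
    return violations


# Stand-ins with an `.index` attribute, so the rule can be exercised without pandas.
shared = ["x", "y"]
first = SimpleNamespace(index=shared)
second = SimpleNamespace(index=shared)       # distinct object reusing the same index
third = SimpleNamespace(index=list(shared))  # distinct object with its own (equal) index

print(index_invariant_violations([first, first, second, third]))  # [(0, 2), (1, 2)]
```

With pandas objects, the same function can be given the inputs and outputs of an operation, e.g. `[df, df.copy(), df + 1]`. Which pairs it reports depends on the pandas version and CoW setting.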