Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: enable skipna on groupby reduction ops #43671

Closed
wants to merge 5 commits into from
Closed

ENH: enable skipna on groupby reduction ops #43671

wants to merge 5 commits into from

Conversation

geoffrey-eisenbarth
Copy link
Contributor

@mzeitlin11 Any help you can provide on what tests I should add, or where to start digging for similar GroupBy tests would be appreciated. As stated in PR #41399, I'm not super familiar with Cython, so hoping that the previous contributors (stale) PR is close to what's needed (I suppose the tests will tell us).

Thanks!

@mzeitlin11 mzeitlin11 changed the title Skipna ENH: enable skipna on groupby reduction ops Sep 21, 2021
@mzeitlin11
Copy link
Member

Thanks for reheating this @geoffrey-eisenbarth! Might not have much time this week, but will look deeper at the cython and testing when I can. For now, I'd recommend fixing up the easier issues like precommit first.

With respect to testing, I'd take a look at the tests we have for groupby ops which do accept skipna (like any/all, the cumulative methods) and try to ensure similar coverage (as in make sure the new possibility skipna=False is covered for different types). Can look at tests in pandas/tests/groupby for inspiration. Some testing is nicely organized, some testing is stuffed in places like pandas/tests/groupby/test_functions.

Also, until we add tests, probably best to put the pr in draft state so others know it's not fully ready for review

Copy link
Member

@mzeitlin11 mzeitlin11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some initial syntax comments, will take a more thorough look at logic when I can

@@ -709,6 +723,11 @@ def group_mean(floating[:, ::1] out,
t = sumx[lab, j] + y
compensation[lab, j] = t - sumx[lab, j] - y
sumx[lab, j] = t
# don't skip nan
elif skipna == False:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This skipna is missing in the function arguments, causing a build failure

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also should be elif not skipna

@@ -603,6 +612,10 @@ def group_prod(floating[:, ::1] out,
if val == val:
nobs[lab, j] += 1
prodx[lab, j] *= val
# don't skip nan
elif skipna == False:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
elif skipna == False:
elif not skipna:

@@ -555,6 +559,10 @@ def group_add(add_t[:, ::1] out,
t = sumx[lab, j] + y
compensation[lab, j] = t - sumx[lab, j] - y
sumx[lab, j] = t
# don't skip nan
elif skipna == False:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
elif skipna == False:
elif not skipna:

@@ -530,6 +531,9 @@ def group_add(add_t[:, ::1] out,
else:
t = sumx[lab, j] + val
sumx[lab, j] = t
elif skipna == False:
# NOTE: Does this case need to be considered?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. if skipna is False and not checknull(val) (L524 above) then we sumx[lab, j] needs to be incremented by val (so will either become NaN or raise)

@geoffrey-eisenbarth geoffrey-eisenbarth marked this pull request as draft September 21, 2021 23:54
@geoffrey-eisenbarth
Copy link
Contributor Author

Is CircleCI a new addition to the pandas contribution flow? I don't recall running into this when I worked on PR 41321.

When I click on the details, it says I need a configuration specified for this project. Is that something I need to do on my own?

@mzeitlin11
Copy link
Member

Is CircleCI a new addition to the pandas contribution flow? I don't recall running into this when I worked on PR 41321.

When I click on the details, it says I need a configuration specified for this project. Is that something I need to do on my own?

If you merge master it should pick up the config and fix this failure

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Oct 29, 2021
@jbrockmendel
Copy link
Member

@geoffrey-eisenbarth planning to pick this back up? it looked promising

@geoffrey-eisenbarth
Copy link
Contributor Author

@jbrockmendel I was hoping to, since I rely on the old sum(level=level, skipna=False) syntax quite a bit. I know I let it fall behind and now two days ago was the birth my first child, so now it feels like I have zero time, but I will try to merge with master today to fix the CircleCI issues that came up (I'm not sure if CircleCI is new, but my previous contribution didn't seem to involve it) and see where that takes us.

@jbrockmendel
Copy link
Member

No sweat, totally reasonable to have higher priorities under the circumstances.

Not sure about the circleci bit; often times I re-push and hope for CI issues to resolve themselves.

@pep8speaks
Copy link

Hello @geoffrey-eisenbarth! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 340:1: E305 expected 2 blank lines after class or function definition, found 1
Line 340:3: E227 missing whitespace around bitwise or shift operator
Line 340:7: E225 missing whitespace around operator
Line 341:5: E113 unexpected indentation
Line 388:3: E225 missing whitespace around operator
Line 388:5: E225 missing whitespace around operator
Line 388:7: E225 missing whitespace around operator
Line 389:7: E225 missing whitespace around operator

@jreback
Copy link
Contributor

jreback commented Jan 16, 2022

would likely take this, but needs to be passing an fully code-reviewed. closing as stale.

@jreback jreback closed this Jan 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: enable skipna on groupby reduction ops
5 participants