TypeError: sum() got an unexpected keyword argument 'skipna' #29481
Somewhat interesting, but this gives a different error on master:

>>> df.groupby(pd.Series(['a', 'a', 'b', 'b', 'b']), axis=1).agg('sum', skipna=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/williamayd/clones/pandas/pandas/core/groupby/generic.py", line 880, in aggregate
    result, how = self._aggregate(func, _level=_level, *args, **kwargs)
  File "/Users/williamayd/clones/pandas/pandas/core/base.py", line 330, in _aggregate
    return self._try_aggregate_string_function(arg, *args, **kwargs), None
  File "/Users/williamayd/clones/pandas/pandas/core/base.py", line 281, in _try_aggregate_string_function
    return f(*args, **kwargs)
  File "/Users/williamayd/clones/pandas/pandas/core/groupby/groupby.py", line 1356, in f
    return self._cython_agg_general(alias, alt=npfunc, **kwargs)
TypeError: _cython_agg_general() got an unexpected keyword argument 'skipna'
Though this might be correct now;
I guess you could reasonably expect it to work like
Hi everyone! This issue hasn't had any activity for a long time. Is it still relevant?
Hi all,

cnt = df.groupby(groupby_cols).sum(skipna=False)[prop_cols]

raises

_cython_agg_general() got an unexpected keyword argument 'skipna'

It was working perfectly fine until I installed the libraries in a new virtual env. My requirement is that I have a column in the DataFrame that is entirely NaN, and I don't want those NaNs to be ignored by the groupby. I want NaNs to come through as NaN in the result. Is this a known issue, and was it introduced recently? If so, is there a fix I can install?
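For the requirement above (an all-NaN column that should stay NaN after grouping), a minimal sketch using the min_count parameter of GroupBy.sum: the required number of non-NA values, below which the result is NA. The DataFrame and its column names are hypothetical. Note that min_count=1 only yields NaN when a group has no non-NA values at all; a group mixing numbers and NaN is still summed with the NaNs skipped, so this is not a full substitute for skipna=False.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "prop" is entirely NaN, like the column described above.
df = pd.DataFrame({
    "key": ["a", "a", "b"],
    "value": [1.0, 2.0, 3.0],
    "prop": [np.nan, np.nan, np.nan],
})

# Default behaviour: NaNs are skipped, so an all-NaN group sums to 0.0.
default_sum = df.groupby("key")[["value", "prop"]].sum()

# min_count=1: a group needs at least one non-NA value, otherwise the result is NaN.
nan_preserving_sum = df.groupby("key")[["value", "prop"]].sum(min_count=1)
print(nan_preserving_sum)
```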
Same thing here with 1.0.3, but I think the skipna argument has been removed from the underlying groupby median/sum etc., and missing values are just always excluded:

self = <pandas.core.groupby.generic.SeriesGroupBy object at 0x7f9a1f720438>, kwargs = {'skipna': True}

    @Substitution(name="groupby")
    @Appender(_common_see_also)
    def median(self, **kwargs):
        """
        Compute median of groups, excluding missing values.

        For multiple groupings, the result index will be a MultiIndex

        Returns
        -------
        Series or DataFrame
            Median of values within each group.
        """
        return self._cython_agg_general(
            "median",
            alt=lambda x, axis: Series(x).median(axis=axis, **kwargs),
>           **kwargs,
        )
E       TypeError: _cython_agg_general() got an unexpected keyword argument 'skipna'

But what if skipna was only included in the kwargs for
I can confirm that I got the same error when I tried to group a DataFrame by several columns (one of which contains NaN values) and then take the maximum of the series "Lp":

df.groupby([columns_but_one_of_them_contains_nans]).Lp.max(skipna=False)

returned

TypeError: _cython_agg_general() got an unexpected keyword argument 'skipna'

pandas 1.0.3
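Until groupby max accepts skipna, one workaround is to pass a callable to agg so that each group is reduced by Series.max, which does take skipna. The sketch below also passes dropna=False to groupby (available since pandas 1.1, so not in 1.0.3), because by default rows whose grouping key is NaN are dropped entirely. The column names g1 and g2 and the data are hypothetical stand-ins for the columns described above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "g2" is a grouping column that contains NaN.
df = pd.DataFrame({
    "g1": ["x", "x", "y", "y"],
    "g2": [1.0, 1.0, np.nan, np.nan],
    "Lp": [3.0, 5.0, 2.0, np.nan],
})

def max_keep_nan(s):
    # Series.max accepts skipna, so any NaN in the group yields NaN.
    return s.max(skipna=False)

# Default groupby drops the ("y", NaN) group because its key is NaN.
default_groups = df.groupby(["g1", "g2"])["Lp"].agg(max_keep_nan)

# dropna=False keeps NaN keys as their own group.
result = df.groupby(["g1", "g2"], dropna=False)["Lp"].agg(max_keep_nan)
print(result)
```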
I confirm the same. All NaN values are picked up as 0. This is useless when manipulating data for academic research. For instance,
This results in 0. Meanwhile, when doing the same with skipna=False:
This results in NaN, which is the desired output when calculating means. However, when attempting the same within the groupby function:
the sum always returns 0, and there is no option to stop NaN values from being skipped. These 0 values skew the means and standard deviations, resulting in wrong figures. When working with a large amount of data, we realised that the results of our study did not make sense. On further investigation, I discovered this bug in pandas. I fear that others may have unknowingly reported inaccurate figures when manipulating data with pandas DataFrames, so this bug is very much relevant.
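To make the skew described above concrete, a small sketch with hypothetical data: group "c" has no observed values, so the default groupby sum reports 0.0 for it, and that 0.0 then feeds into any mean taken over the group sums. Passing min_count=1 to GroupBy.sum makes the all-NaN group NaN instead, which Series.mean() then skips.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "c" contains only a missing value.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "c"],
    "value": [1.0, 2.0, 4.0, np.nan, np.nan],
})

sums_default = df.groupby("group")["value"].sum()         # a: 3.0, b: 4.0, c: 0.0
sums_nan = df.groupby("group")["value"].sum(min_count=1)  # a: 3.0, b: 4.0, c: NaN

# The spurious 0.0 drags the mean down; the NaN is skipped by Series.mean().
mean_default = sums_default.mean()  # (3 + 4 + 0) / 3
mean_nan = sums_nan.mean()          # (3 + 4) / 2 = 3.5
print(mean_default, mean_nan)
```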
Definitely still an issue.
This will return null values whenever it encounters missing values in the data it is summing. However, I have found that this method is far slower than a comparably The problem might actually be due to the casting back of All I know is that
returns
returns 17.
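A sketch of two ways to get skipna=False behaviour from a groupby sum, assuming hypothetical columns k and v. The first passes a callable to agg that applies Series.sum(skipna=False) per group; it is correct but runs Python code once per group. The second is an assumed vectorized alternative (not a pandas feature): take the usual NaN-skipping sum, then mask every group that contained any NaN. Relative timings will depend on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "k": ["a", "a", "b", "b", "c"],
    "v": [1.0, 2.0, np.nan, 4.0, np.nan],
})
grouped = df.groupby("k")["v"]

# Per-group Python call: skipna=False semantics, but slow with many groups.
slow = grouped.agg(lambda s: s.sum(skipna=False))

# Vectorized: ordinary sum, then blank out groups that contained a NaN.
has_nan = df["v"].isna().groupby(df["k"]).any()
fast = grouped.sum().mask(has_nan)

print(fast)  # a: 3.0, b: NaN, c: NaN
```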
This is a frustrating shortcoming of groupby.sum(). But since the mean is just the sum of the values divided by the number of values, one alternative is to multiply the groupby.mean() result by groupby.count().
mean() returns NaN when all values in a group are NaN, count() returns 0 in that case, and 0 * np.nan is NaN, so the product gives the correct groupby sums while keeping NaN wherever all values in a group are NaN. I'm not sure how much slower this is than a plain groupby.sum(), however.
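A minimal sketch of the mean() * count() workaround on hypothetical data, compared against GroupBy.sum(min_count=1), which gives the same NaN-for-all-NaN-groups result directly. Because mean() * count() goes through a division and a multiplication, it can differ from a direct sum in the last few bits, so the comparison uses np.allclose rather than exact equality.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "c" is all NaN.
df = pd.DataFrame({
    "k": ["a", "a", "b", "b", "c"],
    "v": [1.0, 2.0, np.nan, 4.0, np.nan],
})
grouped = df.groupby("k")["v"]

# mean() is NaN for an all-NaN group and count() is 0 there; NaN * 0 is NaN.
mean_times_count = grouped.mean() * grouped.count()

# Built-in equivalent: require at least one non-NA value per group.
min_count_sum = grouped.sum(min_count=1)

print(mean_times_count)  # a: 3.0, b: 4.0, c: NaN
```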
On master, the code sample in the OP gives
I will update the title to make the issue more discoverable.
I noticed the same issue with ...groupby.median(skipna=True). I checked several versions: it works with pandas 0.25.3 and fails since pandas 1.0.0. I wonder whether that was intended, because the API code and docs changed from 0.25.3 to 1.1.0:

pandas 0.25.3

@Substitution(name="groupby")
@Appender(_common_see_also)
def median(self, **kwargs):
    """
    Compute median of groups, excluding missing values.

    For multiple groupings, the result index will be a MultiIndex

    Returns
    -------
    Series or DataFrame
        Median of values within each group.
    """

pandas 1.1.0

@Substitution(name="groupby")
@Appender(_common_see_also)
def median(self, numeric_only=True):
    """
    Compute median of groups, excluding missing values.

    For multiple groupings, the result index will be a MultiIndex

    Parameters
    ----------
    numeric_only : bool, default True
        Include only float, int, boolean columns. If None, will attempt to use
        everything, then use only numeric data.

    Returns
    -------
    Series or DataFrame
        Median of values within each group.
    """
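Since the accepted keywords changed between versions, one way to check what the installed pandas accepts is to inspect the signature of the bound method. This is a generic sketch using the standard library inspect module on hypothetical data; the printed parameter list depends on the pandas version, so no output is shown.

```python
import inspect

import pandas as pd

# Hypothetical data, only needed to obtain a GroupBy object.
df = pd.DataFrame({"k": ["a", "a", "b"], "v": [1.0, 2.0, 3.0]})
grouped = df.groupby("k")

# Signature of the bound method (self is excluded automatically).
median_params = inspect.signature(grouped.median).parameters
print(pd.__version__, list(median_params))

accepts_skipna = "skipna" in median_params
print("median accepts skipna:", accepts_skipna)
```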
The underlying issue here is that the It might be that this keyword was ignored before and only recently started to raise, but it never actually worked (or was never documented). The improvement to add
Duplicate of #15675 |
Code Sample, a copy-pastable example if possible
Problem description
The above call to agg gives
This is because here:
pandas/pandas/core/groupby/groupby.py
Line 1376 in 67ee16a
we are trying to access a new column name ('a') in the original DataFrame. It only occurs when no _cython_agg_general is possible, e.g., when the keyword argument skipna is given to agg. Without the skipna argument, the expected output below is produced.
Expected Output
Output of pd.show_versions()