-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
additional options for pivot_wider and pivot_longer #10
Comments
As a general philosophy, we shouldn't go out of our way to support functionality that isn't directly built into DataFrames.jl or is trivial to add on top. If something is missing in that package, I would recommend we file an issue there. The main functionality I'd like to see is the ability to specify more than one selection separated by commas for |
No urgency on this @rdboyes, but I just want to figure out if we should close this issue. Should we work on adding the ability to select multiple columns separated by commas? Anything else from your end that we should prioritize here? Just want to make sure if no changes planned, that we close this. But I do think there's room for that one tiny improvement before closing. Happy to help with implementation if you'd like me to take a look. |
Something I specifically miss is the ability to use |
Do you happen to know if there is a syntax to achieve this in DataFrames.jl? Need to explore a bit, so a minimal DF example would be helpful. |
Here's an example that I made which you can see is rather contrived. I used an old solution from bkamin to made it work. using DataFrames, Random
Random.seed!(1);
df = DataFrame(
Sales = rand(1:3, 15) |> sort,
Label1 = rand('A':'B', 15) .|> Symbol,
Label2 = rand('Q':'R', 15) .|> Symbol,
Label3 = rand('E':'F', 15) .|> Symbol
)
# Number of rows keep Sales
unstack(
combine(
groupby(
select(df, :Sales, [:Label1, :Label2, :Label3] => ByRow(Symbol) => :Label),
[:Sales, :Label]
), nrow
), :Label, :nrow
)
# Row │ Sales BQE AQE ARF ARE AQF BRF BQF
# │ Int64 Int64? Int64? Int64? Int64? Int64? Int64? Int64?
# ────┼─────────────────────────────────────────────────────────────────────
# 1 │ 1 1 1 1 1 1 missing missing
# 2 │ 2 2 missing 1 missing 1 1 missing
# 3 │ 3 missing missing missing missing 1 2 2
# Sum over Sales
unstack(
combine(
groupby(
select(df, :Sales, [:Label1, :Label2, :Label3] => ByRow(Symbol) => :Label),
[:Label]
), :Sales => sum
), :Label, :Sales_sum
)
# Row │ BQE AQE ARF ARE AQF BRF BQF
# │ Int64? Int64? Int64? Int64? Int64? Int64? Int64?
# ─────┼────────────────────────────────────────────────────────
# 1 │ 5 1 3 1 6 8 6 |
As implemented,
@pivot_wider
and@pivot_longer
only support a small number of the options which are supported by their tidyverse counterparts pivot_longer and pivot_wider. The additional options are ones that I personally rarely use and/or don't have clear analogs instack()
andunstack()
. I'm interested to hear if there are specific options that others get a lot of use out of that should be prioritized.The text was updated successfully, but these errors were encountered: