Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add sortrows and sortcols to unstack #3395

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

add sortrows and sortcols to unstack #3395

wants to merge 1 commit into from

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Oct 21, 2023

Fixes #3334

Tests are missing, but I would like to get feedback if we like the design before I move forward.

This is useful as the order of appearance is often not wanted. And it is impossible to ensure wanted sort order of rows and columns in the source. Finally, especially for columns - it is especially useful if you have ordered categorical values to present (so you can sort data e.g. by day of week, which is lost after you created the output since column names are Symbol then)

@bkamins bkamins added this to the 1.7 milestone Oct 21, 2023
Copy link

@storopoli storopoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments on the docstring.
As ever, amazing work!

@@ -259,6 +260,14 @@ Row and column keys are ordered in the order of their first appearance.
time). Whether or not tasks are actually spawned and their number are
determined automatically. Set to `false` if `combine` requires serial
execution or is not thread-safe.
- `sortrows`: the order of rows in the output table; all values accepted by

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `sortrows`: the order of rows in the output table; all values accepted by
- `sortrows`: the order of rows in the resulting table; all values accepted by

I prefer something else than "output", I find it a little ambiguous.

- `sortrows`: the order of rows in the output table; all values accepted by
`sort` keyword argument in `groupby` passed the `rowkeys` for grouping are supported;
`false` by default (rows are ordered following the first appereance order).
- `sortcols`: the order of columns in the output table; all values accepted by

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `sortcols`: the order of columns in the output table; all values accepted by
- `sortcols`: the order of columns in the resulting table; all values accepted by

I prefer something else than "output", I find it a little ambiguous.

@@ -259,6 +260,14 @@ Row and column keys are ordered in the order of their first appearance.
time). Whether or not tasks are actually spawned and their number are
determined automatically. Set to `false` if `combine` requires serial
execution or is not thread-safe.
- `sortrows`: the order of rows in the output table; all values accepted by
`sort` keyword argument in `groupby` passed the `rowkeys` for grouping are supported;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What "passed" means? I am confused.

`sort` keyword argument in `groupby` passed the `rowkeys` for grouping are supported;
`false` by default (rows are ordered following the first appereance order).
- `sortcols`: the order of columns in the output table; all values accepted by
`sort` keyword argument in `groupby` passed `colkey` for grouping are supported;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What "passed" means? I am confused.

Comment on lines +269 to +270
Note that the ordering is done on the source data (not on column final column names
that can be potentially changed by the function passed in the `renamecols` keyword argument).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also confusing.
"column final column names"

@bkamins
Copy link
Member Author

bkamins commented Nov 30, 2023

@storopoli - thank you for the comments.

@nalimilan - before I move forward could you please comment if you like this addition. (I would want to have a decision on this PR and a few more and make a release in 1Q2024 of DataFrames.jl, especially that Julia 1.10 is coming and we need to clean up Project.toml in light of the change how standard library packages will be handled there)

@bkamins bkamins modified the milestones: 1.7, 1.x Sep 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add sotring support for unstack
2 participants