[FEA] Add a "dense groupby" implementation #17754

Open
GregoryKimball opened this issue Jan 15, 2025 · 0 comments
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code.

GregoryKimball commented Jan 15, 2025

Is your feature request related to a problem? Please describe.
As of 25.02, the hash-based groupby implementation tracks partial aggregations in a table sized to double the row count of the input table. Once all input rows have been processed, a compaction step extracts the non-empty entries. If the number of distinct groups is smaller than the number of rows, this approach leads to excess memory usage as well as an additional memory copy.
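
To make the cost concrete, here is a minimal single-threaded CPU sketch of the "sparse" scheme described above. It is not the libcudf implementation: `SparseSlot`, `SparseResult`, and `sparse_groupby_sum` are hypothetical names, the only aggregation is a sum, and linear probing is an assumption chosen for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical model: every hash-table slot carries a full partial aggregation.
struct SparseSlot {
  bool occupied = false;
  std::int32_t key = 0;
  std::int64_t sum = 0;
};

struct SparseResult {
  std::vector<std::int32_t> keys;
  std::vector<std::int64_t> sums;
};

SparseResult sparse_groupby_sum(const std::vector<std::int32_t>& keys,
                                const std::vector<std::int64_t>& values) {
  assert(keys.size() == values.size());

  // The table is sized to double the input row count, regardless of how
  // many distinct groups actually exist.
  const std::size_t capacity = 2 * keys.size();
  std::vector<SparseSlot> table(capacity);

  for (std::size_t row = 0; row < keys.size(); ++row) {
    std::size_t slot = std::hash<std::int32_t>{}(keys[row]) % capacity;
    // Linear probing (an assumption of this sketch).
    while (table[slot].occupied && table[slot].key != keys[row]) {
      slot = (slot + 1) % capacity;
    }
    table[slot].occupied = true;
    table[slot].key = keys[row];
    table[slot].sum += values[row];
  }

  // Compaction: the extra pass and copy that extract the non-empty entries.
  SparseResult out;
  for (const SparseSlot& s : table) {
    if (s.occupied) {
      out.keys.push_back(s.key);
      out.sums.push_back(s.sum);
    }
  }
  return out;
}
```

With 6 input rows and 3 distinct keys, the table holds 12 full slots and only 3 survive compaction. The order of the output groups depends on the hash function.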

Describe the solution you'd like
Introduce a hash table that maps each key's hash value to an output index. When a new group is found, increment a "write counter" to assign that group the next output index. When processing the aggregation kinds, update the partial aggregations at the output index for that group.
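
A minimal single-threaded CPU sketch of this idea, under stated assumptions: `DenseResult` and `dense_groupby_sum` are hypothetical names, the aggregation is a single sum, and `push_back` stands in for whatever output allocation strategy a real implementation would use. On the GPU the write counter would presumably be an atomic, which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <vector>

struct DenseResult {
  std::vector<std::int32_t> keys;  // one entry per group, in first-seen order
  std::vector<std::int64_t> sums;  // partial aggregations, stored contiguously
};

DenseResult dense_groupby_sum(const std::vector<std::int32_t>& keys,
                              const std::vector<std::int64_t>& values) {
  assert(keys.size() == values.size());

  // Sentinel marking an unused hash-table slot.
  constexpr std::size_t kEmpty = std::numeric_limits<std::size_t>::max();

  // The hash table stores only an output index per slot, not the
  // aggregation itself. Sizing it at 2x the row count is an assumption.
  const std::size_t capacity = 2 * keys.size();
  std::vector<std::size_t> slot_to_output(capacity, kEmpty);

  DenseResult out;
  std::size_t write_counter = 0;  // next free output index

  for (std::size_t row = 0; row < keys.size(); ++row) {
    std::size_t slot = std::hash<std::int32_t>{}(keys[row]) % capacity;
    // Linear probing: stop at an empty slot or at the slot for this key.
    while (slot_to_output[slot] != kEmpty &&
           out.keys[slot_to_output[slot]] != keys[row]) {
      slot = (slot + 1) % capacity;
    }
    if (slot_to_output[slot] == kEmpty) {
      // New group: claim the next output index.
      slot_to_output[slot] = write_counter++;
      out.keys.push_back(keys[row]);
      out.sums.push_back(0);
    }
    // Update the partial aggregation directly at the group's output index.
    out.sums[slot_to_output[slot]] += values[row];
  }
  return out;  // already dense: no compaction or gather pass
}
```

For keys `{3, 1, 3, 2, 1, 3}` the groups land at output indices 0, 1, 2 in first-seen order, so the result arrays never contain empty entries.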

With this idea, the output data will not require another gather. The output data will be created contiguously and will (probably) be available for output column creation without another copy. The cost would be some additional atomics pressure when cardinality is high. "Sparse" partial aggregations may still be preferable if the cardinality is close to the row count (needs confirmation).

Additional context

The current setup for Keys, Values, and Aggregations uses a Struct of Arrays (SoA) format, leading to numerous random memory accesses. Consider exploring a conversion from SoA to Array of Structs (AoS).
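
To illustrate the two layouts, here is a small CPU sketch that keeps three partial aggregations (sum, count, max) per group. The type and function names (`GroupsSoA`, `GroupRecord`, `GroupsAoS`, `update_aos`) are hypothetical and do not correspond to libcudf types. Whether AoS actually helps on the GPU would need measurement.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// SoA: one array per aggregation. Updating group g touches three
// separate allocations, i.e. three scattered memory locations.
struct GroupsSoA {
  std::vector<std::int64_t> sum;
  std::vector<std::int64_t> count;
  std::vector<std::int64_t> max;

  explicit GroupsSoA(std::size_t n)
      : sum(n, 0), count(n, 0),
        max(n, std::numeric_limits<std::int64_t>::min()) {}

  void update(std::size_t g, std::int64_t v) {
    assert(g < sum.size());
    sum[g] += v;
    count[g] += 1;
    max[g] = std::max(max[g], v);
  }
};

// AoS: all aggregations for a group sit next to each other, so one
// update touches a single 24-byte record.
struct GroupRecord {
  std::int64_t sum = 0;
  std::int64_t count = 0;
  std::int64_t max = std::numeric_limits<std::int64_t>::min();
};
static_assert(sizeof(GroupRecord) == 3 * sizeof(std::int64_t),
              "record is expected to be tightly packed");

using GroupsAoS = std::vector<GroupRecord>;

void update_aos(GroupsAoS& groups, std::size_t g, std::int64_t v) {
  assert(g < groups.size());
  GroupRecord& r = groups[g];
  r.sum += v;
  r.count += 1;
  r.max = std::max(r.max, v);
}
```

Both layouts produce the same aggregates; they differ only in how many distinct memory regions a single row update touches.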

There is an internal reference for NVIDIA internal developers here. If you are an external developer and would like to learn more, please contact [email protected].

@GregoryKimball GregoryKimball added the feature request New feature or request label Jan 15, 2025
@GregoryKimball GregoryKimball added the libcudf Affects libcudf (C++/CUDA) code. label Jan 15, 2025