The intermediate type of collect_list/collect_set isn't compatible with Spark #12023
Comments
Thank you for working on this @NEUpanning. This will involve changes to the Spark functions. cc @rui-mo
In the intermediate step of the |
@zhztheplayer @rui-mo
However, we may not be able to enumerate all types. What do you think? Thanks!
Bug description
The intermediate type of Spark's collect_list/collect_set is BINARY, whereas the intermediate type of Velox's collect_list/collect_set is ARRAY, which is incompatible with BINARY. The current workaround implemented in Gluten causes several issues, including gluten#8227 and gluten#8184.
A complete solution involves changing the intermediate type of Velox's collect_list/collect_set to VARBINARY and using the UnsafeArrayData format for serialization and deserialization (SerDe).
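
To make the proposed SerDe concrete, here is a minimal Python sketch of the UnsafeArrayData layout for an array of 64-bit integers, as I understand it from Spark's `UnsafeArrayData`: an 8-byte element count, a null bitset padded to 8-byte words, then one 8-byte slot per element. The function names are hypothetical, little-endian byte order is an assumption (Spark uses the platform's native order), and only fixed-width BIGINT elements are covered. Variable-length elements such as strings or nested types need an extra offset-and-size region that is not shown here. This is not the Velox or Gluten implementation.

```python
import struct


def header_bytes(num_elements):
    """Size of the header: 8-byte element count plus the null bitset,
    where the bitset is padded to a whole number of 8-byte words."""
    return 8 + ((num_elements + 63) // 64) * 8


def serialize_long_array(values):
    """Encode a list of ints (None = null) into the assumed layout.

    Hypothetical helper for illustration only; little-endian is assumed.
    """
    n = len(values)
    null_words = [0] * ((n + 63) // 64)
    for i, v in enumerate(values):
        if v is None:
            # Bit i of the bitset marks element i as null.
            null_words[i // 64] |= 1 << (i % 64)
    buf = struct.pack("<q", n)
    buf += b"".join(struct.pack("<Q", w) for w in null_words)
    # Null slots still occupy 8 bytes; they are written as zero here.
    buf += b"".join(struct.pack("<q", 0 if v is None else v) for v in values)
    return buf


def deserialize_long_array(buf):
    """Decode bytes produced by serialize_long_array back into a list."""
    (n,) = struct.unpack_from("<q", buf, 0)
    base = header_bytes(n)
    out = []
    for i in range(n):
        (word,) = struct.unpack_from("<Q", buf, 8 + (i // 64) * 8)
        if (word >> (i % 64)) & 1:
            out.append(None)
        else:
            out.append(struct.unpack_from("<q", buf, base + i * 8)[0])
    return out
```

With an intermediate type of VARBINARY, the partial aggregation would emit bytes of this shape and the final aggregation would decode them, so that either engine can consume the other's intermediate result.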
System information
/
Relevant logs
No response