Implement xxhash algorithms as part of the expression API #14044
Comments
You could also implement this as a user-defined function, perhaps.
Hi, thanks for the reply. That is true, but my concern comes from another issue raised in the delta-rs project (link below), since I need to use this hash operation in a Delta merge operation. Unless I change the delta-rs codebase, I would be unable to create a UDF (I'm not sure there is a way to create and register one in the delta-rs context from Python; it sounds complicated).
Can I take this up? I will add the new functions in a new file, crypto/xxhash.rs, similar to the crypto/sha224.rs file, import them in crypto/mod.rs, and then add some tests. Do tell me if I'm missing something!
take
@Spaarsh sounds like a good plan
Perhaps consider adding a non-cryptographic module and adding wyhash as well (here are some of the crates I've used for a similar use case, but in Polars: https://github.com/ion-elgreco/polars-hash/blob/main/polars_hash%2FCargo.toml)
Sure. I'll take a look at it and get back to you if I have any doubts! Thanks!
@HectorPascual is this the output that you expect?
Yes, that looks good!
@alamb @HectorPascual I would like to have your input on this:
These are the query results:
If this is alright, then I will add the tests for these functions.
@alamb should I open a new issue for adding wyhash and the other functions, or can they be added along with this PR?
Looks good. Tomorrow I can try hashing the same inputs with the Python xxhash module and double-check the results. @Spaarsh let me gather some feedback from colleagues on your open concerns and come back later with details.
Hi @Spaarsh, the hashes match the Python module. Regarding your inputs:
2-3. On the seed side, the Python module's default seed is 0; see: https://github.com/ifduyue/python-xxhash#:~:text=An%20optional%20seed%20(default%20is%200)%20can%20be%20used%20to%20alter%20the%20result%20predictably%3A Thanks!
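For anyone who wants to repeat this cross-check without installing anything, here is a minimal pure-Python sketch of XXH32 with the seed defaulting to 0. It follows the published xxHash algorithm as I understand it; it is not the code from the PR, from twox-hash, or from the Python xxhash module, and the function name `xxh32` is only illustrative.

```python
# Reference-style sketch of XXH32 (assumption: follows the public xxHash spec;
# this is not the DataFusion or twox-hash implementation).
MASK32 = 0xFFFFFFFF
P1, P2, P3, P4, P5 = 0x9E3779B1, 0x85EBCA77, 0xC2B2AE3D, 0x27D4EB2F, 0x165667B1


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def _round32(acc: int, lane: int) -> int:
    """Mix one 4-byte little-endian lane into an accumulator."""
    acc = (acc + lane * P2) & MASK32
    return (_rotl32(acc, 13) * P1) & MASK32


def xxh32(data: bytes, seed: int = 0) -> int:
    n, i = len(data), 0
    if n >= 16:
        # Four accumulators consume the input in 16-byte stripes.
        acc = [(seed + P1 + P2) & MASK32, (seed + P2) & MASK32,
               seed & MASK32, (seed - P1) & MASK32]
        while i + 16 <= n:
            for k in range(4):
                lane = int.from_bytes(data[i + 4 * k:i + 4 * k + 4], "little")
                acc[k] = _round32(acc[k], lane)
            i += 16
        h = (_rotl32(acc[0], 1) + _rotl32(acc[1], 7)
             + _rotl32(acc[2], 12) + _rotl32(acc[3], 18)) & MASK32
    else:
        h = (seed + P5) & MASK32
    h = (h + n) & MASK32
    # Remaining 4-byte words, then remaining single bytes.
    while i + 4 <= n:
        lane = int.from_bytes(data[i:i + 4], "little")
        h = (_rotl32((h + lane * P3) & MASK32, 17) * P4) & MASK32
        i += 4
    while i < n:
        h = (_rotl32((h + data[i] * P5) & MASK32, 11) * P1) & MASK32
        i += 1
    # Final avalanche.
    h ^= h >> 15
    h = (h * P2) & MASK32
    h ^= h >> 13
    h = (h * P3) & MASK32
    h ^= h >> 16
    return h


print(hex(xxh32(b"")))           # seed defaults to 0
print(hex(xxh32(b"", seed=1)))   # a different seed changes the result
```

If a SQL function takes no seed argument, comparing against `xxh32(value)` with the default seed should be the like-for-like check.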
Thanks @HectorPascual! I'm opening a PR from here on for transparency's sake!
Is your feature request related to a problem or challenge?
I currently need to use the DataFusion SQL query engine (through the delta-rs merge operation) to hash data with a specific hashing algorithm, xxHash.
https://xxhash.com/
https://github.com/Cyan4973/xxHash
Rust implementation : https://crates.io/crates/twox-hash
Describe the solution you'd like
I'd like the hashing functions xxhash32 and xxhash64 to be part of the DataFusion expression API so that they can be used in a SQL context for hashing data.
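To make the 64-bit variant concrete, below is a minimal pure-Python sketch of XXH64 that could serve as an independent reference when checking a SQL `xxhash64` result. It is written from the public xxHash algorithm description as I understand it; the name `xxh64` and the seed parameter are illustrative assumptions, not the proposed DataFusion signature.

```python
# Reference-style sketch of XXH64 (assumption: follows the public xxHash spec;
# not the DataFusion or twox-hash implementation).
MASK64 = 0xFFFFFFFFFFFFFFFF
Q1 = 0x9E3779B185EBCA87
Q2 = 0xC2B2AE3D27D4EB4F
Q3 = 0x165667B19E3779F9
Q4 = 0x85EBCA77C2B2AE63
Q5 = 0x27D4EB2F165667C5


def _rotl64(x: int, r: int) -> int:
    """Rotate a 64-bit value left by r bits."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def _round64(acc: int, lane: int) -> int:
    """Mix one 8-byte little-endian lane into an accumulator."""
    acc = (acc + lane * Q2) & MASK64
    return (_rotl64(acc, 31) * Q1) & MASK64


def _merge64(h: int, acc: int) -> int:
    """Fold one accumulator into the running hash."""
    h ^= _round64(0, acc)
    return (h * Q1 + Q4) & MASK64


def xxh64(data: bytes, seed: int = 0) -> int:
    n, i = len(data), 0
    if n >= 32:
        # Four accumulators consume the input in 32-byte stripes.
        acc = [(seed + Q1 + Q2) & MASK64, (seed + Q2) & MASK64,
               seed & MASK64, (seed - Q1) & MASK64]
        while i + 32 <= n:
            for k in range(4):
                lane = int.from_bytes(data[i + 8 * k:i + 8 * k + 8], "little")
                acc[k] = _round64(acc[k], lane)
            i += 32
        h = (_rotl64(acc[0], 1) + _rotl64(acc[1], 7)
             + _rotl64(acc[2], 12) + _rotl64(acc[3], 18)) & MASK64
        for a in acc:
            h = _merge64(h, a)
    else:
        h = (seed + Q5) & MASK64
    h = (h + n) & MASK64
    # Remaining 8-byte words, then at most one 4-byte word, then single bytes.
    while i + 8 <= n:
        h ^= _round64(0, int.from_bytes(data[i:i + 8], "little"))
        h = (_rotl64(h, 27) * Q1 + Q4) & MASK64
        i += 8
    if i + 4 <= n:
        h ^= (int.from_bytes(data[i:i + 4], "little") * Q1) & MASK64
        h = (_rotl64(h, 23) * Q2 + Q3) & MASK64
        i += 4
    while i < n:
        h ^= (data[i] * Q5) & MASK64
        h = (_rotl64(h, 11) * Q1) & MASK64
        i += 1
    # Final avalanche.
    h ^= h >> 33
    h = (h * Q2) & MASK64
    h ^= h >> 29
    h = (h * Q3) & MASK64
    h ^= h >> 32
    return h


print(format(xxh64(b"xxhash"), "016x"))
```

The result is an unsigned 64-bit integer; how a SQL function would surface it (hex string, integer, or binary) is a separate design choice that this sketch does not settle.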