-
Notifications
You must be signed in to change notification settings - Fork 335
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#4545] improvement(paimon-catalog): reduce catalog-lakehouse-paimon libs size from 222MB to 75MB #4547
[#4545] improvement(paimon-catalog): reduce catalog-lakehouse-paimon libs size from 222MB to 75MB #4547
Conversation
…aimon libs size from 222MB to 156MB
…ateDependenciesSize
…g-binary-package-size' into apache#4545-Shrink-Paimon-catalog-binary-package-size
@caican00 can you please help to review this PR? |
|
@caican00 can you please help on this, to make this move fast. |
okay. |
exclude("org.apache.hive") | ||
exclude("org.apache.hbase") | ||
exclude("it.unimi.dsi") | ||
exclude("org.apache.hadoop") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LiuQhahah here we can not exclude all the hadoop dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can only exclude hadoop yarn dependencies
…imon-catalog-binary-package-size
…g-binary-package-size' into apache#4545-Shrink-Paimon-catalog-binary-package-size
@FANNG1 could you help re-trigger this pipeline? Thanks |
done |
@LiuQhahah could you please update the pr title for the real lib size? |
Ok The final size is 74M
|
implementation(libs.guava) | ||
implementation(libs.hadoop2.common) { | ||
exclude("com.github.spotbugs") | ||
exclude("com.sun.jersey") | ||
exclude("javax.servlet") | ||
} | ||
implementation(libs.hadoop2.hdfs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LiuQhahah gravitino paimon catalog supports FilesystemCatalog as its backend catalog, and therefore we can not remove the hdfs dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and could you please do a basic test locally? such as built successfully and ran through the unit tests. thanks!
LGTM @jerryshao @FANNG1 do you have further comments? |
@LiuQhahah can you please list all the jars and paste here on Github for Paimon catalog? You can use |
@FANNG1 would you please help to review this? |
Hi @jerryshao
|
Hi, @LiuQhahah , I did some extra work to exclude
|
Can we remove these? |
Why the paimon format jar is so big, is it correct? |
Also, why do we need hdfs jar? I assume the hdfs client jar should be enough. |
it's used by Paimon internally, we'd better to keep it.
Yes, this could be removed |
Paimon format jar packages the dependences in the jar, @caican00 do you know any other alternative jars? |
Kerberos relies on the default HDFS configuration, removing hdfs jar may cause Kerberos operation failed, cc @yuqi1129 |
Yeah, it will make Paimon Kerberos-related tests fail. Let's leave it as is now. |
I have tested it and we can make hdfs as testImplementation. |
The paimon catalog explicitly calls the code in the paimon-format module, so it cannot be replaced with other jars. I exclude more dependencies and ran through the integration tests. And deployed it on physical machine for testing. I have no permission to push code to the repo, please help update the pr.
|
What changes were proposed in this pull request?
remove some unnecessary dependencies
Why are the changes needed?
reduce catalog-lakehouse-paimon libs
Fix: #4545
Does this PR introduce any user-facing change?
No
How was this patch tested?
CI passed