[#4545] improvement(paimon-catalog): reduce catalog-lakehouse-paimon libs size from 222MB to 75MB #4547

Merged

LiuQhahah
Contributor

What changes were proposed in this pull request?

Remove some unnecessary dependencies from the Paimon catalog.

Why are the changes needed?

To reduce the size of the catalog-lakehouse-paimon libs directory.

Fix: #4545

Does this PR introduce any user-facing change?

No

How was this patch tested?

CI passed

@jerryshao
Contributor

@caican00 can you please help to review this PR?

@caican00
Collaborator

caican00 commented Aug 30, 2024

[image]
@LiuQhahah we can also exclude these dependencies now.
For the Hive-related ones, we can add them back later when we support the Hive backend catalog.

@jerryshao
Contributor

@caican00 can you please help with this so that it moves faster?

@caican00
Collaborator

caican00 commented Sep 4, 2024

> @caican00 can you please help with this so that it moves faster?

okay.

exclude("org.apache.hive")
exclude("org.apache.hbase")
exclude("it.unimi.dsi")
exclude("org.apache.hadoop")
Collaborator

@LiuQhahah we cannot exclude all of the Hadoop dependencies here.

Collaborator

You can only exclude the Hadoop YARN dependencies.
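
A rough sketch of what excluding only the YARN modules could look like in the Gradle Kotlin DSL. The module names are the `hadoop-yarn-*` jars that show up in the jar listing later in this thread; attaching the excludes to `libs.bundles.paimon` is an assumption about which dependency pulls them in, not something this thread confirms.

```kotlin
dependencies {
  implementation(libs.bundles.paimon) {
    // Keep hadoop-common and hadoop-hdfs; drop only the YARN modules.
    // Assumption: these arrive transitively through the Paimon bundle.
    exclude(group = "org.apache.hadoop", module = "hadoop-yarn-api")
    exclude(group = "org.apache.hadoop", module = "hadoop-yarn-client")
    exclude(group = "org.apache.hadoop", module = "hadoop-yarn-common")
  }
}
```

An exclude declared on one dependency only prunes that dependency's transitive graph, so the same jar can still arrive through another declared dependency.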

@caican00
Collaborator

caican00 commented Sep 5, 2024

@FANNG1 could you help re-trigger this pipeline? Thanks

@yuqi1129
Contributor

yuqi1129 commented Sep 5, 2024

> @FANNG1 could you help re-trigger this pipeline? Thanks

done

@caican00
Collaborator

caican00 commented Sep 8, 2024

> reduce catalog-lakehouse-paimon libs size from 222MB to 156MB

@LiuQhahah could you please update the PR title to reflect the real lib size?

@LiuQhahah
Contributor Author

> reduce catalog-lakehouse-paimon libs size from 222MB to 156MB
>
> @LiuQhahah could you please update the PR title to reflect the real lib size?

Ok

The final size is 74M, measured in ~/workspace/gravitino/distribution/package/catalogs/lakehouse-paimon/libs:

$ du -sh .
 74M    .

LiuQhahah changed the title from "[#4545] improvement(paimon-catalog): reduce catalog-lakehouse-paimon libs size from 222MB to 156MB" to "[#4545] improvement(paimon-catalog): reduce catalog-lakehouse-paimon libs size from 222MB to 74MB" on Sep 8, 2024

implementation(libs.guava)
implementation(libs.hadoop2.common) {
exclude("com.github.spotbugs")
exclude("com.sun.jersey")
exclude("javax.servlet")
}
implementation(libs.hadoop2.hdfs) {
Collaborator

@caican00 caican00 Sep 9, 2024

@LiuQhahah the Gravitino Paimon catalog supports FilesystemCatalog as a backend catalog, so we cannot remove the hdfs dependency.

Collaborator

@caican00 caican00 Sep 9, 2024

Also, could you please run a basic check locally, for example confirming that the build succeeds and the unit tests pass? Thanks!

@caican00
Collaborator

caican00 commented Sep 19, 2024

LGTM @jerryshao @FANNG1 do you have further comments?

@jerryshao
Contributor

@LiuQhahah can you please list all the jars for the Paimon catalog and paste the list here on GitHub? You can run ./gradlew compileDistribution to build a distribution, then list the jars in the Paimon catalog directory.
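
As a complement to inspecting the built distribution, a small Gradle task can print the resolved runtime jars with their sizes. This is only a sketch: the task name `printRuntimeJarSizes` is hypothetical, it assumes the `java` plugin is applied to the catalog module so that a `runtimeClasspath` configuration exists, and the runtime classpath may not match the packaged libs directory exactly. It may also need adjusting if the configuration cache is enabled.

```kotlin
// build.gradle.kts of the catalog module (assumes the `java` plugin is applied)
tasks.register("printRuntimeJarSizes") { // hypothetical task name
  doLast {
    // Resolve the runtime classpath and print each jar, smallest first.
    configurations["runtimeClasspath"].files
      .filter { it.name.endsWith(".jar") }
      .sortedBy { it.length() }
      .forEach { jar -> println("%8d KB  %s".format(jar.length() / 1024, jar.name)) }
  }
}
```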

@jerryshao
Contributor

@FANNG1 would you please help to review this?

@LiuQhahah
Contributor Author

> @LiuQhahah can you please list all the jars for the Paimon catalog and paste the list here on GitHub? You can run ./gradlew compileDistribution to build a distribution, then list the jars in the Paimon catalog directory.

Hi @jerryshao
Sorry for the late reply.

~/workspace/gravitino/distribution/package/catalogs/lakehouse-paimon/libs on branch #4545-Shrink-Paimon-catalog-binary-package-size
$ ls -alth .
total 200128
-rw-r--r--@   1 qiang_liu  staff   643K Sep 18 21:30 commons-lang3-3.14.0.jar
drwxr-xr-x@ 114 qiang_liu  staff   3.6K Sep 18 21:30 .
-rw-r--r--@   1 qiang_liu  staff   2.8M Sep 18 21:30 hadoop-yarn-api-2.10.2.jar
-rw-r--r--@   1 qiang_liu  staff   2.4K Sep 18 21:30 javax.inject-1.jar
-rw-r--r--@   1 qiang_liu  staff    24K Sep 18 21:30 commons-daemon-1.0.13.jar
-rw-r--r--@   1 qiang_liu  staff   762K Sep 18 21:30 jackson-mapper-asl-1.9.13.jar
-rw-r--r--@   1 qiang_liu  staff    19K Sep 18 21:30 jsr305-3.0.2.jar
-rw-r--r--@   1 qiang_liu  staff   325K Sep 18 21:30 log4j-api-2.22.0.jar
-rw-r--r--@   1 qiang_liu  staff   119K Sep 18 21:30 jackson-datatype-jsr310-2.14.2.jar
-rw-r--r--@   1 qiang_liu  staff   1.0M Sep 18 21:30 leveldbjni-all-1.8.jar
-rw-r--r--@   1 qiang_liu  staff   1.5M Sep 18 21:30 jackson-databind-2.14.2.jar
-rw-r--r--@   1 qiang_liu  staff   4.1M Sep 18 21:30 hadoop-hdfs-client-2.10.2.jar
-rw-r--r--@   1 qiang_liu  staff    75K Sep 18 21:30 jackson-annotations-2.14.2.jar
-rw-r--r--@   1 qiang_liu  staff   3.8M Sep 18 21:30 hadoop-common-2.10.2.jar
-rw-r--r--@   1 qiang_liu  staff   762K Sep 18 21:30 httpclient-4.5.13.jar
-rw-r--r--@   1 qiang_liu  staff   278K Sep 18 21:30 commons-lang-2.6.jar
-rw-r--r--@   1 qiang_liu  staff   527K Sep 18 21:30 jets3t-0.9.0.jar
-rw-r--r--@   1 qiang_liu  staff   122K Sep 18 21:30 hadoop-auth-2.10.2.jar
-rw-r--r--@   1 qiang_liu  staff   186K Sep 18 21:30 gson-2.2.4.jar
-rw-r--r--@   1 qiang_liu  staff   267K Sep 18 21:30 commons-net-3.1.jar
-rw-r--r--@   1 qiang_liu  staff   276K Sep 18 21:30 jsch-0.1.55.jar
-rw-r--r--@   1 qiang_liu  staff    29K Sep 18 21:30 accessors-smart-1.2.jar
-rw-r--r--@   1 qiang_liu  staff    78K Sep 18 21:30 api-util-1.0.0-M20.jar
-rw-r--r--@   1 qiang_liu  staff   272K Sep 18 21:30 cglib-2.2.1-v20090111.jar
-rw-r--r--@   1 qiang_liu  staff   241K Sep 18 21:30 commons-beanutils-1.9.4.jar
-rw-r--r--@   1 qiang_liu  staff    42K Sep 18 21:30 asm-3.1.jar
-rw-r--r--@   1 qiang_liu  staff   277K Sep 18 21:30 curator-recipes-2.13.0.jar
-rw-r--r--@   1 qiang_liu  staff   2.9M Sep 18 21:30 guava-32.1.3-jre.jar
-rw-r--r--@   1 qiang_liu  staff   2.9M Sep 18 21:30 paimon-shade-guava-30-30.1.1-jre-0.8.0.jar
-rw-r--r--@   1 qiang_liu  staff   675K Sep 18 21:30 apacheds-kerberos-codec-2.0.0-M15.jar
-rw-r--r--@   1 qiang_liu  staff   1.8M Sep 18 21:30 log4j-core-2.22.0.jar
-rw-r--r--@   1 qiang_liu  staff    16K Sep 18 21:30 api-asn1-api-1.0.0-M20.jar
-rw-r--r--@   1 qiang_liu  staff   1.2M Sep 18 21:30 netty-3.10.6.Final.jar
-rw-r--r--@   1 qiang_liu  staff   215K Sep 18 21:30 xml-apis-1.4.01.jar
-rw-r--r--@   1 qiang_liu  staff   575K Sep 18 21:30 commons-collections-3.2.2.jar
-rw-r--r--@   1 qiang_liu  staff    29K Sep 18 21:30 paranamer-2.3.jar
-rw-r--r--@   1 qiang_liu  staff    66K Sep 18 21:30 paimon-hive-catalog-0.8.0.jar
-rw-r--r--@   1 qiang_liu  staff    53K Sep 18 21:30 jackson-dataformat-yaml-2.14.2.jar
-rw-r--r--@   1 qiang_liu  staff    18K Sep 18 21:30 java-xmlbuilder-0.4.jar
-rw-r--r--@   1 qiang_liu  staff   694K Sep 18 21:30 guice-3.0.jar
-rw-r--r--@   1 qiang_liu  staff    16K Sep 18 21:30 error_prone_annotations-2.21.1.jar
-rw-r--r--@   1 qiang_liu  staff    20K Sep 18 21:30 audience-annotations-0.5.0.jar
-rw-r--r--@   1 qiang_liu  staff    52K Sep 18 21:30 asm-5.0.4.jar
-rw-r--r--@   1 qiang_liu  staff   345K Sep 18 21:30 log4j-1.2-api-2.22.0.jar
-rw-r--r--@   1 qiang_liu  staff   2.1M Sep 18 21:30 commons-math3-3.6.1.jar
-rw-r--r--@   1 qiang_liu  staff   5.0M Sep 18 21:30 hadoop-hdfs-2.10.2.jar
-rw-r--r--@   1 qiang_liu  staff   327K Sep 18 21:30 commons-codec-1.11.jar
-rw-r--r--@   1 qiang_liu  staff    63K Sep 18 21:30 slf4j-api-2.0.9.jar
-rw-r--r--@   1 qiang_liu  staff    98K Sep 18 21:30 jsp-api-2.1.jar
-rw-r--r--@   1 qiang_liu  staff   323K Sep 18 21:30 okhttp-2.7.5.jar
-rw-r--r--@   1 qiang_liu  staff    34K Sep 18 21:30 jackson-datatype-jdk8-2.14.2.jar
-rw-r--r--@   1 qiang_liu  staff   2.1K Sep 18 21:30 listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
-rw-r--r--@   1 qiang_liu  staff   270K Sep 18 21:30 hadoop-yarn-client-2.10.2.jar
-rw-r--r--@   1 qiang_liu  staff   994K Sep 18 21:30 commons-compress-1.21.jar
-rw-r--r--@   1 qiang_liu  staff   227K Sep 18 21:30 jackson-core-asl-1.9.13.jar
-rw-r--r--@   1 qiang_liu  staff   204K Sep 18 21:30 commons-io-2.5.jar
-rw-r--r--@   1 qiang_liu  staff   5.0M Sep 18 21:30 paimon-codegen-loader-0.8.0.jar
-rw-r--r--@   1 qiang_liu  staff   324K Sep 18 21:30 snakeyaml-1.33.jar
-rw-r--r--@   1 qiang_liu  staff    64K Sep 18 21:30 okio-1.6.0.jar
-rw-r--r--@   1 qiang_liu  staff   2.3M Sep 18 21:30 curator-client-2.13.0.jar
-rw-r--r--@   1 qiang_liu  staff   197K Sep 18 21:30 curator-framework-2.13.0.jar
-rw-r--r--@   1 qiang_liu  staff   219K Sep 18 21:30 checker-qual-3.37.0.jar
-rw-r--r--@   1 qiang_liu  staff   117K Sep 18 21:30 json-smart-2.3.jar
-rw-r--r--@   1 qiang_liu  staff   4.0M Sep 18 21:30 netty-all-4.1.50.Final.jar
-rw-r--r--@   1 qiang_liu  staff   1.9M Sep 18 21:30 hadoop-yarn-common-2.10.2.jar
-rw-r--r--@   1 qiang_liu  staff   2.8M Sep 18 21:30 paimon-shade-jackson-2-2.14.2-0.8.0.jar
-rw-r--r--@   1 qiang_liu  staff   331K Sep 18 21:30 nimbus-jose-jwt-7.9.jar
-rw-r--r--@   1 qiang_liu  staff    26K Sep 18 21:30 jackson-xc-1.9.13.jar
-rw-r--r--@   1 qiang_liu  staff    15K Sep 18 21:30 xmlenc-0.52.jar
-rw-r--r--@   1 qiang_liu  staff   191K Sep 18 21:30 stax2-api-4.2.1.jar
-rw-r--r--@   1 qiang_liu  staff   4.5K Sep 18 21:30 failureaccess-1.0.1.jar
-rw-r--r--@   1 qiang_liu  staff   4.6K Sep 18 21:30 jcip-annotations-1.0-1.jar
-rw-r--r--@   1 qiang_liu  staff   521K Sep 18 21:30 protobuf-java-2.5.0.jar
-rw-r--r--@   1 qiang_liu  staff   140K Sep 18 21:30 commons-digester-1.8.jar
-rw-r--r--@   1 qiang_liu  staff   331K Sep 18 21:30 gravitino-common-0.7.0-incubating-SNAPSHOT.jar
-rw-r--r--@   1 qiang_liu  staff    40K Sep 18 21:30 commons-cli-1.2.jar
-rw-r--r--@   1 qiang_liu  staff   478K Sep 18 21:30 log4j-1.2.17.jar
-rw-r--r--@   1 qiang_liu  staff   1.3M Sep 18 21:30 paimon-core-0.8.0.jar
-rw-r--r--@   1 qiang_liu  staff   961K Sep 18 21:30 paimon-shade-caffeine-2-2.9.3-0.8.0.jar
-rw-r--r--@   1 qiang_liu  staff   766K Sep 18 21:30 gravitino-core-0.7.0-incubating-SNAPSHOT.jar
-rw-r--r--@   1 qiang_liu  staff   890K Sep 18 21:30 zookeeper-3.4.14.jar
-rw-r--r--@   1 qiang_liu  staff   426K Sep 18 21:30 avro-1.7.7.jar
-rw-r--r--@   1 qiang_liu  staff   510K Sep 18 21:30 woodstox-core-5.3.0.jar
-rw-r--r--@   1 qiang_liu  staff    46K Sep 18 21:30 hadoop-annotations-2.10.2.jar
-rw-r--r--@   1 qiang_liu  staff   102K Sep 18 21:30 jackson-dataformat-csv-2.14.2.jar
-rw-r--r--@   1 qiang_liu  staff    15K Sep 18 21:30 spotbugs-annotations-3.1.9.jar
-rw-r--r--@   1 qiang_liu  staff    18K Sep 18 21:30 jackson-jaxrs-1.9.13.jar
-rw-r--r--@   1 qiang_liu  staff   1.9M Sep 18 21:30 snappy-java-1.1.8.3.jar
-rw-r--r--@   1 qiang_liu  staff   321K Sep 18 21:30 httpcore-4.4.13.jar
-rw-r--r--@   1 qiang_liu  staff   103K Sep 18 21:30 jaxb-api-2.2.2.jar
-rw-r--r--@   1 qiang_liu  staff    85K Sep 18 21:30 jline-0.9.94.jar
-rw-r--r--@   1 qiang_liu  staff    61K Sep 18 21:30 gravitino-catalog-lakehouse-paimon-0.7.0-incubating-SNAPSHOT.jar
-rw-r--r--@   1 qiang_liu  staff   9.6K Sep 18 21:30 slf4j-reload4j-1.7.36.jar
-rw-r--r--@   1 qiang_liu  staff    44K Sep 18 21:30 apacheds-i18n-2.0.0-M15.jar
-rw-r--r--@   1 qiang_liu  staff   667K Sep 18 21:30 lz4-java-1.8.0.jar
-rw-r--r--@   1 qiang_liu  staff    23K Sep 18 21:30 stax-api-1.0-2.jar
-rw-r--r--@   1 qiang_liu  staff   326K Sep 18 21:30 reload4j-1.2.19.jar
-rw-r--r--@   1 qiang_liu  staff   1.4M Sep 18 21:30 htrace-core4-4.1.0-incubating.jar
-rw-r--r--@   1 qiang_liu  staff   891K Sep 18 21:30 caffeine-2.9.3.jar
-rw-r--r--@   1 qiang_liu  staff    63K Sep 18 21:30 guice-servlet-3.0.jar
-rw-r--r--@   1 qiang_liu  staff   448K Sep 18 21:30 jackson-core-2.14.2.jar
-rw-r--r--@   1 qiang_liu  staff    20M Sep 18 21:30 paimon-format-0.8.0.jar
-rw-r--r--@   1 qiang_liu  staff   292K Sep 18 21:30 commons-configuration-1.6.jar
-rw-r--r--@   1 qiang_liu  staff   4.4K Sep 18 21:30 aopalliance-1.0.jar
-rw-r--r--@   1 qiang_liu  staff    62K Sep 18 21:30 activation-1.1.jar
-rw-r--r--@   1 qiang_liu  staff    12K Sep 18 21:30 slf4j-log4j12-1.7.25.jar
-rw-r--r--@   1 qiang_liu  staff   179K Sep 18 21:30 aircompressor-0.21.jar
-rw-r--r--@   1 qiang_liu  staff    27K Sep 18 21:30 log4j-slf4j2-impl-2.22.0.jar
-rw-r--r--@   1 qiang_liu  staff    60K Sep 18 21:30 commons-logging-1.2.jar
-rw-r--r--@   1 qiang_liu  staff   3.8M Sep 18 21:30 paimon-common-0.8.0.jar
-rw-r--r--@   1 qiang_liu  staff   222K Sep 18 21:30 gravitino-api-0.7.0-incubating-SNAPSHOT.jar
-rw-r--r--@   1 qiang_liu  staff   1.5M Sep 18 21:30 hadoop-mapreduce-client-core-2.10.2.jar
-rw-r--r--@   1 qiang_liu  staff   1.3M Sep 18 21:30 xercesImpl-2.12.0.jar
drwxr-xr-x@   4 qiang_liu  staff   128B Sep 18 21:30 ..

@FANNG1
Contributor

FANNG1 commented Sep 23, 2024

Hi @LiuQhahah, I did some extra work to exclude the YARN, ZooKeeper, and Curator jars. Please help review. cc @jerryshao @caican00

du -sh distribution/package/catalogs/lakehouse-paimon/libs/* | sort -h
4.0K	distribution/package/catalogs/lakehouse-paimon/libs/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
8.0K	distribution/package/catalogs/lakehouse-paimon/libs/failureaccess-1.0.1.jar
8.0K	distribution/package/catalogs/lakehouse-paimon/libs/jcip-annotations-1.0-1.jar
 12K	distribution/package/catalogs/lakehouse-paimon/libs/slf4j-reload4j-1.7.36.jar
 16K	distribution/package/catalogs/lakehouse-paimon/libs/xmlenc-0.52.jar
 20K	distribution/package/catalogs/lakehouse-paimon/libs/api-asn1-api-1.0.0-M20.jar
 20K	distribution/package/catalogs/lakehouse-paimon/libs/error_prone_annotations-2.21.1.jar
 20K	distribution/package/catalogs/lakehouse-paimon/libs/java-xmlbuilder-0.4.jar
 20K	distribution/package/catalogs/lakehouse-paimon/libs/jsr305-3.0.2.jar
 28K	distribution/package/catalogs/lakehouse-paimon/libs/log4j-slf4j2-impl-2.22.0.jar
 32K	distribution/package/catalogs/lakehouse-paimon/libs/accessors-smart-1.2.jar
 32K	distribution/package/catalogs/lakehouse-paimon/libs/paranamer-2.3.jar
 36K	distribution/package/catalogs/lakehouse-paimon/libs/jackson-datatype-jdk8-2.14.2.jar
 44K	distribution/package/catalogs/lakehouse-paimon/libs/apacheds-i18n-2.0.0-M15.jar
 44K	distribution/package/catalogs/lakehouse-paimon/libs/commons-cli-1.2.jar
 48K	distribution/package/catalogs/lakehouse-paimon/libs/hadoop-annotations-2.10.2.jar
 56K	distribution/package/catalogs/lakehouse-paimon/libs/asm-5.0.4.jar
 56K	distribution/package/catalogs/lakehouse-paimon/libs/jackson-dataformat-yaml-2.14.2.jar
 64K	distribution/package/catalogs/lakehouse-paimon/libs/commons-logging-1.2.jar
 64K	distribution/package/catalogs/lakehouse-paimon/libs/gravitino-catalog-lakehouse-paimon-0.7.0-incubating-SNAPSHOT.jar
 64K	distribution/package/catalogs/lakehouse-paimon/libs/slf4j-api-2.0.9.jar
 68K	distribution/package/catalogs/lakehouse-paimon/libs/paimon-hive-catalog-0.8.0.jar
 76K	distribution/package/catalogs/lakehouse-paimon/libs/jackson-annotations-2.14.2.jar
 80K	distribution/package/catalogs/lakehouse-paimon/libs/api-util-1.0.0-M20.jar
100K	distribution/package/catalogs/lakehouse-paimon/libs/jsp-api-2.1.jar
104K	distribution/package/catalogs/lakehouse-paimon/libs/jackson-dataformat-csv-2.14.2.jar
120K	distribution/package/catalogs/lakehouse-paimon/libs/jackson-datatype-jsr310-2.14.2.jar
120K	distribution/package/catalogs/lakehouse-paimon/libs/json-smart-2.3.jar
124K	distribution/package/catalogs/lakehouse-paimon/libs/hadoop-auth-2.10.2.jar
144K	distribution/package/catalogs/lakehouse-paimon/libs/commons-digester-1.8.jar
180K	distribution/package/catalogs/lakehouse-paimon/libs/aircompressor-0.21.jar
188K	distribution/package/catalogs/lakehouse-paimon/libs/gson-2.2.4.jar
192K	distribution/package/catalogs/lakehouse-paimon/libs/stax2-api-4.2.1.jar
204K	distribution/package/catalogs/lakehouse-paimon/libs/commons-io-2.5.jar
220K	distribution/package/catalogs/lakehouse-paimon/libs/checker-qual-3.37.0.jar
224K	distribution/package/catalogs/lakehouse-paimon/libs/gravitino-api-0.7.0-incubating-SNAPSHOT.jar
228K	distribution/package/catalogs/lakehouse-paimon/libs/jackson-core-asl-1.9.13.jar
244K	distribution/package/catalogs/lakehouse-paimon/libs/commons-beanutils-1.9.4.jar
268K	distribution/package/catalogs/lakehouse-paimon/libs/commons-net-3.1.jar
276K	distribution/package/catalogs/lakehouse-paimon/libs/jsch-0.1.55.jar
280K	distribution/package/catalogs/lakehouse-paimon/libs/commons-lang-2.6.jar
292K	distribution/package/catalogs/lakehouse-paimon/libs/commons-configuration-1.6.jar
324K	distribution/package/catalogs/lakehouse-paimon/libs/httpcore-4.4.13.jar
324K	distribution/package/catalogs/lakehouse-paimon/libs/snakeyaml-1.33.jar
328K	distribution/package/catalogs/lakehouse-paimon/libs/commons-codec-1.11.jar
328K	distribution/package/catalogs/lakehouse-paimon/libs/log4j-api-2.22.0.jar
328K	distribution/package/catalogs/lakehouse-paimon/libs/reload4j-1.2.19.jar
332K	distribution/package/catalogs/lakehouse-paimon/libs/gravitino-common-0.7.0-incubating-SNAPSHOT.jar
332K	distribution/package/catalogs/lakehouse-paimon/libs/nimbus-jose-jwt-7.9.jar
348K	distribution/package/catalogs/lakehouse-paimon/libs/log4j-1.2-api-2.22.0.jar
428K	distribution/package/catalogs/lakehouse-paimon/libs/avro-1.7.7.jar
452K	distribution/package/catalogs/lakehouse-paimon/libs/jackson-core-2.14.2.jar
512K	distribution/package/catalogs/lakehouse-paimon/libs/woodstox-core-5.3.0.jar
524K	distribution/package/catalogs/lakehouse-paimon/libs/protobuf-java-2.5.0.jar
528K	distribution/package/catalogs/lakehouse-paimon/libs/jets3t-0.9.0.jar
576K	distribution/package/catalogs/lakehouse-paimon/libs/commons-collections-3.2.2.jar
644K	distribution/package/catalogs/lakehouse-paimon/libs/commons-lang3-3.14.0.jar
668K	distribution/package/catalogs/lakehouse-paimon/libs/lz4-java-1.8.0.jar
676K	distribution/package/catalogs/lakehouse-paimon/libs/apacheds-kerberos-codec-2.0.0-M15.jar
764K	distribution/package/catalogs/lakehouse-paimon/libs/httpclient-4.5.13.jar
764K	distribution/package/catalogs/lakehouse-paimon/libs/jackson-mapper-asl-1.9.13.jar
768K	distribution/package/catalogs/lakehouse-paimon/libs/gravitino-core-0.7.0-incubating-SNAPSHOT.jar
892K	distribution/package/catalogs/lakehouse-paimon/libs/caffeine-2.9.3.jar
964K	distribution/package/catalogs/lakehouse-paimon/libs/paimon-shade-caffeine-2-2.9.3-0.8.0.jar
996K	distribution/package/catalogs/lakehouse-paimon/libs/commons-compress-1.21.jar
1.3M	distribution/package/catalogs/lakehouse-paimon/libs/paimon-core-0.8.0.jar
1.4M	distribution/package/catalogs/lakehouse-paimon/libs/htrace-core4-4.1.0-incubating.jar
1.5M	distribution/package/catalogs/lakehouse-paimon/libs/jackson-databind-2.14.2.jar
1.6M	distribution/package/catalogs/lakehouse-paimon/libs/hadoop-mapreduce-client-core-2.10.2.jar
1.8M	distribution/package/catalogs/lakehouse-paimon/libs/log4j-core-2.22.0.jar
1.9M	distribution/package/catalogs/lakehouse-paimon/libs/snappy-java-1.1.8.3.jar
2.1M	distribution/package/catalogs/lakehouse-paimon/libs/commons-math3-3.6.1.jar
2.8M	distribution/package/catalogs/lakehouse-paimon/libs/paimon-shade-jackson-2-2.14.2-0.8.0.jar
2.9M	distribution/package/catalogs/lakehouse-paimon/libs/guava-32.1.3-jre.jar
2.9M	distribution/package/catalogs/lakehouse-paimon/libs/paimon-shade-guava-30-30.1.1-jre-0.8.0.jar
3.8M	distribution/package/catalogs/lakehouse-paimon/libs/hadoop-common-2.10.2.jar
3.8M	distribution/package/catalogs/lakehouse-paimon/libs/paimon-common-0.8.0.jar
5.0M	distribution/package/catalogs/lakehouse-paimon/libs/hadoop-hdfs-2.10.2.jar
5.0M	distribution/package/catalogs/lakehouse-paimon/libs/paimon-codegen-loader-0.8.0.jar
 20M	distribution/package/catalogs/lakehouse-paimon/libs/paimon-format-0.8.0.jar

FANNG1 changed the title from "[#4545] improvement(paimon-catalog): reduce catalog-lakehouse-paimon libs size from 222MB to 98MB" to "[#4545] improvement(paimon-catalog): reduce catalog-lakehouse-paimon libs size from 222MB to 75MB" on Sep 23, 2024

@yuqi1129
Contributor

2.9M distribution/package/catalogs/lakehouse-paimon/libs/paimon-shade-guava-30-30.1.1-jre-0.8.0.jar

Can we remove these?
2.9M distribution/package/catalogs/lakehouse-paimon/libs/paimon-shade-guava-30-30.1.1-jre-0.8.0.jar
28K distribution/package/catalogs/lakehouse-paimon/libs/log4j-slf4j2-impl-2.22.0.jar
1.8M distribution/package/catalogs/lakehouse-paimon/libs/log4j-core-2.22.0.jar

@jerryshao
Contributor

Why is the paimon-format jar so big? Is that correct?

@jerryshao
Contributor

Also, why do we need the hdfs jar? I assume the hdfs-client jar should be enough.

@FANNG1
Contributor

FANNG1 commented Sep 23, 2024

> 2.9M distribution/package/catalogs/lakehouse-paimon/libs/paimon-shade-guava-30-30.1.1-jre-0.8.0.jar

It's used by Paimon internally, so we'd better keep it.

> Can we remove these?
> 2.9M distribution/package/catalogs/lakehouse-paimon/libs/paimon-shade-guava-30-30.1.1-jre-0.8.0.jar
> 28K distribution/package/catalogs/lakehouse-paimon/libs/log4j-slf4j2-impl-2.22.0.jar
> 1.8M distribution/package/catalogs/lakehouse-paimon/libs/log4j-core-2.22.0.jar

Yes, this could be removed.

@FANNG1
Contributor

FANNG1 commented Sep 23, 2024

> Why is the paimon-format jar so big? Is that correct?

The paimon-format jar packages its dependencies inside the jar. @caican00 do you know of any alternative jars?

@FANNG1
Contributor

FANNG1 commented Sep 23, 2024

> Also, why do we need the hdfs jar? I assume the hdfs-client jar should be enough.

Kerberos relies on the default HDFS configuration, so removing the hdfs jar may cause Kerberos operations to fail. cc @yuqi1129

@jerryshao
Contributor

> Why is the paimon-format jar so big? Is that correct?
>
> The paimon-format jar packages its dependencies inside the jar. @caican00 do you know of any alternative jars?

@caican00 any thoughts on this?

@yuqi1129
Contributor

> Also, why do we need the hdfs jar? I assume the hdfs-client jar should be enough.
>
> Kerberos relies on the default HDFS configuration, so removing the hdfs jar may cause Kerberos operations to fail. cc @yuqi1129

Yeah, it will make the Paimon Kerberos-related tests fail. Let's leave it as is for now.

@yuqi1129
Contributor

> Also, why do we need the hdfs jar? I assume the hdfs-client jar should be enough.
>
> Kerberos relies on the default HDFS configuration, so removing the hdfs jar may cause Kerberos operations to fail. cc @yuqi1129
>
> Yeah, it will make the Paimon Kerberos-related tests fail. Let's leave it as is for now.

I have tested it, and we can make hdfs a testImplementation dependency.
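
The change described here could look roughly like the following in the catalog's `build.gradle.kts`. The `libs.hadoop2.hdfs` alias is the one from the diff excerpt earlier in this thread; whether the original block carried its own excludes that should move along with it is not shown there, so this sketch leaves them out.

```kotlin
dependencies {
  // Previously declared with implementation(...). As a test-only dependency
  // the hdfs jar is still available to the Kerberos-related tests, but it is
  // no longer part of the main runtime classpath.
  testImplementation(libs.hadoop2.hdfs)
}
```

This assumes the distribution packaging copies only the main runtime classpath into the libs directory, which is what would make the jar disappear from the package.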

@caican00
Collaborator

> Why is the paimon-format jar so big? Is that correct?
>
> The paimon-format jar packages its dependencies inside the jar. @caican00 do you know of any alternative jars?
>
> @caican00 any thoughts on this?

Let me check it.

@caican00
Collaborator

caican00 commented Sep 27, 2024

> Why is the paimon-format jar so big? Is that correct?
>
> The paimon-format jar packages its dependencies inside the jar. @caican00 do you know of any alternative jars?
>
> @caican00 any thoughts on this?
>
> Let me check it.

The Paimon catalog explicitly calls code in the paimon-format module, so it cannot be replaced with other jars.

I excluded more dependencies and ran the integration tests.

Integration tests:
[image]

[image]

I also deployed it on a physical machine for testing.
Physical environment:

[image]

I have no permission to push code to the repo, so please help update the PR.

  implementation(libs.bundles.paimon) {
    exclude("com.sun.jersey")
    exclude("javax.servlet")
    exclude("org.apache.curator")
    exclude("org.apache.hive")
    exclude("org.apache.hbase")
    exclude("org.apache.zookeeper")
    exclude("org.eclipse.jetty.aggregate:jetty-all")
    exclude("org.mortbay.jetty")
    exclude("org.mortbay.jetty:jetty")
    exclude("org.mortbay.jetty:jetty-util")
    exclude("org.mortbay.jetty:jetty-sslengine")
    exclude("it.unimi.dsi")
    exclude("com.ververica")
    exclude("org.apache.hadoop")
    exclude("org.apache.commons")
    exclude("org.xerial.snappy")
    exclude("com.github.luben")
    exclude("com.google.protobuf")
    exclude("joda-time")
    exclude("org.apache.parquet:parquet-jackson")
    exclude("org.apache.parquet:parquet-format-structures")
    exclude("org.apache.parquet:parquet-encoding")
    exclude("org.apache.parquet:parquet-common")
    exclude("org.apache.parquet:parquet-hadoop")
    exclude("org.apache.paimon:paimon-codegen-loader")
    exclude("org.apache.paimon:paimon-shade-caffeine-2")
    exclude("org.apache.paimon:paimon-shade-guava-30")
  }

cc @jerryshao @FANNG1
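
One detail in the exclude list above may be worth double-checking: several entries pass a `group:module` string as a single argument. To my understanding, the Kotlin DSL `exclude` helper takes `group` and `module` as separate parameters, so a single positional string would be matched as a group name only. That reading is an assumption and should be verified against Gradle's `dependencies` report for the module. A sketch of the explicit form, using coordinates from the list above:

```kotlin
dependencies {
  implementation(libs.bundles.paimon) {
    // Whole-group exclude: every module published under this group.
    exclude(group = "org.apache.hive")
    // Single-module excludes: group and module given separately.
    exclude(group = "org.apache.parquet", module = "parquet-jackson")
    exclude(group = "org.mortbay.jetty", module = "jetty-util")
  }
}
```

The named-parameter form also makes it clearer at a glance whether a line removes one artifact or a whole group.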

@jerryshao jerryshao merged commit 3c20d95 into apache:main Sep 27, 2024
26 checks passed
@LiuQhahah LiuQhahah deleted the #4545-Shrink-Paimon-catalog-binary-package-size branch September 27, 2024 12:02
Development

Successfully merging this pull request may close these issues.

[Improvement] Shrink Paimon catalog binary package size.
5 participants