procedure add_files parallelism > 1 -> NotSerializableException #11147

zzeekk · 2024-09-16T16:04:53Z

Apache Iceberg version

1.6.1 (latest release)

Query engine

Spark

Please describe the bug 🐞

Problem:
Executing "system.add_files(... parallelism => 2)" fails with a NotSerializableException for an ExecutorService instance:
Task not serializable: java.io.NotSerializableException: java.util.concurrent.Executors$DelegatedExecutorService
in MapPartitionsRDD[16] at collectAsList at SparkTableUtil.java:792, org.apache.spark.ShuffleDependency@12d6880f

Expectations:
add_files runs without an exception, including when parallelism > 1.

Suggestions:
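Don't pass the ExecutorService instance created on the Spark driver as an argument to listPartition in SparkTableUtil.java:759. Spark has to serialize everything the task closure captures, and ExecutorService is not Serializable. Instead, create the ExecutorService inside listPartition on the Spark executor.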
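To illustrate the suggestion, here is a minimal sketch in plain Java, with no Spark or Iceberg dependency. It is not the Iceberg code and not necessarily how the linked fix is implemented. All names in it (PartitionTask, DriverPoolTask, ExecutorPoolTask, listAll, isJavaSerializable) are hypothetical, and Java serialization is used here as a stand-in for Spark's closure serialization. The first task holds a pool built on the "driver" and cannot be serialized. The second carries only the parallelism value and builds its pool where it runs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorCaptureSketch {

  // Hypothetical stand-in for a function that would be shipped to executors.
  interface PartitionTask extends Serializable {
    List<String> run(List<String> partitions) throws Exception;
  }

  // Problematic shape: the pool is created on the driver and held by the task.
  static final class DriverPoolTask implements PartitionTask {
    private final ExecutorService pool;

    DriverPoolTask(ExecutorService pool) {
      this.pool = pool;
    }

    @Override
    public List<String> run(List<String> partitions) throws Exception {
      return listAll(partitions, pool);
    }
  }

  // Suggested shape: only the int travels; the pool is created where the task runs.
  static final class ExecutorPoolTask implements PartitionTask {
    private final int parallelism;

    ExecutorPoolTask(int parallelism) {
      this.parallelism = parallelism;
    }

    @Override
    public List<String> run(List<String> partitions) throws Exception {
      ExecutorService pool = Executors.newFixedThreadPool(parallelism);
      try {
        return listAll(partitions, pool);
      } finally {
        pool.shutdown();
      }
    }
  }

  // Placeholder for the real per-partition file listing work.
  static List<String> listAll(List<String> partitions, ExecutorService pool) throws Exception {
    List<Future<String>> futures = new ArrayList<>();
    for (String partition : partitions) {
      futures.add(pool.submit(() -> "listed:" + partition));
    }
    List<String> results = new ArrayList<>();
    for (Future<String> future : futures) {
      results.add(future.get());
    }
    return results;
  }

  // Returns false if Java serialization rejects the object graph.
  static boolean isJavaSerializable(Object candidate) throws IOException {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(candidate);
      return true;
    } catch (NotSerializableException e) {
      return false;
    }
  }

  public static void main(String[] args) throws Exception {
    ExecutorService driverPool = Executors.newFixedThreadPool(2);
    try {
      System.out.println("driver pool task serializable: "
          + isJavaSerializable(new DriverPoolTask(driverPool)));
      System.out.println("executor pool task serializable: "
          + isJavaSerializable(new ExecutorPoolTask(2)));
      System.out.println(new ExecutorPoolTask(2).run(Arrays.asList("a", "b")));
    } finally {
      driverPool.shutdown();
    }
  }
}
```

The trade-off in this sketch is that every task invocation creates and shuts down its own pool, so the lifetime of the pool is tied to the task rather than to the driver.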

Willingness to contribute

I can contribute a fix for this bug independently
I would be willing to contribute a fix for this bug with guidance from the Iceberg community
I cannot contribute a fix for this bug at this time

manuzhang · 2024-09-18T02:22:13Z

@zzeekk Thanks for reporting this bug. I will look into it.

zzeekk · 2024-09-21T07:21:23Z

Thanks a lot @manuzhang, looks good.

zzeekk added the bug label Sep 16, 2024

manuzhang linked a pull request Sep 18, 2024 that will close this issue

Spark 3.5: Fix NotSerializableException when migrating Spark tables #11157

Open

zzeekk mentioned this issue Sep 21, 2024

Release 2.7.1 smart-data-lake/smart-data-lake#895

Merged
