[Lake/Paimon] Create datalake enabled table should also create in lake #640

luoyuxia · 2025-03-20T07:16:54Z

Purpose

Linked issue: close #430

Brief change log

Introduce LakeStoragePluginSetUp, which loads the LakeStoragePlugin according to the datalake format
Introduce LakeCatalog to create tables in the lake
When a table is created with datalake enabled, also create it in the lake via LakeCatalog
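
The flow in the change log above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the interface shapes, the in-memory catalog, the `plugin` helper, and the `table.datalake.enabled` property key are all assumptions made for the example. Only the names LakeStoragePlugin and LakeCatalog come from the change log.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LakeTableCreateSketch {

    // Assumed property key, used here only for illustration.
    static final String LAKE_ENABLED_KEY = "table.datalake.enabled";

    /** Hypothetical, simplified stand-in for LakeCatalog. */
    interface LakeCatalog {
        void createTable(String tablePath, Map<String, String> properties);
    }

    /** Hypothetical stand-in for LakeStoragePlugin: one plugin per datalake format. */
    interface LakeStoragePlugin {
        String identifier();

        LakeCatalog createLakeCatalog();
    }

    /** In-memory catalog for the sketch; it only records which tables were created. */
    static final class InMemoryLakeCatalog implements LakeCatalog {
        final Map<String, Map<String, String>> tables = new LinkedHashMap<>();

        @Override
        public void createTable(String tablePath, Map<String, String> properties) {
            if (tables.containsKey(tablePath)) {
                throw new IllegalStateException("Table already exists in lake: " + tablePath);
            }
            tables.put(tablePath, new HashMap<>(properties));
        }
    }

    /** Hypothetical helper that builds a plugin bound to a fixed catalog. */
    static LakeStoragePlugin plugin(final String format, final LakeCatalog catalog) {
        return new LakeStoragePlugin() {
            @Override
            public String identifier() {
                return format;
            }

            @Override
            public LakeCatalog createLakeCatalog() {
                return catalog;
            }
        };
    }

    /** Picks the plugin whose identifier matches the datalake format. */
    static LakeStoragePlugin loadPlugin(List<LakeStoragePlugin> plugins, String format) {
        for (LakeStoragePlugin candidate : plugins) {
            if (candidate.identifier().equalsIgnoreCase(format)) {
                return candidate;
            }
        }
        throw new IllegalArgumentException("No lake storage plugin for format: " + format);
    }

    /** Creates the table in the lake only when datalake is enabled; returns whether it did. */
    static boolean createTable(
            List<LakeStoragePlugin> plugins,
            String format,
            String tablePath,
            Map<String, String> properties) {
        boolean lakeEnabled =
                Boolean.parseBoolean(properties.getOrDefault(LAKE_ENABLED_KEY, "false"));
        if (!lakeEnabled) {
            return false;
        }
        loadPlugin(plugins, format).createLakeCatalog().createTable(tablePath, properties);
        return true;
    }

    public static void main(String[] args) {
        InMemoryLakeCatalog catalog = new InMemoryLakeCatalog();
        List<LakeStoragePlugin> plugins =
                java.util.Collections.singletonList(plugin("paimon", catalog));
        Map<String, String> properties = new HashMap<>();
        properties.put(LAKE_ENABLED_KEY, "true");
        createTable(plugins, "paimon", "db.orders", properties);
        System.out.println(catalog.tables.keySet());
    }
}
```

Looking the plugin up by format identifier keeps the table-creation path independent of any one lake format, which is presumably why the PR separates plugin setup from the catalog itself.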

Tests

LakeEnabledTableCreateITCase

API and Format

Documentation

luoyuxia · 2025-03-21T07:53:32Z

fluss-dist/src/main/assemblies/plugins.xml

+            <useTransitiveDependencies>true</useTransitiveDependencies>
+            <useTransitiveFiltering>true</useTransitiveFiltering>
+            <includes>
+                <include>org.apache.flink:flink-shaded-hadoop-2-uber</include>


Paimon requires Hadoop to be bundled, so we include it in the Paimon plugin dir
https://paimon.apache.org/docs/master/flink/quick-start/

wuchong · 2025-03-22T09:41:37Z

Is it ready to review? @luoyuxia

luoyuxia · 2025-03-24T02:52:27Z

.../fluss-lake-format-paimon/src/main/java/com/alibaba/fluss/lake/paimon/PaimonLakeCatalog.java

+        }
+
+        // set pk
+        if (tableDescriptor.getSchema().getPrimaryKey().isPresent()) {


At first, I introduced additional offset and timestamp columns so that Fluss could subscribe to the data in the lake from a given offset or timestamp via the Fluss client. But now I feel we can remove these two columns, at least for now.

For now, we mainly focus on Flink reading the historical data in Paimon and the real-time data in Fluss. The offset and timestamp columns are not used, and introducing them may bring unnecessary complexity at this early stage.

Subscribing via the offset and timestamp columns only works for log tables, and only for Paimon tables with bucket-num specified. But in Paimon, it's recommended not to set bucket-num, so the offset and timestamp columns become useless in most cases.

Still, we keep the possibility of supporting subscription to the data in the lake from a given offset and timestamp in the future. We can then introduce an option to enable this feature for lake tables.

After discussion, we still keep the ability to subscribe via offset/timestamp, so let's introduce another column, bucket, to help us subscribe via bucket + offset.
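
To make the bucket + offset idea concrete, here is a rough sketch of appending the three extra columns to a lake table's column list and then reading one bucket from a starting offset. The column names `__bucket`, `__offset`, and `__timestamp`, the `LakeRow` type, and both helper methods are hypothetical and chosen only for this example; the comment above does not name the columns.

```java
import java.util.ArrayList;
import java.util.List;

final class LakeSystemColumnsSketch {

    // Hypothetical column names, not taken from the PR.
    static final String BUCKET_COLUMN = "__bucket";
    static final String OFFSET_COLUMN = "__offset";
    static final String TIMESTAMP_COLUMN = "__timestamp";

    /** Returns the user columns followed by the bucket, offset and timestamp columns. */
    static List<String> withSystemColumns(List<String> userColumns) {
        List<String> result = new ArrayList<>(userColumns);
        for (String systemColumn :
                new String[] {BUCKET_COLUMN, OFFSET_COLUMN, TIMESTAMP_COLUMN}) {
            if (userColumns.contains(systemColumn)) {
                throw new IllegalArgumentException(
                        "Column name is reserved: " + systemColumn);
            }
            result.add(systemColumn);
        }
        return result;
    }

    /** A lake row reduced to the fields this sketch needs. */
    static final class LakeRow {
        final int bucket;
        final long offset;
        final String payload;

        LakeRow(int bucket, long offset, String payload) {
            this.bucket = bucket;
            this.offset = offset;
            this.payload = payload;
        }
    }

    /** Keeps only rows of the given bucket whose offset is at or after startOffset. */
    static List<LakeRow> readFrom(List<LakeRow> rows, int bucket, long startOffset) {
        List<LakeRow> result = new ArrayList<>();
        for (LakeRow row : rows) {
            if (row.bucket == bucket && row.offset >= startOffset) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> userColumns = new ArrayList<>();
        userColumns.add("id");
        userColumns.add("name");
        System.out.println(withSystemColumns(userColumns));
    }
}
```

Because the bucket is stored as a column, the filter does not rely on Paimon's own bucket-num setting, which matches the concern raised above about bucket-num usually being left unset.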

luoyuxia · 2025-03-24T03:52:55Z

Is it ready to review? @luoyuxia

Yes, now, it's ready to review.

luoyuxia · 2025-03-24T03:53:44Z

@wuchong @leonardBang Could you please help review?

[Lake/Paimon] Create datalake enabled table should also create in lake

21bcf36

luoyuxia force-pushed the create-lake-table branch 4 times, most recently from ad3e4a7 to b768922 Compare March 21, 2025 03:12

luoyuxia force-pushed the create-lake-table branch from b768922 to 74248a9 Compare March 21, 2025 08:09

luoyuxia force-pushed the create-lake-table branch from 74248a9 to 569d5e9 Compare March 24, 2025 03:41

luoyuxia marked this pull request as ready for review March 24, 2025 03:51

luoyuxia force-pushed the create-lake-table branch from 569d5e9 to 8e06422 Compare March 24, 2025 03:52

luoyuxia requested review from wuchong and leonardBang March 24, 2025 03:52

[Lake/Paimon] Create datalake enabled table should also create in lake

85e28a2

luoyuxia force-pushed the create-lake-table branch from 8e06422 to 85e28a2 Compare March 25, 2025 04:17
