
[Lake/Paimon] Create datalake enabled table should also create in lake #640

Open
wants to merge 2 commits into main

luoyuxia
Collaborator

@luoyuxia commented Mar 20, 2025

Purpose

Linked issue: close #430

Brief change log

  1. Introduce LakeStoragePluginSetUp, which loads the LakeStoragePlugin for the configured datalake format
  2. Introduce LakeCatalog to create tables in the lake
  3. When creating a table with the lake enabled, also create the table in the lake via LakeCatalog
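The three change-log items fit together roughly as follows. This is a minimal Java sketch under stated assumptions: the interface shapes, the method names `identifier`, `createLakeCatalog` and `fromDataLakeFormat`, and the `TableCreator` class are hypothetical simplifications, not the actual Fluss API. Plugins are also registered explicitly here instead of being discovered from plugin directories.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified interface for illustration; the real Fluss API may differ.
interface LakeCatalog {
    // Creates the lake-side counterpart of a Fluss table.
    void createTable(String databaseName, String tableName, List<String> columns);
}

// Hypothetical plugin interface: one implementation per data lake format.
interface LakeStoragePlugin {
    // Data lake format served by this plugin, e.g. "paimon".
    String identifier();

    LakeCatalog createLakeCatalog(Map<String, String> lakeConf);
}

// Resolves the plugin that matches the configured datalake format.
final class LakeStoragePluginSetUp {
    private final Map<String, LakeStoragePlugin> pluginsByFormat = new HashMap<>();

    void register(LakeStoragePlugin plugin) {
        pluginsByFormat.put(plugin.identifier(), plugin);
    }

    LakeStoragePlugin fromDataLakeFormat(String format) {
        LakeStoragePlugin plugin = pluginsByFormat.get(format);
        if (plugin == null) {
            throw new IllegalArgumentException("No lake storage plugin for format: " + format);
        }
        return plugin;
    }
}

// Hypothetical coordinator-side flow: mirror lake-enabled tables in the lake.
final class TableCreator {
    private final LakeCatalog lakeCatalog; // null when no datalake format is configured
    final List<String> flussTables = new ArrayList<>();

    TableCreator(LakeCatalog lakeCatalog) {
        this.lakeCatalog = lakeCatalog;
    }

    void createTable(String db, String table, List<String> columns, boolean lakeEnabled) {
        if (lakeEnabled) {
            if (lakeCatalog == null) {
                throw new IllegalStateException("Datalake is enabled for the table but no datalake format is configured");
            }
            // Creating the lake table first is an assumption of this sketch.
            lakeCatalog.createTable(db, table, columns);
        }
        flussTables.add(db + "." + table);
    }
}
```

Usage: register a plugin whose `identifier()` returns `"paimon"`, resolve it with `fromDataLakeFormat("paimon")`, and pass the resulting catalog to `TableCreator`. Only tables created with `lakeEnabled = true` reach the lake catalog. The ordering between the lake table and the Fluss table is a choice of this sketch, not necessarily the order used in the PR.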

Tests

LakeEnabledTableCreateITCase

API and Format

Documentation

@luoyuxia force-pushed the create-lake-table branch 4 times, most recently from ad3e4a7 to b768922 on March 21, 2025 03:12
<useTransitiveDependencies>true</useTransitiveDependencies>
<useTransitiveFiltering>true</useTransitiveFiltering>
<includes>
<include>org.apache.flink:flink-shaded-hadoop-2-uber</include>
Collaborator Author


Paimon requires Hadoop to be bundled, so we include it in the Paimon plugin directory.
https://paimon.apache.org/docs/master/flink/quick-start/
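If each plugin directory gets its own class loader (an assumption here), bundling `flink-shaded-hadoop-2-uber` next to the Paimon jars makes the Hadoop classes visible to the Paimon plugin without putting them on the server classpath. The sketch below is not Fluss's plugin loader: the `PluginDirLoader` class is hypothetical, and it uses a plain parent-first `java.net.URLClassLoader`, whereas a real plugin system usually needs child-first loading to isolate conflicting dependencies.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not Fluss's plugin loader): exposes every jar in one plugin
// directory, e.g. a "paimon" directory holding the Paimon jars plus
// flink-shaded-hadoop-2-uber, through a single class loader.
final class PluginDirLoader {
    private PluginDirLoader() {}

    // Collects the URLs of all *.jar files directly inside pluginDir, in name order.
    static URL[] jarUrls(File pluginDir) throws MalformedURLException {
        File[] jars = pluginDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars == null) {
            throw new IllegalArgumentException("Not a readable directory: " + pluginDir);
        }
        Arrays.sort(jars);
        List<URL> urls = new ArrayList<>();
        for (File jar : jars) {
            urls.add(jar.toURI().toURL());
        }
        return urls.toArray(new URL[0]);
    }

    // Plain parent-first loader; a real plugin system usually needs child-first loading.
    static URLClassLoader loaderFor(File pluginDir, ClassLoader parent) throws MalformedURLException {
        return new URLClassLoader(jarUrls(pluginDir), parent);
    }
}
```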

@luoyuxia force-pushed the create-lake-table branch from b768922 to 74248a9 on March 21, 2025 08:09
@wuchong
Member

wuchong commented Mar 22, 2025

Is it ready to review? @luoyuxia

}

// set pk
if (tableDescriptor.getSchema().getPrimaryKey().isPresent()) {
Collaborator Author


At first, I introduced additional offset and timestamp columns so that the Fluss client could subscribe to lake data starting from a given offset or timestamp. But now I think we can remove these two columns, at least for now:

  1. For now, we mainly focus on Flink reading historical data from Paimon and real-time data from Fluss. The offset and timestamp columns are not used there, and introducing them may add unnecessary complexity at this early stage.

  2. Subscribing via the offset and timestamp columns only works for log tables, and only for Paimon tables with bucket-num specified. But Paimon recommends not setting bucket-num, so the offset and timestamp columns would be useless in most cases.

We still keep the possibility of supporting subscription to lake data from a given offset or timestamp in the future. We can then introduce an option to enable this feature for lake tables.

Collaborator Author


After discussion, we will keep the ability to subscribe via offset/timestamp. So let's introduce an additional bucket column, which lets us subscribe via bucket + offset.
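Why a bucket column is needed: Fluss log offsets are assigned per bucket, so an offset alone does not identify a position in the table, and a start position has to be a (bucket, offset) pair. The following Java sketch illustrates this with hypothetical system-column names `__bucket`, `__offset` and `__timestamp` (the names used in the PR may differ) and an in-memory list standing in for a Paimon table.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: the system column names are illustrative only.
final class LakeSchemaSketch {
    static final String BUCKET_COLUMN = "__bucket";
    static final String OFFSET_COLUMN = "__offset";
    static final String TIMESTAMP_COLUMN = "__timestamp";

    private LakeSchemaSketch() {}

    // Lake table columns = user columns of the Fluss table + the three system columns.
    static List<String> lakeColumns(List<String> flussColumns) {
        List<String> columns = new ArrayList<>(flussColumns);
        columns.add(BUCKET_COLUMN);
        columns.add(OFFSET_COLUMN);
        columns.add(TIMESTAMP_COLUMN);
        return columns;
    }

    // One lake row together with its system-column values.
    static final class LakeRow {
        final int bucket;
        final long offset;
        final long timestamp;
        final String payload;

        LakeRow(int bucket, long offset, long timestamp, String payload) {
            this.bucket = bucket;
            this.offset = offset;
            this.timestamp = timestamp;
            this.payload = payload;
        }
    }

    // Offsets are only ordered within one bucket, so the start position is (bucket, offset).
    static List<String> readFrom(List<LakeRow> rows, int bucket, long startOffset) {
        return rows.stream()
                .filter(r -> r.bucket == bucket && r.offset >= startOffset)
                .sorted(Comparator.comparingLong((LakeRow r) -> r.offset))
                .map(r -> r.payload)
                .collect(Collectors.toList());
    }
}
```

Reading bucket 0 from offset 1 returns only that bucket's rows in offset order. The same offset values also exist in bucket 1, which a filter on the offset column alone would wrongly include.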

@luoyuxia force-pushed the create-lake-table branch from 74248a9 to 569d5e9 on March 24, 2025 03:41
@luoyuxia marked this pull request as ready for review on March 24, 2025 03:51
@luoyuxia force-pushed the create-lake-table branch from 569d5e9 to 8e06422 on March 24, 2025 03:52
@luoyuxia
Collaborator Author

> Is it ready to review? @luoyuxia

Yes, it's ready for review now.

@luoyuxia requested review from wuchong and leonardBang on March 24, 2025 03:52
@luoyuxia
Collaborator Author

@wuchong @leonardBang Could you please help review?

@luoyuxia force-pushed the create-lake-table branch from 8e06422 to 85e28a2 on March 25, 2025 04:17
Development

Successfully merging this pull request may close these issues.

Synchronously create Lake tables when Fluss lakehouse enabled during Fluss table creation
2 participants