[Lake/Paimon] Create datalake enabled table should also create in lake #640
base: main
Conversation
Force-pushed from ad3e4a7 to b768922
<useTransitiveDependencies>true</useTransitiveDependencies>
<useTransitiveFiltering>true</useTransitiveFiltering>
<includes>
    <include>org.apache.flink:flink-shaded-hadoop-2-uber</include>
Paimon requires Hadoop bundled, so we include it in the paimon plugin dir.
https://paimon.apache.org/docs/master/flink/quick-start/
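The include above would sit in a Maven assembly descriptor along these lines. This is only a sketch: the `id` and the `plugins/paimon` output directory are assumptions for illustration, not the PR's actual descriptor.

```xml
<!-- Sketch of an assembly descriptor that copies the shaded Hadoop uber jar
     into the paimon plugin directory; id and paths are illustrative. -->
<assembly>
  <id>plugins-paimon</id>
  <formats>
    <format>dir</format>
  </formats>
  <dependencySets>
    <dependencySet>
      <outputDirectory>plugins/paimon</outputDirectory>
      <useTransitiveDependencies>true</useTransitiveDependencies>
      <useTransitiveFiltering>true</useTransitiveFiltering>
      <includes>
        <include>org.apache.flink:flink-shaded-hadoop-2-uber</include>
      </includes>
    </dependencySet>
  </dependencySets>
</assembly>
```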
Force-pushed from b768922 to 74248a9
Is it ready to review? @luoyuxia
}

// set pk
if (tableDescriptor.getSchema().getPrimaryKey().isPresent()) {
At first, I introduced additional `offset` and `timestamp` columns to enable Fluss to subscribe to the data in the lake from a given offset and timestamp via the Fluss client. But now, I feel we can remove these two additional `offset` and `timestamp` columns, at least for now:
- Currently, we mainly focus on Flink reading the historical data in Paimon and the real-time data in Fluss. The `offset` and `timestamp` columns are not used; introducing them may bring unnecessary complexity at this early stage.
- Subscribing via the `offset` and `timestamp` columns only works for log tables, and only for Paimon with bucket-num specified. But in Paimon, it's recommended not to set bucket-num, so the `offset` and `timestamp` columns become useless in most cases.

Still, we keep the possibility of supporting subscribing to the data in the lake from a given offset and timestamp in the future. We can then introduce an option to enable this feature for lake tables.
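The option-gated schema described above can be sketched as follows. Everything here is illustrative: the `__offset`/`__timestamp` column names and the helper are hypothetical, not Fluss's actual schema code.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: optionally append lake "system" columns used to resume reads. */
public class LakeSchemaSketch {

    // Hypothetical column names; the PR discusses but (for now) defers these.
    static final String OFFSET_COL = "__offset";
    static final String TIMESTAMP_COL = "__timestamp";

    /** Returns the user columns, plus system columns only when the feature is enabled. */
    static List<String> lakeColumns(List<String> userColumns, boolean subscribeEnabled) {
        List<String> columns = new ArrayList<>(userColumns);
        if (subscribeEnabled) {
            // Only add the extra columns when lake subscription is turned on,
            // so the common case keeps the plain user schema.
            columns.add(OFFSET_COL);
            columns.add(TIMESTAMP_COL);
        }
        return columns;
    }
}
```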
After discussion, we still keep the ability to subscribe via offset/timestamp, so let's introduce another column, `bucket`, to help us subscribe via bucket + offset.
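The bucket + offset idea can be sketched like this. The hashing and the resume map are illustrative stand-ins, not Fluss's or Paimon's actual bucketing logic.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: resume a lake read from a (bucket, offset) pair, as discussed above.
 *  All names and the hash scheme are illustrative, not the Fluss API. */
public class BucketOffsetSketch {

    /** Stable bucket assignment for a key; the real hashing may differ. */
    static int bucketFor(String key, int bucketNum) {
        return Math.floorMod(key.hashCode(), bucketNum);
    }

    /** Per-bucket resume positions: subscribe every bucket from the given offset. */
    static Map<Integer, Long> startOffsets(int bucketNum, long offset) {
        Map<Integer, Long> offsets = new HashMap<>();
        for (int b = 0; b < bucketNum; b++) {
            offsets.put(b, offset);
        }
        return offsets;
    }
}
```

Note this only makes sense when the bucket count is fixed, which matches the earlier caveat that subscription requires Paimon tables with bucket-num specified.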
Force-pushed from 74248a9 to 569d5e9
Force-pushed from 569d5e9 to 8e06422
Yes, now it's ready to review.
@wuchong @leonardBang Could you please help review?
Force-pushed from 8e06422 to 85e28a2
Purpose
Linked issue: close #430
Brief change log
- Introduce `LakeStoragePluginSetUp` that loads the `LakeStoragePlugin` by datalake format
- Introduce `LakeCatalog` to create table in lake
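The plugin lookup in the change log can be sketched as a format-keyed registry. The interface and registry below are hypothetical stand-ins for the PR's `LakeStoragePluginSetUp` / `LakeStoragePlugin` classes, not their actual code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Sketch of loading a lake storage plugin by datalake format. */
public class PluginSetUpSketch {

    /** Minimal stand-in for the PR's LakeStoragePlugin interface. */
    interface LakeStoragePlugin {
        String format(); // e.g. "paimon"
    }

    private static final Map<String, LakeStoragePlugin> REGISTRY = new HashMap<>();

    /** Register a plugin under its datalake format name. */
    static void register(LakeStoragePlugin plugin) {
        REGISTRY.put(plugin.format(), plugin);
    }

    /** Look up the plugin for a table's configured datalake format. */
    static Optional<LakeStoragePlugin> fromDataLakeFormat(String format) {
        return Optional.ofNullable(REGISTRY.get(format));
    }
}
```

In practice such plugins are often discovered via `java.util.ServiceLoader` from the plugin directory rather than registered manually; the map here just keeps the sketch self-contained.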
Tests
- `LakeEnabledTableCreateITCase`
API and Format
Documentation