-
Notifications
You must be signed in to change notification settings - Fork 335
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#4938]feat(lakehouse-paimon): Support S3 filesystem for Paimon catalog. #4939
Conversation
@@ -61,6 +62,12 @@ public class PaimonCatalogPropertiesMetadata extends BaseCatalogPropertiesMetada | |||
AuthenticationConfig.AUTH_TYPE_KEY, | |||
AuthenticationConfig.AUTH_TYPE_KEY); | |||
|
|||
private static final Map<String, String> S3_CONFIGURATION = | |||
ImmutableMap.of( | |||
PaimonS3FileSystemConfig.S3_ACCESS_KEY, PaimonS3FileSystemConfig.S3_ACCESS_KEY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to use the properties defined in S3Properties
to unify the storage configuration for Gravitino
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see #4939 (comment)
...n/src/main/java/org/apache/gravitino/catalog/lakehouse/paimon/filesystem/FileSystemType.java
Outdated
Show resolved
Hide resolved
...t/java/org/apache/gravitino/catalog/lakehouse/paimon/integration/test/CatalogPaimonS3IT.java
Outdated
Show resolved
Hide resolved
@Override | ||
protected void startNecessaryContainers() { | ||
localStackContainer = | ||
new LocalStackContainer(DockerImageName.parse("localstack/localstack")).withServices(S3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason to use localstack
for test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, do you have a suggestion for the use of the s3 simulator?
LOCAL_FILE, | ||
HDFS, | ||
S3, | ||
OSS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do not plan to support oss in this pr, it have better to remove it first to avoid confusion for other users
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FANNG1 Do you have any other suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems reasonable to remove OSS, since it's not supported.
...e-paimon/src/main/java/org/apache/gravitino/catalog/lakehouse/paimon/utils/CatalogUtils.java
Outdated
Show resolved
Hide resolved
import org.apache.gravitino.config.ConfigEntry; | ||
import org.apache.gravitino.connector.PropertyEntry; | ||
|
||
public class PaimonS3FileSystemConfig extends Config { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it is possible to merge the same configurations of iceberg to provide a common configurations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The keys for S3 in Iceberg are not the same as those here, please see #4939 (comment)
...t/java/org/apache/gravitino/catalog/lakehouse/paimon/integration/test/CatalogPaimonS3IT.java
Outdated
Show resolved
Hide resolved
public enum FileSystemType { | ||
LOCAL_FILE, | ||
HDFS, | ||
S3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the rename in S3
an atomic operation?
FilesystemCatalog depends on the atomic rename to avoid commit confliction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/apache/hadoop/blob/ea6e0f7cd58d0129897dfc7870aee188be80a904/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2365-L2385
i am not sure that if it will fail when the destination path exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the rename in
S3
an atomic operation? FilesystemCatalog depends on the atomic rename to avoid commit confliction.
I don't have much knowledge about it. let me verify this.
@FANNG1 |
} | ||
|
||
public static FileSystemType fromStoragePath(String storagePath) { | ||
if (storagePath.startsWith("s3://")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s3a?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me check it for Paimon, the value is s3://
in the example shown in https://paimon.apache.org/docs/0.8/filesystems/s3/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the value s3a
for Iceberg S3 storage?
} | ||
|
||
public static FileSystemType fromStoragePath(String storagePath) { | ||
if (storagePath.startsWith("s3://")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shoud we consider upper cases like S3://
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HDFS only support lower case 'hdfs', if we use HDFS://xxx
Caused by: MetaException(message:Got exception: java.io.IOException No FileSystem for scheme: HDFS)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_database_result$create_database_resultStandardScheme.read(ThriftHiveMetastore.java:26660)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_database_result$create_database_resultStandardScheme.read(ThriftHiveMetastore.java:26628)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_database_result.read(ThriftHiveMetastore.java:26562)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:88)
...e-paimon/src/main/java/org/apache/gravitino/catalog/lakehouse/paimon/utils/CatalogUtils.java
Outdated
Show resolved
Hide resolved
ImmutableMap.of( | ||
S3Properties.GRAVITINO_S3_ACCESS_KEY_ID, S3_ACCESS_KEY, | ||
S3Properties.GRAVITINO_S3_SECRET_ACCESS_KEY, S3_SECRET_KEY, | ||
S3Properties.GRAVITINO_S3_ENDPOINT, S3_ENDPOINT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we consider s3 region?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the doc provided by Paimon, region
is excessive.
...java/org/apache/gravitino/catalog/lakehouse/paimon/integration/test/CatalogPaimonBaseIT.java
Outdated
Show resolved
Hide resolved
docs/lakehouse-paimon-catalog.md
Outdated
| `authentication.kerberos.keytab-uri` | The URI of The keytab for the Kerberos authentication. | (none) | required if the value of `authentication.type` is Kerberos. | 0.6.0 | | ||
| `authentication.kerberos.check-interval-sec` | The check interval of Kerberos credential for Paimon catalog. | 60 | No | 0.6.0 | | ||
| `authentication.kerberos.keytab-fetch-timeout-sec` | The fetch timeout of retrieving Kerberos keytab from `authentication.kerberos.keytab-uri`. | 60 | No | 0.6.0 | | ||
| `s3.endpoint` | The endpoint of the AWS s3. | (none) | required if the value of `warehouse` is a S3 path | 0.7.0 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the s3 properties name correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
*/ | ||
package org.apache.gravitino.catalog.lakehouse.paimon.filesystem; | ||
|
||
public enum FileSystemType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove this class if not used
What changes were proposed in this pull request?
Add support for Paimon S3 filesystem.
Note: related documents will be added in another PR.
Why are the changes needed?
for better user experience.
Fix: #4938
Does this PR introduce any user-facing change?
N/A
How was this patch tested?
Test locally and IT