From feaf1ca769bf7ec71bf4a5df06e9a6e397167fb0 Mon Sep 17 00:00:00 2001
From: mchades
Date: Fri, 20 Sep 2024 17:24:54 +0800
Subject: [PATCH] add user doc for Hudi catalog

---
 docs/lakehouse-hudi-catalog.md                | 110 ++++++++++++++++++
 ...age-relational-metadata-using-gravitino.md |   6 +
 2 files changed, 116 insertions(+)
 create mode 100644 docs/lakehouse-hudi-catalog.md

diff --git a/docs/lakehouse-hudi-catalog.md b/docs/lakehouse-hudi-catalog.md
new file mode 100644
index 00000000000..be6d328bfb4
--- /dev/null
+++ b/docs/lakehouse-hudi-catalog.md
@@ -0,0 +1,110 @@
+---
+title: "Hudi catalog"
+slug: /lakehouse-hudi-catalog
+keywords:
+  - lakehouse
+  - hudi
+  - metadata
+license: "This software is licensed under the Apache License version 2."
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## Introduction
+
+Apache Gravitino provides the ability to manage Apache Hudi metadata.
+
+### Requirements and limitations
+
+:::info
+Tested and verified with Apache Hudi `0.15.0`.
+:::
+
+## Catalog
+
+### Catalog capabilities
+
+- Works as a catalog proxy, supporting `HMS` as the catalog backend.
+- Only supports read operations (list and load) for Hudi schemas and tables.
+- Doesn't support timeline management operations yet.
+
+### Catalog properties
+
+| Property name | Description | Default value | Required | Since Version |
+|------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|----------|------------------|
+| `catalog-backend` | Catalog backend of the Gravitino Hudi catalog. Only `hms` is supported now. | (none) | Yes | 0.7.0-incubating |
+| `uri` | The URI associated with the backend. For example, `thrift://127.0.0.1:9083` for the HMS backend. | (none) | Yes | 0.7.0-incubating |
+| `client.pool-size` | For the HMS backend. The maximum number of Hive metastore clients in the pool for Gravitino. | 1 | No | 0.7.0-incubating |
+| `client.pool-cache.eviction-interval-ms` | For the HMS backend. The eviction interval of the client pool cache, in milliseconds. | 300000 | No | 0.7.0-incubating |
+| `gravitino.bypass.` | Properties with this prefix are passed down to the underlying backend client. For example, `gravitino.bypass.hive.metastore.failure.retries = 3` sets 3 retries upon failure of Thrift metastore calls for the HMS backend. | (none) | No | 0.7.0-incubating |
+
+### Catalog operations
+
+Please refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#catalog-operations) for more details.
+
+## Schema
+
+### Schema capabilities
+
+- Only supports read operations: listSchema, loadSchema, and schemaExists.
+
+### Schema properties
+
+- `Location` is an optional property that shows the storage path of the Hudi database.
+
+### Schema operations
+
+Only read operations are supported: listSchema, loadSchema, and schemaExists.
+Please refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#schema-operations) for more details.
+
+## Table
+
+### Table capabilities
+
+- Only supports read operations: listTable, loadTable, and tableExists.
+
+### Table partitions
+
+- Supports loading Hudi partitioned tables (Hudi only supports identity partitioning).
+
+### Table sort orders
+
+- Doesn't support table sort orders.
+
+### Table distributions
+
+- Doesn't support table distributions.
+
+### Table indexes
+
+- Doesn't support table indexes.
+
+### Table properties
+
+- For the HMS backend, all table parameters stored in the HMS are returned as table properties.
+
+### Table column types
+
+The following table shows the mapping between Gravitino and [Apache Hudi column types](https://hudi.apache.org/docs/sql_ddl#supported-types):
+
+| Gravitino Type | Apache Hudi Type |
+|----------------|------------------|
+| `boolean` | `boolean` |
+| `integer` | `int` |
+| `long` | `long` |
+| `date` | `date` |
+| `timestamp` | `timestamp` |
+| `float` | `float` |
+| `double` | `double` |
+| `string` | `string` |
+| `decimal` | `decimal` |
+| `binary` | `bytes` |
+| `array` | `array` |
+| `map` | `map` |
+| `struct` | `struct` |
+
+### Table operations
+
+Only read operations are supported: listTable, loadTable, and tableExists.
+Please refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#table-operations) for more details.
diff --git a/docs/manage-relational-metadata-using-gravitino.md b/docs/manage-relational-metadata-using-gravitino.md
index fa2a11ac487..f810b4aa325 100644
--- a/docs/manage-relational-metadata-using-gravitino.md
+++ b/docs/manage-relational-metadata-using-gravitino.md
@@ -24,6 +24,7 @@ For more details, please refer to the related doc.
 - [**Apache Doris**](./jdbc-doris-catalog.md)
 - [**Apache Iceberg**](./lakehouse-iceberg-catalog.md)
 - [**Apache Paimon**](./lakehouse-paimon-catalog.md)
+- [**Apache Hudi**](./lakehouse-hudi-catalog.md)
 
 Assuming:
 
@@ -93,6 +94,7 @@ Currently, Gravitino supports the following catalog providers:
 | `hive` | [Hive catalog property](./apache-hive-catalog.md#catalog-properties) |
 | `lakehouse-iceberg` | [Iceberg catalog property](./lakehouse-iceberg-catalog.md#catalog-properties) |
 | `lakehouse-paimon` | [Paimon catalog property](./lakehouse-paimon-catalog.md#catalog-properties) |
+| `lakehouse-hudi` | [Hudi catalog property](./lakehouse-hudi-catalog.md#catalog-properties) |
 | `jdbc-mysql` | [MySQL catalog property](./jdbc-mysql-catalog.md#catalog-properties) |
 | `jdbc-postgresql` | [PostgreSQL catalog property](./jdbc-postgresql-catalog.md#catalog-properties) |
 | `jdbc-doris` | [Doris catalog property](./jdbc-doris-catalog.md#catalog-properties) |
@@ -326,6 +328,7 @@ Currently, Gravitino supports the following schema property:
 | `hive` | [Hive schema property](./apache-hive-catalog.md#schema-properties) |
 | `lakehouse-iceberg` | [Iceberg scheme property](./lakehouse-iceberg-catalog.md#schema-properties) |
 | `lakehouse-paimon` | [Paimon scheme property](./lakehouse-paimon-catalog.md#schema-properties) |
+| `lakehouse-hudi` | [Hudi schema property](./lakehouse-hudi-catalog.md#schema-properties) |
 | `jdbc-mysql` | [MySQL schema property](./jdbc-mysql-catalog.md#schema-properties) |
 | `jdbc-postgresql` | [PostgreSQL schema property](./jdbc-postgresql-catalog.md#schema-properties) |
 | `jdbc-doris` | [Doris schema property](./jdbc-doris-catalog.md#schema-properties) |
@@ -807,6 +810,7 @@ The following is a table of the column default value that Gravitino supports for
 | `hive` | ✘ |
 | `lakehouse-iceberg` | ✘ |
 | `lakehouse-paimon` | ✘ |
+| `lakehouse-hudi` | ✘ |
 | `jdbc-mysql` | ✔ |
 | `jdbc-postgresql` | ✔ |
 
@@ -820,6 +824,7 @@ The following table shows the column auto-increment that Gravitino supports for
 | `hive` | ✘ |
 | `lakehouse-iceberg` | ✘ |
 | `lakehouse-paimon` | ✘ |
+| `lakehouse-hudi` | ✘ |
 | `jdbc-mysql` | ✔([limitations](./jdbc-mysql-catalog.md#table-column-auto-increment)) |
 | `jdbc-postgresql` | ✔ |
 
@@ -832,6 +837,7 @@ The following is the table property that Gravitino supports:
 | `hive` | [Hive table property](./apache-hive-catalog.md#table-properties) | [Hive type mapping](./apache-hive-catalog.md#table-column-types) |
 | `lakehouse-iceberg` | [Iceberg table property](./lakehouse-iceberg-catalog.md#table-properties) | [Iceberg type mapping](./lakehouse-iceberg-catalog.md#table-column-types) |
 | `lakehouse-paimon` | [Paimon table property](./lakehouse-paimon-catalog.md#table-properties) | [Paimon type mapping](./lakehouse-paimon-catalog.md#table-column-types) |
+| `lakehouse-hudi` | [Hudi table property](./lakehouse-hudi-catalog.md#table-properties) | [Hudi type mapping](./lakehouse-hudi-catalog.md#table-column-types) |
 | `jdbc-mysql` | [MySQL table property](./jdbc-mysql-catalog.md#table-properties) | [MySQL type mapping](./jdbc-mysql-catalog.md#table-column-types) |
 | `jdbc-postgresql` | [PostgreSQL table property](./jdbc-postgresql-catalog.md#table-properties) | [PostgreSQL type mapping](./jdbc-postgresql-catalog.md#table-column-types) |
 | `doris` | [Doris table property](./jdbc-doris-catalog.md#table-properties) | [Doris type mapping](./jdbc-doris-catalog.md#table-column-types) |
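
Reviewer note: the catalog properties documented in this patch could be illustrated with a small sketch of a catalog-creation payload. The Python below is not part of the patch. It only builds and validates a request body, and it assumes the body uses the `name`, `type`, `comment`, `provider`, and `properties` fields that Gravitino's generic catalog-creation examples use; that shape is an assumption here, not something this patch states. The helper `build_hudi_catalog_request` and the catalog name `hudi_catalog` are hypothetical.

```python
import json

# Required properties and the only supported backend, taken from the
# "Catalog properties" table in the patch.
REQUIRED_PROPERTIES = ("catalog-backend", "uri")
SUPPORTED_BACKENDS = {"hms"}


def build_hudi_catalog_request(name, properties, comment=""):
    """Build a catalog-creation body for the `lakehouse-hudi` provider.

    Hypothetical helper: the field names of the returned dict are an
    assumption based on Gravitino's generic catalog-creation examples.
    """
    missing = [key for key in REQUIRED_PROPERTIES if key not in properties]
    if missing:
        raise ValueError(f"missing required properties: {missing}")

    backend = properties["catalog-backend"]
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"unsupported catalog-backend: {backend!r}")

    return {
        "name": name,
        "type": "RELATIONAL",
        "comment": comment,
        "provider": "lakehouse-hudi",
        # Property values are sent as strings.
        "properties": {key: str(value) for key, value in properties.items()},
    }


request = build_hudi_catalog_request(
    "hudi_catalog",  # hypothetical catalog name
    {
        "catalog-backend": "hms",
        "uri": "thrift://127.0.0.1:9083",
        # Passed through to the HMS client via the `gravitino.bypass.` prefix.
        "gravitino.bypass.hive.metastore.failure.retries": 3,
    },
    comment="read-only Hudi catalog",
)
print(json.dumps(request, indent=2))
```

Omitting `client.pool-size` and `client.pool-cache.eviction-interval-ms` leaves them at the documented defaults (1 and 300000), so the sketch only checks the two required properties.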