[#4954] docs(hudi-catalog): Add docs for Hudi catalog #4976

Merged
merged 1 commit into from
Oct 14, 2024
110 changes: 110 additions & 0 deletions docs/lakehouse-hudi-catalog.md
@@ -0,0 +1,110 @@
---
title: "Hudi catalog"
slug: /lakehouse-hudi-catalog
keywords:
- lakehouse
- hudi
- metadata
license: "This software is licensed under the Apache License version 2."
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

## Introduction

Apache Gravitino provides the ability to manage Apache Hudi metadata.

### Requirements and limitations

:::info
Tested and verified with Apache Hudi `0.15.0`.
:::

## Catalog

### Catalog capabilities

- Works as a catalog proxy, supporting `HMS` as the catalog backend.
- Supports only read operations (list and load) for Hudi schemas and tables.
- Does not currently support timeline management operations.

### Catalog properties

| Property name | Description | Default value | Required | Since Version |
|------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|----------|------------------|
| `catalog-backend` | The catalog backend of the Gravitino Hudi catalog. Only `hms` is currently supported. | (none) | Yes | 0.7.0-incubating |
| `uri` | The URI of the catalog backend, such as `thrift://127.0.0.1:9083` for the HMS backend. | (none) | Yes | 0.7.0-incubating |
| `client.pool-size` | For HMS backend. The maximum number of Hive metastore clients in the pool for Gravitino. | 1 | No | 0.7.0-incubating |
| `client.pool-cache.eviction-interval-ms` | For HMS backend. The cache pool eviction interval. | 300000 | No | 0.7.0-incubating |
| `gravitino.bypass.` | Properties with this prefix are passed down to the underlying backend client. For example, `gravitino.bypass.hive.metastore.failure.retries = 3` sets 3 retries upon failure of Thrift metastore calls for the HMS backend. | (none) | No | 0.7.0-incubating |

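As a rough illustration of how the properties in the table fit together, the following Python sketch assembles them into a string-to-string map. The helper name `build_hudi_catalog_properties` and its parameters are hypothetical and not part of Gravitino. The sketch also assumes that property values are passed as strings.

```python
# Hypothetical helper, not part of Gravitino: assembles the catalog
# properties from the table above into a plain string-to-string map.
BYPASS_PREFIX = "gravitino.bypass."


def build_hudi_catalog_properties(uri, pool_size=1,
                                  eviction_interval_ms=300000,
                                  backend_overrides=None):
    if not uri:
        # `uri` is marked as required in the table above.
        raise ValueError("`uri` is required")

    props = {
        "catalog-backend": "hms",  # the only backend listed as supported
        "uri": uri,
        "client.pool-size": str(pool_size),
        "client.pool-cache.eviction-interval-ms": str(eviction_interval_ms),
    }
    # Keys meant for the backend client carry the `gravitino.bypass.` prefix.
    for key, value in (backend_overrides or {}).items():
        props[BYPASS_PREFIX + key] = str(value)
    return props


props = build_hudi_catalog_properties(
    "thrift://127.0.0.1:9083",
    backend_overrides={"hive.metastore.failure.retries": 3},
)
for name, value in sorted(props.items()):
    print(f"{name} = {value}")
```

The defaults in the signature mirror the "Default value" column, so omitting the optional arguments yields the documented defaults.
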
### Catalog operations

Please refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#catalog-operations) for more details.

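For orientation, creating a Hudi catalog might look roughly like the request below. This is a sketch under assumptions: the server address `http://localhost:8090`, the metalake name `my_metalake`, the catalog name `hudi_catalog`, the `RELATIONAL` type value, and the endpoint path and headers are not taken from this page and should be checked against the linked guide. Only the `lakehouse-hudi` provider name and the property keys come from the documentation. The request is built but not sent.

```python
import json
import urllib.request

# Assumed values for illustration only.
GRAVITINO_URL = "http://localhost:8090"
METALAKE = "my_metalake"

payload = {
    "name": "hudi_catalog",            # hypothetical catalog name
    "type": "RELATIONAL",              # assumed catalog type value
    "comment": "Read-only Hudi catalog backed by HMS",
    "provider": "lakehouse-hudi",
    "properties": {
        "catalog-backend": "hms",
        "uri": "thrift://127.0.0.1:9083",
    },
}

request = urllib.request.Request(
    url=f"{GRAVITINO_URL}/api/metalakes/{METALAKE}/catalogs",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Accept": "application/vnd.gravitino.v1+json",
        "Content-Type": "application/json",
    },
    method="POST",
)

print(request.get_method(), request.full_url)
# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```
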
## Schema

### Schema capabilities

- Supports only read operations: listSchema, loadSchema, and schemaExists.

### Schema properties

- `Location` is an optional property that shows the storage path of the Hudi database.

### Schema operations

Only read operations are supported: listSchema, loadSchema, and schemaExists.
Please refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#schema-operations) for more details.

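Because only read operations are available, a client interacts with Hudi schemas through lookups alone. The sketch below builds the URLs that listing and loading a schema would plausibly use over REST. The base URL, the path layout, and the helper names `schemas_url` and `schema_url` are assumptions for illustration, so verify them against the linked guide.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8090"  # assumed Gravitino server address


def schemas_url(metalake, catalog):
    """URL a GET request would use to list schemas (assumed path layout)."""
    return (
        f"{BASE_URL}/api/metalakes/{quote(metalake, safe='')}"
        f"/catalogs/{quote(catalog, safe='')}/schemas"
    )


def schema_url(metalake, catalog, schema):
    """URL a GET request would use to load one schema (assumed path layout)."""
    return f"{schemas_url(metalake, catalog)}/{quote(schema, safe='')}"


print(schemas_url("my_metalake", "hudi_catalog"))
print(schema_url("my_metalake", "hudi_catalog", "sales_db"))
```
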
## Table

### Table capabilities

- Supports only read operations: listTable, loadTable, and tableExists.

### Table partitions

- Supports loading Hudi partitioned tables (Hudi only supports identity partitioning).

### Table sort orders

- Doesn't support table sort orders.

### Table distributions

- Doesn't support table distributions.

### Table indexes

- Doesn't support table indexes.

### Table properties

- For the HMS backend, all table parameters from the HMS are exposed as table properties.

### Table column types

The following table shows the mapping between Gravitino and [Apache Hudi column types](https://hudi.apache.org/docs/sql_ddl#supported-types):

| Gravitino Type | Apache Hudi Type |
|----------------|------------------|
| `boolean` | `boolean` |
| `integer` | `int` |
| `long` | `long` |
| `date` | `date` |
| `timestamp` | `timestamp` |
| `float` | `float` |
| `double` | `double` |
| `string` | `string` |
| `decimal` | `decimal` |
| `binary` | `bytes` |
| `array` | `array` |
| `map` | `map` |
| `struct` | `struct` |

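The mapping above can also be expressed as a lookup table, which is handy when validating column types in client code. This is a minimal sketch: the names `GRAVITINO_TO_HUDI`, `to_hudi_type`, and `to_gravitino_type` are hypothetical, and the sketch handles bare type names only, not parameterized forms such as a decimal with precision and scale.

```python
# Type names copied from the mapping table above.
GRAVITINO_TO_HUDI = {
    "boolean": "boolean",
    "integer": "int",
    "long": "long",
    "date": "date",
    "timestamp": "timestamp",
    "float": "float",
    "double": "double",
    "string": "string",
    "decimal": "decimal",
    "binary": "bytes",
    "array": "array",
    "map": "map",
    "struct": "struct",
}

# The mapping is one-to-one, so it can be inverted safely.
HUDI_TO_GRAVITINO = {hudi: grav for grav, hudi in GRAVITINO_TO_HUDI.items()}


def to_hudi_type(gravitino_type):
    """Return the Hudi type name listed for a Gravitino type name."""
    try:
        return GRAVITINO_TO_HUDI[gravitino_type.lower()]
    except KeyError:
        raise ValueError(
            f"no Hudi mapping listed for Gravitino type {gravitino_type!r}"
        ) from None


def to_gravitino_type(hudi_type):
    """Return the Gravitino type name listed for a Hudi type name."""
    try:
        return HUDI_TO_GRAVITINO[hudi_type.lower()]
    except KeyError:
        raise ValueError(
            f"no Gravitino mapping listed for Hudi type {hudi_type!r}"
        ) from None


print(to_hudi_type("integer"), to_hudi_type("binary"))
```

Only `integer`/`int` and `binary`/`bytes` differ between the two sides; every other name maps to itself.
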
### Table operations

Only read operations are supported: listTable, loadTable, and tableExists.
Please refer to [Manage Relational Metadata Using Gravitino](./manage-relational-metadata-using-gravitino.md#table-operations) for more details.
6 changes: 6 additions & 0 deletions docs/manage-relational-metadata-using-gravitino.md
@@ -24,6 +24,7 @@ For more details, please refer to the related doc.
- [**Apache Doris**](./jdbc-doris-catalog.md)
- [**Apache Iceberg**](./lakehouse-iceberg-catalog.md)
- [**Apache Paimon**](./lakehouse-paimon-catalog.md)
- [**Apache Hudi**](./lakehouse-hudi-catalog.md)

Assuming:

@@ -93,6 +94,7 @@ Currently, Gravitino supports the following catalog providers:
| `hive` | [Hive catalog property](./apache-hive-catalog.md#catalog-properties) |
| `lakehouse-iceberg` | [Iceberg catalog property](./lakehouse-iceberg-catalog.md#catalog-properties) |
| `lakehouse-paimon` | [Paimon catalog property](./lakehouse-paimon-catalog.md#catalog-properties) |
| `lakehouse-hudi` | [Hudi catalog property](./lakehouse-hudi-catalog.md#catalog-properties) |
| `jdbc-mysql` | [MySQL catalog property](./jdbc-mysql-catalog.md#catalog-properties) |
| `jdbc-postgresql` | [PostgreSQL catalog property](./jdbc-postgresql-catalog.md#catalog-properties) |
| `jdbc-doris` | [Doris catalog property](./jdbc-doris-catalog.md#catalog-properties) |
@@ -326,6 +328,7 @@ Currently, Gravitino supports the following schema property:
| `hive` | [Hive schema property](./apache-hive-catalog.md#schema-properties) |
| `lakehouse-iceberg` | [Iceberg schema property](./lakehouse-iceberg-catalog.md#schema-properties) |
| `lakehouse-paimon` | [Paimon schema property](./lakehouse-paimon-catalog.md#schema-properties) |
| `lakehouse-hudi` | [Hudi schema property](./lakehouse-hudi-catalog.md#schema-properties) |
| `jdbc-mysql` | [MySQL schema property](./jdbc-mysql-catalog.md#schema-properties) |
| `jdbc-postgresql` | [PostgreSQL schema property](./jdbc-postgresql-catalog.md#schema-properties) |
| `jdbc-doris` | [Doris schema property](./jdbc-doris-catalog.md#schema-properties) |
@@ -807,6 +810,7 @@ The following is a table of the column default value that Gravitino supports for
| `hive` | ✘ |
| `lakehouse-iceberg` | ✘ |
| `lakehouse-paimon` | ✘ |
| `lakehouse-hudi` | ✘ |
| `jdbc-mysql` | ✔ |
| `jdbc-postgresql` | ✔ |

@@ -820,6 +824,7 @@ The following table shows the column auto-increment that Gravitino supports for
| `hive` | ✘ |
| `lakehouse-iceberg` | ✘ |
| `lakehouse-paimon` | ✘ |
| `lakehouse-hudi` | ✘ |
| `jdbc-mysql` | ✔([limitations](./jdbc-mysql-catalog.md#table-column-auto-increment)) |
| `jdbc-postgresql` | ✔ |

@@ -832,6 +837,7 @@ The following is the table property that Gravitino supports:
| `hive` | [Hive table property](./apache-hive-catalog.md#table-properties) | [Hive type mapping](./apache-hive-catalog.md#table-column-types) |
| `lakehouse-iceberg` | [Iceberg table property](./lakehouse-iceberg-catalog.md#table-properties) | [Iceberg type mapping](./lakehouse-iceberg-catalog.md#table-column-types) |
| `lakehouse-paimon` | [Paimon table property](./lakehouse-paimon-catalog.md#table-properties) | [Paimon type mapping](./lakehouse-paimon-catalog.md#table-column-types) |
| `lakehouse-hudi` | [Hudi table property](./lakehouse-hudi-catalog.md#table-properties) | [Hudi type mapping](./lakehouse-hudi-catalog.md#table-column-types) |
| `jdbc-mysql` | [MySQL table property](./jdbc-mysql-catalog.md#table-properties) | [MySQL type mapping](./jdbc-mysql-catalog.md#table-column-types) |
| `jdbc-postgresql` | [PostgreSQL table property](./jdbc-postgresql-catalog.md#table-properties) | [PostgreSQL type mapping](./jdbc-postgresql-catalog.md#table-column-types) |
| `jdbc-doris` | [Doris table property](./jdbc-doris-catalog.md#table-properties) | [Doris type mapping](./jdbc-doris-catalog.md#table-column-types) |