Commit 7517526

Add Lakeformation doc
1 parent a9d180c commit 7517526

5 files changed

+156
-8
lines changed

.gitignore

+4
Original file line number | Diff line number | Diff line change
@@ -28,3 +28,7 @@ yarn-error.log*
2828
/static/feeds/atom.xml
2929
/static/feeds/rss.json
3030
/static/feeds/rss.xml
31+
32+
# IDE
33+
.idea
34+
.vscode

dbt-athena-versions.js

+9-3
@@ -23,9 +23,15 @@ exports.versionedPages = [
2323
firstVersion: "1.4.1",
2424
lastVersion: "1.4",
2525
},
26-
{
27-
page: "docs/configuration/contract",
28-
firstVersion: "1.5.0",
26+
{
27+
page: "docs/configuration/contract-constraints",
28+
firstVersion: "1.5.1",
29+
lastVersion: "1.5",
30+
},
31+
{
32+
page: "docs/configuration/lakeformation",
33+
firstVersion: "1.5.1",
34+
lastVersion: "1.5",
2935
},
3036
];
3137

+136
@@ -0,0 +1,136 @@
1+
---
2+
title: "Lakeformation"
3+
id: lakeformation
4+
---
5+
6+
## Tags
7+
The adapter manages AWS Lake Formation tags (lf-tags) as follows:
8+
- you can enable or disable lf-tags management via [config](./table-configuration) (disabled by default).
9+
Here are config examples:
10+
11+
`model_config.sql`:
12+
```sql
13+
{{
14+
config(
15+
materialized='incremental',
16+
incremental_strategy='append',
17+
on_schema_change='append_new_columns',
18+
table_type='iceberg',
19+
schema='test_schema',
20+
lf_tags_config={
21+
'enabled': true,
22+
'tags': {
23+
'tag1': 'value1',
24+
'tag2': 'value2'
25+
},
26+
'tags_columns': {
27+
'tag1': {
28+
'value1': ['column1', 'column2'],
29+
'value2': ['column3', 'column4']
30+
}
31+
}
32+
}
33+
)
34+
}}
35+
```
36+
37+
`dbt_project.yml`:
38+
```yaml
39+
+lf_tags_config:
40+
enabled: true
41+
tags:
42+
tag1: value1
43+
tag2: value2
44+
tags_columns:
45+
tag1:
46+
value1: [ column1, column2 ]
47+
```
48+
49+
- once you enable the feature, lf-tags will be updated on every dbt run
50+
- first, all lf-tags for **columns** are removed to avoid inheritance issues
51+
- then all redundant lf-tags are removed from the **table** and the tags from the config are applied
52+
- finally, lf-tags for **columns** are applied
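
The update order described in the list above can be modelled in plain Python. This is only an illustrative sketch that works on dictionaries, not the adapter's actual code: `apply_lf_tags` and the shape of its `table_tags` and `column_tags` arguments are hypothetical, and it assumes "redundant" means "not present in the config". The `enabled`, `tags`, and `tags_columns` keys mirror the config examples above.

```python
def apply_lf_tags(table_tags, column_tags, lf_tags_config):
    """Return the (table_tags, column_tags) state after one simulated dbt run.

    table_tags:  {tag_key: tag_value} currently on the table (hypothetical shape)
    column_tags: {column_name: {tag_key: tag_value}} currently on the columns
    """
    if not lf_tags_config.get("enabled", False):
        # Feature disabled (the default): leave everything untouched.
        return dict(table_tags), {c: dict(t) for c, t in column_tags.items()}

    # Step 1: remove all column-level lf-tags to avoid inheritance issues.
    new_column_tags = {}

    # Step 2: drop table tags that are not in the config (assumed meaning of
    # "redundant"), then apply the configured table tags.
    wanted = lf_tags_config.get("tags", {})
    new_table_tags = {k: v for k, v in table_tags.items() if k in wanted}
    new_table_tags.update(wanted)

    # Step 3: apply the column-level tags from `tags_columns`.
    for tag_key, values in lf_tags_config.get("tags_columns", {}).items():
        for tag_value, columns in values.items():
            for column in columns:
                new_column_tags.setdefault(column, {})[tag_key] = tag_value

    return new_table_tags, new_column_tags


config = {
    "enabled": True,
    "tags": {"tag1": "value1", "tag2": "value2"},
    "tags_columns": {"tag1": {"value1": ["column1", "column2"]}},
}
# A stale table tag and a stale column tag are both cleaned up by the run.
table, columns = apply_lf_tags({"old_tag": "x"}, {"column9": {"tag1": "stale"}}, config)
```

In this model a column tag that is no longer listed in `tags_columns` disappears in step 1 and is never re-applied, which is why the real feature can rewrite tags on every run without leaving leftovers.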
53+
54+
:::info
55+
56+
It's important to understand the following points:
57+
- dbt does not manage lf-tags for database
58+
- dbt does not manage lakeformation permissions
59+
60+
You need to manage these yourself, either manually or with an automation tool such as Terraform or AWS CDK.
61+
You may find the following links useful to manage that:
62+
- [terraform aws_lakeformation_permissions](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lakeformation_permissions)
63+
- [terraform aws_lakeformation_resource_lf_tags](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lakeformation_resource_lf_tags)
64+
65+
:::
66+
67+
## Data Cell Filters
68+
The adapter manages AWS Lake Formation data cell filters as follows.
69+
`model_config.sql`:
70+
```sql
71+
{{
72+
config(
73+
materialized='incremental',
74+
incremental_strategy='append',
75+
on_schema_change='append_new_columns',
76+
table_type='iceberg',
77+
schema='test_schema',
78+
lf_grants={
79+
'data_cell_filters': {
80+
'enabled': true,
81+
'filters': {
82+
'filter_name': {
83+
'row_filter': '<filter_condition>',
84+
'principals': ['principal_arn1', 'principal_arn2']
85+
}
86+
}
87+
}
88+
}
)
89+
}}
90+
```
91+
92+
Or, a more advanced example for `dbt_project.yml`:
93+
```yaml
94+
models:
95+
directory:
96+
+schema: your_schema
97+
+materialized: incremental
98+
+on_schema_change: sync_all_columns
99+
model1:
100+
+lf_grants: &default_rls
101+
data_cell_filters:
102+
enabled: true
103+
filters:
104+
name1:
105+
row_filter: "field1 = 'value1'"
106+
principals:
107+
- "role1_arn"
108+
- "role2_arn"
109+
name2:
110+
row_filter: "field1 = 'value2'"
111+
principals:
112+
- "role3_arn"
113+
- "role4_arn"
114+
model2:
115+
+lf_grants: *default_rls # reuse previously defined config
116+
```
117+
118+
- Data cell filter management can't be automated outside dbt, because a filter can't be attached to a table
119+
that doesn't exist yet.
120+
- Once you `enable` this config, dbt will set all filters and their permissions during every dbt run.
121+
- This approach keeps the row-level security configuration up to date after every dbt run and
122+
applies any changes that occur: dropping, creating, or updating filters and their permissions.
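
The drop/create/update behaviour in the list above amounts to comparing the filters defined in the config with the filters already attached to the table. The following is a minimal sketch of that comparison, not the adapter's implementation: `plan_filter_changes` is a hypothetical helper, and both inputs are assumed to map a filter name to its `row_filter` and `principals`, as in the config examples.

```python
def plan_filter_changes(existing, desired):
    """Compare filters currently on a table with those in the config.

    Both arguments map filter name -> {"row_filter": str, "principals": [str]}.
    """
    to_create = sorted(name for name in desired if name not in existing)
    to_drop = sorted(name for name in existing if name not in desired)
    to_update = sorted(
        name
        for name in desired
        if name in existing
        and (
            existing[name]["row_filter"] != desired[name]["row_filter"]
            or set(existing[name]["principals"]) != set(desired[name]["principals"])
        )
    )
    return {"create": to_create, "update": to_update, "drop": to_drop}


# State left over from a previous run (illustrative values).
existing = {
    "name1": {"row_filter": "field1 = 'old'", "principals": ["role1_arn"]},
    "legacy": {"row_filter": "field1 = 'x'", "principals": ["role9_arn"]},
}
# State requested in the current config.
desired = {
    "name1": {"row_filter": "field1 = 'value1'", "principals": ["role1_arn", "role2_arn"]},
    "name2": {"row_filter": "field1 = 'value2'", "principals": ["role3_arn", "role4_arn"]},
}
plan = plan_filter_changes(existing, desired)
```

Here `name2` would be created, `name1` updated, and `legacy` dropped. Running the same comparison again with identical inputs yields an empty plan, which is what makes repeating it on every dbt run safe.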
123+
124+
:::caution
125+
126+
It's important to understand that Lake Formation (LF) permissions combine like a `union`.
127+
Let's imagine this scenario:
128+
- Table X has tag `domain=foo`
129+
- Role A has `select` permission for tables with `domain=foo`
130+
- We add a data cell filter for a column in table X and then grant permissions to role A
131+
132+
In this case, the tag-based permission takes effect and the cell-level permission is effectively ignored.
133+
This means data cell filters should only be used on tables that do not already have tag-level permissions
134+
granted on that database or table.
135+
136+
:::
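
The scenario in the caution above can be illustrated with a toy model in which each grant is a predicate over rows and the effective access is the union of all grants. This is only a sketch of the union behaviour described here, not Lake Formation's API; `visible_row_ids`, `tag_grant`, and `filter_grant` are hypothetical names, and the sample rows are made up.

```python
rows = [
    {"id": 1, "field1": "value1"},
    {"id": 2, "field1": "value2"},
    {"id": 3, "field1": "value1"},
]


def visible_row_ids(rows, grants):
    """Return ids of rows allowed by at least one grant (union semantics)."""
    return {row["id"] for row in rows if any(allows(row) for allows in grants)}


def tag_grant(row):
    # `select` granted through the lf-tag `domain=foo`: covers the whole table.
    return True


def filter_grant(row):
    # Grant through a data cell filter with row_filter "field1 = 'value1'".
    return row["field1"] == "value1"


only_filter = visible_row_ids(rows, [filter_grant])
tag_and_filter = visible_row_ids(rows, [tag_grant, filter_grant])
```

With only the filter grant, role A would see rows 1 and 3. Once the tag-based grant is also present, the union covers every row, so the filter no longer restricts anything.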

docs/docs/configuration/table-configuration.md

+5-5
@@ -6,8 +6,8 @@ id: table-configuration
66
## Model configuration
77

88
| Property | Description | Default |
9-
| ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------- |
10-
| `materialized` | A table materialization like `table`, `incremental`, [`table_hive_ha`](docs/configuration/materializations/hive-ha) |
9+
|---------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------|
10+
| `materialized` | A table materialization like `table`, `incremental`, [`table_hive_ha`](./materializations/hive-ha) | |
1111
| `s3_data_naming` | An optional naming policy for the data on S3. See [Table data location](#table-data-location). | `schema_table_unique` |
1212
| `external_location` | If set, the full S3 path in which the table will be saved. (Does not work with Iceberg table). | `none` |
1313
| `partitioned_by` | An array list of columns by which the table will be partitioned. ⚠️ [Limited to the creation of 100 partitions](https://docs.aws.amazon.com/athena/latest/ug/ctas-considerations-limitations.html#ctas-considerations-limitations-partition-and-bucket-limits). | `none` |
@@ -18,12 +18,12 @@ id: table-configuration
1818
| `write_compression` | The compression type to use for any storage format that allows compression to be specified. To see which options are available, see [CREATE TABLE AS](https://docs.aws.amazon.com/athena/latest/ug/create-table-as.html). Example: `SNAPPY`. | `none` |
1919
| `field_delimiter` | Custom field delimiter. Used when the format is set to `TEXTFILE`. See [CREATE TABLE AS](https://docs.aws.amazon.com/athena/latest/ug/create-table-as.html). Example: `','` | `none` |
2020
| `table_properties` | Additional table properties to add to the table. Valid for Iceberg only. Example: `{'optimize_rewrite_delete_file_threshold': '2'}` | `none` |
21-
| `lf_tags` | Lake Formation tags for metadata access control, to associate to the table. Example: `{"tag1":{"tag1": "value1", "tag2": "value2"}` | `none` |
22-
| `lf_tags_columns` | Lake Formation tags for metadata access control, to associate to columns. Example: `{"tag1": {"value1": ["column1": "column2"]}}` | `none` |
21+
| `lf_tags_config` | Lake Formation tags for metadata access control, to associate with the table and its columns. See the [Lake Formation page](./lakeformation) for details. | `none` |
22+
| `lf_grants` | Lake Formation grants configuration for data cell filters. See the [Lake Formation page](./lakeformation) for details. | `none` |
2323

2424
## Table data location
2525

26-
The S3 location in which table data is saved, is determined by:
26+
The S3 location in which table data is saved is determined by:
2727

2828
1. If `external_location` is defined, that value is used.
2929
2. If `s3_data_dir` is defined, the path is determined by this value and `s3_data_naming`.

sidebars.js

+2
@@ -38,6 +38,8 @@ const sidebarSettings = {
3838
},
3939
"docs/configuration/seeds",
4040
"docs/configuration/snapshots",
41+
"docs/configuration/contract-constraints",
42+
"docs/configuration/lakeformation",
4143
],
4244
},
4345
