Commit f6d2bed

Identity Columns cntd.
1 parent 3bace4f commit f6d2bed

6 files changed

+138
-32
lines changed

docs/ColumnWithDefaultExprUtils.md

+24-15
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@
77
IDENTITY column is not supported
88
```
99

10-
## <span id="IDENTITY_MIN_WRITER_VERSION"> IDENTITY_MIN_WRITER_VERSION
10+
## IDENTITY_MIN_WRITER_VERSION { #IDENTITY_MIN_WRITER_VERSION }
1111

1212
`ColumnWithDefaultExprUtils` uses `6` as the [minimum version of a writer](Protocol.md#minWriterVersion) for writing to `IDENTITY` columns.
1313

@@ -16,7 +16,7 @@
1616
* `ColumnWithDefaultExprUtils` is used to [satisfyProtocol](#satisfyProtocol)
1717
* `Protocol` utility is used to [determine the required minimum protocol](Protocol.md#requiredMinimumProtocol)
1818

19-
## <span id="columnHasDefaultExpr"> columnHasDefaultExpr
19+
## columnHasDefaultExpr { #columnHasDefaultExpr }
2020

2121
```scala
2222
columnHasDefaultExpr(
@@ -30,7 +30,7 @@ columnHasDefaultExpr(
3030

3131
* `DeltaAnalysis` logical resolution rule is requested to `resolveQueryColumnsByName`
3232

33-
## <span id="hasIdentityColumn"> hasIdentityColumn
33+
## hasIdentityColumn { #hasIdentityColumn }
3434

3535
```scala
3636
hasIdentityColumn(
@@ -43,41 +43,50 @@ hasIdentityColumn(
4343

4444
* `Protocol` utility is used for the [required minimum protocol](Protocol.md#requiredMinimumProtocol)
4545

46-
## <span id="isIdentityColumn"> isIdentityColumn
46+
## isIdentityColumn { #isIdentityColumn }
4747

4848
```scala
4949
isIdentityColumn(
5050
field: StructField): Boolean
5151
```
5252

53-
`isIdentityColumn` uses the `Metadata` (of the given `StructField`) to check the existence of [delta.identity.start](spark-connector/DeltaSourceUtils.md#IDENTITY_INFO_START), [delta.identity.step](spark-connector/DeltaSourceUtils.md#IDENTITY_INFO_STEP) and [delta.identity.allowExplicitInsert](spark-connector/DeltaSourceUtils.md#IDENTITY_INFO_ALLOW_EXPLICIT_INSERT) metadata keys.
53+
`isIdentityColumn` is used to find out whether a `StructField` is an [identity column](identity-columns/index.md) or not.
5454

55-
!!! note "IDENTITY column"
56-
**IDENTITY column** is a column with [delta.identity.start](spark-connector/DeltaSourceUtils.md#IDENTITY_INFO_START), [delta.identity.step](spark-connector/DeltaSourceUtils.md#IDENTITY_INFO_STEP) and [delta.identity.allowExplicitInsert](spark-connector/DeltaSourceUtils.md#IDENTITY_INFO_ALLOW_EXPLICIT_INSERT) metadata.
55+
`isIdentityColumn` uses the `Metadata` (of the given `StructField`) to check the existence of the following metadata keys:
56+
57+
* [delta.identity.start](spark-connector/DeltaSourceUtils.md#IDENTITY_INFO_START)
58+
* [delta.identity.step](spark-connector/DeltaSourceUtils.md#IDENTITY_INFO_STEP)
59+
* [delta.identity.allowExplicitInsert](spark-connector/DeltaSourceUtils.md#IDENTITY_INFO_ALLOW_EXPLICIT_INSERT)
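
The key check described above can be sketched in plain Scala. This is a minimal illustration, not Delta Lake's code: `Field` is a hypothetical stand-in for Spark SQL's `StructField`, and the sketch assumes that all three keys must be present for a field to count as an identity column.

```scala
object IsIdentityColumnSketch {
  // Metadata keys named in the list above
  val Start = "delta.identity.start"
  val Step = "delta.identity.step"
  val AllowExplicitInsert = "delta.identity.allowExplicitInsert"

  // Hypothetical stand-in for Spark SQL's StructField: a name plus metadata
  final case class Field(name: String, metadata: Map[String, Any] = Map.empty)

  // Assumption: a field is an identity column only when all three keys exist
  def isIdentityColumn(field: Field): Boolean =
    Seq(Start, Step, AllowExplicitInsert).forall(field.metadata.contains)
}
```

A field that carries only some of the keys is reported as a non-identity column in this sketch; how the real utility treats such inconsistent metadata is not covered here.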
60+
61+
---
5762

5863
`isIdentityColumn` is used when:
5964

60-
* `ColumnWithDefaultExprUtils` is used to [hasIdentityColumn](#hasIdentityColumn) and [removeDefaultExpressions](#removeDefaultExpressions)
65+
* `ColumnWithDefaultExprUtils` is used to [addDefaultExprsOrReturnConstraints](#addDefaultExprsOrReturnConstraints), [columnHasDefaultExpr](#columnHasDefaultExpr), [hasIdentityColumn](#hasIdentityColumn) and [removeDefaultExpressions](#removeDefaultExpressions)
66+
* `IdentityColumn` is requested to [blockExplicitIdentityColumnInsert](identity-columns/IdentityColumn.md#blockExplicitIdentityColumnInsert), [getIdentityColumns](identity-columns/IdentityColumn.md#getIdentityColumns), [syncIdentity](identity-columns/IdentityColumn.md#syncIdentity), [updateSchema](identity-columns/IdentityColumn.md#updateSchema), [updateToValidHighWaterMark](identity-columns/IdentityColumn.md#updateToValidHighWaterMark)
67+
* `DeltaCatalog` is requested to [alterTable](DeltaCatalog.md#alterTable) and [createDeltaTable](DeltaCatalog.md#createDeltaTable)
68+
* `MergeIntoCommandBase` is requested to [checkIdentityColumnHighWaterMarks](commands/merge/MergeIntoCommandBase.md#checkIdentityColumnHighWaterMarks)
69+
* `WriteIntoDelta` is requested to [writeAndReturnCommitData](commands/WriteIntoDelta.md#writeAndReturnCommitData)
6170

62-
## <span id="removeDefaultExpressions"> Removing Default Expressions
71+
## Remove Default Expressions from Table Schema { #removeDefaultExpressions }
6372

6473
```scala
6574
removeDefaultExpressions(
6675
schema: StructType,
67-
keepGeneratedColumns: Boolean = false): StructType
76+
keepGeneratedColumns: Boolean = false,
77+
keepIdentityColumns: Boolean = false): StructType
6878
```
6979

7080
`removeDefaultExpressions`...FIXME
7181

82+
---
83+
7284
`removeDefaultExpressions` is used when:
7385

74-
* `DeltaLog` is requested to [create a BaseRelation](DeltaLog.md#createRelation) and [createDataFrame](DeltaLog.md#createDataFrame)
86+
* `DeltaTableUtils` is requested to [removeInternalWriterMetadata](DeltaTableUtils.md#removeInternalWriterMetadata)
7587
* `OptimisticTransactionImpl` is requested to [updateMetadataInternal](OptimisticTransactionImpl.md#updateMetadataInternal)
76-
* `DeltaTableV2` is requested for the [tableSchema](DeltaTableV2.md#tableSchema)
77-
* `DeltaDataSource` is requested for the [sourceSchema](spark-connector/DeltaDataSource.md#sourceSchema)
78-
* `DeltaSourceBase` is requested for the [schema](spark-connector/DeltaSource.md#schema)
7988

80-
## <span id="tableHasDefaultExpr"> tableHasDefaultExpr
89+
## tableHasDefaultExpr { #tableHasDefaultExpr }
8190

8291
```scala
8392
tableHasDefaultExpr(

docs/DeltaColumnBuilder.md

+54-7
Original file line numberDiff line numberDiff line change
@@ -28,22 +28,22 @@ import io.delta.tables.DeltaColumnBuilder
2828

2929
## Operators
3030

31-
### <span id="build"> build
31+
### Build StructField { #build }
3232

3333
```scala
3434
build(): StructField
3535
```
3636

37-
Creates a `StructField` ([Spark SQL]({{ book.spark_sql }}/types/StructField))
37+
Creates a `StructField` ([Spark SQL]({{ book.spark_sql }}/types/StructField)) (possibly with some field metadata)
3838

39-
### <span id="comment"> comment
39+
### comment { #comment }
4040

4141
```scala
4242
comment(
4343
comment: String): DeltaColumnBuilder
4444
```
4545

46-
### <span id="dataType"> dataType
46+
### dataType { #dataType }
4747

4848
```scala
4949
dataType(
@@ -52,7 +52,7 @@ dataType(
5252
dataType: String): DeltaColumnBuilder
5353
```
5454

55-
### <span id="generatedAlwaysAs"> generatedAlwaysAs
55+
### generatedAlwaysAs { #generatedAlwaysAs }
5656

5757
```scala
5858
generatedAlwaysAs(
@@ -61,14 +61,46 @@ generatedAlwaysAs(
6161

6262
Registers the [Generation Expression](#generationExpr) of this field
6363

64-
### <span id="nullable"> nullable
64+
### generatedAlwaysAsIdentity { #generatedAlwaysAsIdentity }
65+
66+
```scala
67+
generatedAlwaysAsIdentity(
68+
start: Long,
69+
step: Long): DeltaColumnBuilder
70+
```
71+
72+
Sets the following:
73+
74+
Property | Value
75+
-|-
76+
[identityStart](#identityStart) | `start`
77+
[identityStep](#identityStep) | `step`
78+
[identityAllowExplicitInsert](#identityAllowExplicitInsert) | `false`
79+
80+
### generatedByDefaultAsIdentity { #generatedByDefaultAsIdentity }
81+
82+
```scala
83+
generatedByDefaultAsIdentity(
84+
start: Long,
85+
step: Long): DeltaColumnBuilder
86+
```
87+
88+
Sets the following:
89+
90+
Property | Value
91+
-|-
92+
[identityStart](#identityStart) | `start`
93+
[identityStep](#identityStep) | `step`
94+
[identityAllowExplicitInsert](#identityAllowExplicitInsert) | `true`
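
The two tables above differ only in the value of `identityAllowExplicitInsert`. The following sketch makes that visible with a hypothetical, simplified `ColumnBuilder` that keeps only the identity-related state. It is immutable for brevity, which is an assumption of the sketch and not a statement about how `DeltaColumnBuilder` is implemented.

```scala
object IdentityBuilderSketch {
  // Hypothetical, simplified stand-in for DeltaColumnBuilder (identity state only)
  final case class ColumnBuilder(
      name: String,
      identityStart: Option[Long] = None,
      identityStep: Option[Long] = None,
      identityAllowExplicitInsert: Option[Boolean] = None) {

    // GENERATED ALWAYS AS IDENTITY: explicit inserts are not allowed
    def generatedAlwaysAsIdentity(start: Long, step: Long): ColumnBuilder =
      copy(
        identityStart = Some(start),
        identityStep = Some(step),
        identityAllowExplicitInsert = Some(false))

    // GENERATED BY DEFAULT AS IDENTITY: explicit inserts are allowed
    def generatedByDefaultAsIdentity(start: Long, step: Long): ColumnBuilder =
      copy(
        identityStart = Some(start),
        identityStep = Some(step),
        identityAllowExplicitInsert = Some(true))
  }
}
```

For example, `IdentityBuilderSketch.ColumnBuilder("id").generatedAlwaysAsIdentity(1L, 1L)` leaves all three properties defined, with `identityAllowExplicitInsert` set to `Some(false)`.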
95+
96+
### nullable { #nullable }
6597

6698
```scala
6799
nullable(
68100
nullable: Boolean): DeltaColumnBuilder
69101
```
70102

71-
## <span id="generationExpr"> Generation Expression
103+
## Generation Expression { #generationExpr }
72104

73105
```scala
74106
generationExpr: Option[String] = None
@@ -77,3 +109,18 @@ generationExpr: Option[String] = None
77109
`DeltaColumnBuilder` uses `generationExpr` internal registry for the [generatedAlwaysAs](#generatedAlwaysAs) expression.
78110

79111
When requested to [build a StructField](#build), `DeltaColumnBuilder` registers `generationExpr` under [delta.generationExpression](spark-connector/DeltaSourceUtils.md#GENERATION_EXPRESSION_METADATA_KEY) key in the metadata (of this field).
112+
113+
## identityAllowExplicitInsert { #identityAllowExplicitInsert }
114+
115+
```scala
116+
identityAllowExplicitInsert: Option[Boolean] = None
117+
```
118+
119+
`identityAllowExplicitInsert` flag records which of the following methods was called:
120+
121+
Method | Value
122+
-|-
123+
[generatedAlwaysAsIdentity](#generatedAlwaysAsIdentity) | `false`
124+
[generatedByDefaultAsIdentity](#generatedByDefaultAsIdentity) | `true`
125+
126+
`identityAllowExplicitInsert` is used to [build a StructField](#build).

docs/commands/merge/MergeIntoCommandBase.md

+9
Original file line numberDiff line numberDiff line change
@@ -108,6 +108,15 @@ Used when:
108108

109109
* `MergeIntoCommandBase` is requested to [run](#run)
110110

111+
### checkIdentityColumnHighWaterMarks { #checkIdentityColumnHighWaterMarks }
112+
113+
```scala
114+
checkIdentityColumnHighWaterMarks(
115+
deltaTxn: OptimisticTransaction): Unit
116+
```
117+
118+
`checkIdentityColumnHighWaterMarks`...FIXME
119+
111120
## Implementations
112121

113122
* [MergeIntoCommand](MergeIntoCommand.md)
+17
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
1+
# IdentityColumn
2+
3+
## getIdentityInfo { #getIdentityInfo }
4+
5+
```scala
6+
getIdentityInfo(
7+
field: StructField): IdentityInfo
8+
```
9+
10+
`getIdentityInfo`...FIXME
11+
12+
---
13+
14+
`getIdentityInfo` is used when:
15+
16+
* `IdentityColumn` is requested to [copySchemaWithMergedHighWaterMarks](#copySchemaWithMergedHighWaterMarks), [createIdentityColumnGenerationExpr](#createIdentityColumnGenerationExpr), [syncIdentity](#syncIdentity), [updateSchema](#updateSchema), [updateToValidHighWaterMark](#updateToValidHighWaterMark)
17+
* `MergeIntoCommandBase` is requested to [checkIdentityColumnHighWaterMarks](../commands/merge/MergeIntoCommandBase.md#checkIdentityColumnHighWaterMarks)

docs/identity-columns/index.md

+26-1
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,28 @@
11
# Identity Columns
22

3-
**Identity Columns** is a new feature in Delta Lake 3.3.0 that allows assigning unique values for each record inserted into a table.
3+
**Identity Columns** is a new feature in Delta Lake 3.3.0 that allows assigning unique values for each record written out into a table (unless users provide values for them explicitly).
4+
5+
Identity Columns feature is supported by delta tables that meet one of the following requirements:
6+
7+
* The table must be on Writer Version 6
8+
* The table must be on Writer Version 7, and a feature named `identityColumns` must exist in the table protocol's `writerFeatures`.
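
The two requirements above can be read as a single predicate over the writer side of a table protocol. The sketch below encodes exactly those two cases and nothing more; `WriterProtocol` and `supportsIdentityColumns` are hypothetical names introduced for illustration and are not part of Delta Lake's `Protocol` API.

```scala
object IdentityProtocolSketch {
  // Hypothetical, minimal view of a table protocol (writer side only)
  final case class WriterProtocol(
      minWriterVersion: Int,
      writerFeatures: Set[String] = Set.empty)

  // Feature name quoted in the requirements above
  val IdentityFeature = "identityColumns"

  // True for Writer Version 6, or Writer Version 7 with the feature listed
  def supportsIdentityColumns(protocol: WriterProtocol): Boolean =
    protocol.minWriterVersion == 6 ||
      (protocol.minWriterVersion == 7 &&
        protocol.writerFeatures.contains(IdentityFeature))
}
```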
9+
10+
Identity Columns cannot be specified with a generated column expression (or a `DeltaAnalysisException` is reported).
11+
12+
Identity Columns can only be of `LongType`.
13+
14+
IDENTITY column step cannot be 0 (or a `DeltaAnalysisException` is reported).
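
The three restrictions above (no generation expression, `LongType` only, non-zero step) can be sketched as one validation function. Everything here is hypothetical: `ColumnSpec`, the tiny `DataType` hierarchy and the error messages are illustrative only, and the sketch returns a `Left` where Delta Lake is described as reporting a `DeltaAnalysisException`.

```scala
object IdentityValidationSketch {
  // Hypothetical, tiny type hierarchy standing in for Spark SQL data types
  sealed trait DataType
  case object LongType extends DataType
  case object StringType extends DataType

  // Hypothetical column description; identityStep is defined for identity columns
  final case class ColumnSpec(
      name: String,
      dataType: DataType,
      identityStep: Option[Long] = None,
      generationExpr: Option[String] = None)

  // Returns Left with an illustrative message when a restriction is violated
  def validate(column: ColumnSpec): Either[String, ColumnSpec] =
    column.identityStep match {
      case None => Right(column) // not an identity column, nothing to check
      case Some(_) if column.generationExpr.isDefined =>
        Left(s"${column.name}: identity column with a generation expression")
      case Some(_) if column.dataType != LongType =>
        Left(s"${column.name}: identity column must be of LongType")
      case Some(0L) =>
        Left(s"${column.name}: identity column step cannot be 0")
      case Some(_) => Right(column)
    }
}
```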
15+
16+
Internally, identity columns are columns (fields) with the following `Metadata`:
17+
18+
Key | Value
19+
-|-
20+
[delta.identity.allowExplicitInsert](../spark-connector/DeltaSourceUtils.md#IDENTITY_INFO_ALLOW_EXPLICIT_INSERT) | [identityAllowExplicitInsert](../DeltaColumnBuilder.md#identityAllowExplicitInsert)
21+
[delta.identity.start](../spark-connector/DeltaSourceUtils.md#IDENTITY_INFO_START) | [identityStart](../DeltaColumnBuilder.md#identityStart)
22+
[delta.identity.step](../spark-connector/DeltaSourceUtils.md#IDENTITY_INFO_STEP) | [identityStep](../DeltaColumnBuilder.md#identityStep)
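
As a rough illustration of the table above, the field metadata of an identity column can be pictured as a three-entry map. The sketch uses a plain Scala `Map` in place of Spark SQL's `Metadata`, and `identityMetadata` is a hypothetical helper, not a Delta Lake function.

```scala
object IdentityMetadataSketch {
  // Builds the three metadata entries from the corresponding builder properties
  def identityMetadata(
      start: Long,
      step: Long,
      allowExplicitInsert: Boolean): Map[String, Any] =
    Map(
      "delta.identity.allowExplicitInsert" -> allowExplicitInsert,
      "delta.identity.start" -> start,
      "delta.identity.step" -> step)
}
```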
23+
24+
[IdentityColumn](IdentityColumn.md) and [ColumnWithDefaultExprUtils](../ColumnWithDefaultExprUtils.md#isIdentityColumn) utilities are used to work with identity columns.
25+
26+
## Learn More
27+
28+
* [Identity Columns]({{ delta.github }}/PROTOCOL.md#identity-columns) in Delta Lake's table protocol specification

docs/spark-connector/DeltaSourceUtils.md

+8-9
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ title: DeltaSourceUtils
44

55
# DeltaSourceUtils
66

7-
## <span id="GENERATION_EXPRESSION_METADATA_KEY"><span id="delta.generationExpression"> delta.generationExpression
7+
## <span id="GENERATION_EXPRESSION_METADATA_KEY"> delta.generationExpression { #delta.generationExpression }
88

99
`DeltaSourceUtils` defines `delta.generationExpression` metadata key for the generation expression of a [generated column](../DeltaColumnBuilder.md#generatedAlwaysAs) of a delta table.
1010

@@ -17,31 +17,30 @@ Used when:
1717
* [GeneratedColumn](../generated-columns/GeneratedColumn.md) utility is used to [isGeneratedColumn](../generated-columns/GeneratedColumn.md#isGeneratedColumn) and [getGenerationExpressionStr](../generated-columns/GeneratedColumn.md#getGenerationExpressionStr)
1818
* `SchemaUtils` utility is used to [reportDifferences](../SchemaUtils.md#reportDifferences)
1919

20-
## <span id="IDENTITY_INFO_ALLOW_EXPLICIT_INSERT"><span id="delta.identity.allowExplicitInsert"> delta.identity.allowExplicitInsert
20+
## <span id="IDENTITY_INFO_ALLOW_EXPLICIT_INSERT"> delta.identity.allowExplicitInsert { #delta.identity.allowExplicitInsert }
2121

2222
`DeltaSourceUtils` defines `delta.identity.allowExplicitInsert` metadata key for...FIXME
2323

2424
Used when:
2525

2626
* `ColumnWithDefaultExprUtils` utility is used to [isIdentityColumn](../ColumnWithDefaultExprUtils.md#isIdentityColumn) and [removeDefaultExpressions](../ColumnWithDefaultExprUtils.md#removeDefaultExpressions)
2727

28-
## <span id="IDENTITY_INFO_START"><span id="delta.identity.start"> delta.identity.start
28+
## <span id="IDENTITY_INFO_START"> delta.identity.start { #delta.identity.start }
2929

30-
`DeltaSourceUtils` defines `delta.identity.start` metadata key for...FIXME
30+
`delta.identity.start` table metadata key is used when:
3131

32-
Used when:
33-
34-
* `ColumnWithDefaultExprUtils` utility is used to [isIdentityColumn](../ColumnWithDefaultExprUtils.md#isIdentityColumn) and [removeDefaultExpressions](../ColumnWithDefaultExprUtils.md#removeDefaultExpressions)
32+
* `DeltaColumnBuilder` is requested to [build a StructField](../DeltaColumnBuilder.md#build) (with [identityAllowExplicitInsert](../DeltaColumnBuilder.md#identityAllowExplicitInsert) defined)
33+
* `ColumnWithDefaultExprUtils` is used to [isIdentityColumn](../ColumnWithDefaultExprUtils.md#isIdentityColumn) and [removeDefaultExpressions](../ColumnWithDefaultExprUtils.md#removeDefaultExpressions)
3534

36-
## <span id="IDENTITY_INFO_STEP"><span id="delta.identity.step"> delta.identity.step
35+
## <span id="IDENTITY_INFO_STEP"> delta.identity.step { #delta.identity.step }
3736

3837
`DeltaSourceUtils` defines `delta.identity.step` metadata key for...FIXME
3938

4039
Used when:
4140

4241
* `ColumnWithDefaultExprUtils` utility is used to [isIdentityColumn](../ColumnWithDefaultExprUtils.md#isIdentityColumn) and [removeDefaultExpressions](../ColumnWithDefaultExprUtils.md#removeDefaultExpressions)
4342

44-
## <span id="isDeltaDataSourceName"> isDeltaDataSourceName
43+
## isDeltaDataSourceName { #isDeltaDataSourceName }
4544

4645
```scala
4746
isDeltaDataSourceName(
