Feat: finalize clickhouse engine adapter #3125

Merged
merged 23 commits into from
Sep 18, 2024

@treysp (Contributor) commented Sep 12, 2024

This PR finalizes the Clickhouse engine adapter:

  • Adds SCD model kind support
  • Adds ability to pass arbitrary settings to client connection
  • Adds docs

Context and implementation details

Joins and NULLs

  • By default, Clickhouse fills unmatched cells in JOIN results with a datatype-specific default value (e.g., 0 for integer columns).
  • The SCD and table diff queries SQLMesh builds require NULLs in those cells instead, so we must change that behavior.
  • We do that by injecting SETTINGS join_use_nulls = 1 into the query.
  • SCD detail:
    • The original user query is embedded in a CTE.
    • Query settings are dynamically scoped, so our setting on the outer query will apply to the user query CTE.
    • If join_use_nulls = 0 on the CH server, we inject join_use_nulls = 0 into the CTE to preserve the behavior the user expects.
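
The scoping rule above can be sketched in a few lines of Python. This is only an illustration, not the adapter's code: `with_settings` and `build_scd_query` are hypothetical helpers, the outer `SELECT * FROM source` is a placeholder for the much larger SCD query, and the server's `join_use_nulls` value is assumed to have been read beforehand.

```python
def with_settings(query: str, **settings: int) -> str:
    """Append a ClickHouse SETTINGS clause to a SELECT statement (hypothetical helper)."""
    clause = ", ".join(f"{name} = {value}" for name, value in settings.items())
    return f"{query} SETTINGS {clause}" if clause else query


def build_scd_query(user_query: str, server_join_use_nulls: int) -> str:
    """Wrap the user query in a CTE and force NULL-filling on the outer query only."""
    inner = user_query
    if server_join_use_nulls == 0:
        # The outer setting would otherwise leak into the CTE, so pin the
        # user query to the server default it was written against.
        inner = with_settings(user_query, join_use_nulls=0)
    # Placeholder outer query; the real SCD logic is far more involved.
    outer = f"WITH source AS ({inner}) SELECT * FROM source"
    return with_settings(outer, join_use_nulls=1)


if __name__ == "__main__":
    print(build_scd_query("SELECT a FROM t", server_join_use_nulls=0))
```

When the server already runs with `join_use_nulls = 1`, the user query is left untouched, because the outer setting matches what the user would see anyway.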

Connection settings

  • Following dbt-clickhouse (maintained by Clickhouse), we pass these settings to the connection:
    • Always
      • mutations_sync = "2"
      • insert_distributed_sync = "1"
    • When running in cluster or cloud modes
      • database_replicated_enforce_synchronous_settings = "1"
      • insert_quorum = "auto"
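
As a rough sketch of how these defaults might be assembled before being handed to the client connection: the function name `connection_settings` and the `is_cluster_or_cloud` flag are made up for illustration, and letting user-supplied settings override the defaults is an assumption rather than something this PR states.

```python
from typing import Dict, Optional


def connection_settings(
    is_cluster_or_cloud: bool,
    user_settings: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Build the settings dict passed to the client connection (hypothetical helper)."""
    # Settings that are always sent.
    settings = {
        "mutations_sync": "2",
        "insert_distributed_sync": "1",
    }
    if is_cluster_or_cloud:
        # Extra settings for cluster or cloud modes.
        settings.update(
            {
                "database_replicated_enforce_synchronous_settings": "1",
                "insert_quorum": "auto",
            }
        )
    # Assumption: arbitrary user-provided settings take precedence over defaults.
    settings.update(user_settings or {})
    return settings


if __name__ == "__main__":
    print(connection_settings(is_cluster_or_cloud=True))
```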

Storage format

  • Each CH table must have a "table engine" (MergeTree by default)
  • Users specify a table engine in the MODEL DDL storage_format key
  • The specification may be a function call, so we now generate the value's SQL when the value is an Expression other than a Literal or Identifier
    • Generated Literal/Identifier SQL may include quotes, and we want to defer normalization until later
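
The rule can be illustrated with a small, self-contained sketch. The `Identifier`, `Literal`, and `Func` classes below are simplified stand-ins invented for this example, not the real expression classes, and returning the raw name for literals and identifiers is an assumption about what "deferring normalization" looks like in practice.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Identifier:
    name: str
    quoted: bool = False

    def sql(self) -> str:
        # Generated SQL may carry quotes, which is what we want to avoid early on.
        return f'"{self.name}"' if self.quoted else self.name


@dataclass
class Literal:
    value: str

    def sql(self) -> str:
        return f"'{self.value}'"


@dataclass
class Func:
    name: str
    args: List[str] = field(default_factory=list)

    def sql(self) -> str:
        return f"{self.name}({', '.join(self.args)})"


def storage_format_value(value: Union[Identifier, Literal, Func]) -> str:
    """Return the text stored for storage_format (hypothetical helper)."""
    if isinstance(value, Identifier):
        return value.name  # keep the bare name, no quotes
    if isinstance(value, Literal):
        return value.value  # keep the bare value, no quotes
    return value.sql()  # function calls are rendered to SQL


if __name__ == "__main__":
    print(storage_format_value(Func("ReplicatedMergeTree", ["'/path'", "'{replica}'"])))
    print(storage_format_value(Identifier("MergeTree", quoted=True)))
```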

@treysp treysp requested a review from a team September 12, 2024 21:46
@erindru (Collaborator) commented Sep 13, 2024

Awesome work! I bet you learned way more than you wanted to about the Clickhouse internals

@georgesittas (Contributor) left a comment

Great work @treysp!

docs/integrations/engines/clickhouse.md (resolved)
docs/integrations/engines/clickhouse.md (resolved)
sqlmesh/core/engine_adapter/clickhouse.py (outdated, resolved)
sqlmesh/core/engine_adapter/clickhouse.py (outdated, resolved)
sqlmesh/core/engine_adapter/clickhouse.py (outdated, resolved)
sqlmesh/core/engine_adapter/clickhouse.py (outdated, resolved)
sqlmesh/core/model/meta.py (outdated, resolved)
sqlmesh/utils/date.py (resolved)
@treysp treysp force-pushed the trey/improve-ch-adapter branch 4 times, most recently from 626240e to 8f1636d Compare September 17, 2024 19:47
@treysp treysp force-pushed the trey/improve-ch-adapter branch 2 times, most recently from 6d9243f to dfa9b97 Compare September 18, 2024 14:53
@treysp treysp merged commit fbf941b into main Sep 18, 2024
23 checks passed
@treysp treysp deleted the trey/improve-ch-adapter branch September 18, 2024 22:15
@treysp treysp changed the title from "Feat!: finalize clickhouse engine adapter" to "Feat: finalize clickhouse engine adapter" Sep 18, 2024
4 participants