Skip to content

Tools with batteries included to extract CTE from a large SQL query and split them iteratively into dbt models.

License

Notifications You must be signed in to change notification settings

piotr-yuxuan/cte2dbt

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

97 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

cte2dbt πŸš€

PyPI - Downloads PyPI - Status GitHub License GitHub Issues

Overview

This small Python module transforms large SQL queries with multiple Common Table Expressions (CTE) into modular, reusable dbt models.

If you have ever struggled with maintaining long, monolithic SQL queries filled with CTE, this tool is for you! cte2dbt automates the extraction of CTE and converts them into structured dbt models, preserving their dependencies and making them easier to test, document, and reuse.

Why use cte2dbt?

  • βœ… Automates SQL-to-dbt migration – No more manually splitting queries into models.
  • βœ… Preserves dependencies – Ensures each dbt model is built in the correct order.
  • βœ… Enables testing & modularisation – Improves maintainability and performance.
  • βœ… Customisable transformation functions – Tailor how CTE and source tables are processed.
  • βœ… Iterable design – Process, validate, or visualise models however you like.

Example Usage

Transforming a SQL query into dbt models

1️⃣ Import cte2dbt and sqlglot

import com.github.piotr_yuxuan.cte2dbt as cte2dbt
from sqlglot import parse_one

2️⃣ Define a SQL query

sql_query = """
WITH
  customers_cte AS (
    SELECT id, name FROM customers
  ),
  orders_cte AS (
    SELECT c.id, o.amount
    FROM customers_cte AS c
    JOIN prod.retails.orders AS o ON c.id = o.customer_id
  )
SELECT * FROM orders_cte;
"""

3️⃣ Define dbt transformation functions

to_dbt_ref_block = lambda name: f"{{{{ ref('{name}') }}}}"
to_dbt_source_block = lambda table: f"{{{{ source('{table.db}', '{table.name}') }}}}"

4️⃣ Initialise the provider

provider = cte2dbt.Provider(
    model_name="final_model",
    expr=parse_one(sql_query),
    to_dbt_ref_block=to_dbt_ref_block,
    to_dbt_source_block=to_dbt_source_block
)

5️⃣ Iterate over dbt models in execution order

The order guarantees that current model only relies on models that came earlier in the iteration.

for model_name, model_expr in provider.get_dbt_models():
    print(f"-- Model: {model_name}")
    print(model_expr.sql(pretty=True))

Output:

-- Model: cte1
SELECT
  id,
  name
FROM {{ source('my_source', 'customers') }} AS customers

-- Model: cte2
SELECT
  cte1.id,
  orders.amount
FROM {{ ref('cte1') }} AS cte1
JOIN {{ source('my_source', 'orders') }} AS orders
  ON cte1.id = orders.customer_id

-- Model: final_model_name
SELECT
  *
FROM {{ ref('cte2') }} AS cte2

6️⃣ Generate the model dependency graph

print(provider.model_dependencies())
# Output:
# {'customers_cte': {'customers'},
#  'orders_cte': {'customers_cte', 'prod.retails.orders'},
#  'final_model': {'orders_cte'}}

Installation

Install cte2dbt using Poetry:

poetry add cte2dbt

Use Cases

Beyond just transforming SQL queries into dbt models, cte2dbt provides an iterable interface that unlocks multiple possibilities:

πŸ”Ή Run SQL transformations dynamically

Iterate over models and execute each as a temporary table:

for model_name, model_expr in provider.iter_dbt_models():
    conn.execute(f"CREATE TEMPORARY TABLE {model_name} AS {model_expr.sql()}")

πŸ”Ή Inspect intermediate data structures

Use DESCRIBE TABLE or DESCRIBE RESULT LAST_QUERY_ID()to log or visualise schema changes:

for model_name, _ in provider.iter_dbt_models():
    print(conn.execute(f"DESCRIBE TABLE {model_name}").fetchall())

πŸ”Ή Validate data consistency

Compare model output with existing tables to measure similarity:

for model_name, model_expr in provider.iter_dbt_models():
    similarity_score = compare_with_reference(conn, model_expr.sql(), reference_table)
    print(f"{model_name}: similarity {similarity_score}%")

πŸ”Ή Generate dependency graphs

Understand how changes in one CTE ripple through your final model:

import networkx as nx
import matplotlib.pyplot as plt

graph = nx.DiGraph(provider.model_dependencies())
nx.draw(graph, with_labels=True)
plt.show()

πŸ”Ή Export models to files

Save each model as a .sql file with a structured naming strategy:

for model_name, model_expr in provider.iter_dbt_models():
    with open(f"models/{model_name}.sql", "w") as f:
        f.write(model_expr.sql(pretty=True))

Extract column lists and generate corresponding .yml files:

for model_name, model_expr in provider.iter_dbt_models():
    columns = extract_columns(model_expr)
    write_yaml(f"models/{model_name}.yml", {"columns": columns})

How It Works

πŸ“Œ SQL Parsing & CTE Extraction

cte2dbt leverages sqlglot to parse SQL queries at the token level. It then identifies CTE, source tables, and their dependencies.

πŸ“Œ Modular dbt Model Generation

  • CTE are transformed into separate dbt models;
  • Source tables are converted into source() blocks;
  • CTE references are replaced with ref() calls;
  • Execution order is preserved to guarantee correctness.

πŸ“Œ Fully Flexible API

Rather than enforcing a rigid output format, cte2dbt gives you full control over how models are transformed, stored, or executed.

Technical Details

  • Uses sqlglot for robust SQL parsing;
  • Implements dependency resolution to determine the correct order of model execution;
  • Supports fully customisable transformation functions;
  • Can be used programmatically or integrated into dbt workflows.

πŸš€ Conclusion

cte2dbt makes SQL-to-dbt migrations effortless, ensuring cleaner, modular, and testable SQL transformations. Whether you want to refactor legacy queries, improve maintainability, or generate structured dbt models automatically, this tool provides an elegant and flexible solution.

Give it a try and simplify your dbt workflow today! 🎯

About

Tools with batteries included to extract CTE from a large SQL query and split them iteratively into dbt models.

Topics

Resources

License

Stars

Watchers

Forks

Languages