This small Python module transforms large SQL queries with multiple Common Table Expressions (CTE) into modular, reusable dbt models.
If you have ever struggled with maintaining long, monolithic SQL
queries filled with CTE, this tool is for you! cte2dbt
automates the
extraction of CTE and converts them into structured dbt models,
preserving their dependencies and making them easier to test,
document, and reuse.
- β Automates SQL-to-dbt migration β No more manually splitting queries into models.
- β Preserves dependencies β Ensures each dbt model is built in the correct order.
- β Enables testing & modularisation β Improves maintainability and performance.
- β Customisable transformation functions β Tailor how CTE and source tables are processed.
- β Iterable design β Process, validate, or visualise models however you like.
import com.github.piotr_yuxuan.cte2dbt as cte2dbt
from sqlglot import parse_one
sql_query = """
WITH
customers_cte AS (
SELECT id, name FROM customers
),
orders_cte AS (
SELECT c.id, o.amount
FROM customers_cte AS c
JOIN prod.retails.orders AS o ON c.id = o.customer_id
)
SELECT * FROM orders_cte;
"""
to_dbt_ref_block = lambda name: f"{{{{ ref('{name}') }}}}"
to_dbt_source_block = lambda table: f"{{{{ source('{table.db}', '{table.name}') }}}}"
provider = cte2dbt.Provider(
model_name="final_model",
expr=parse_one(sql_query),
to_dbt_ref_block=to_dbt_ref_block,
to_dbt_source_block=to_dbt_source_block
)
The order guarantees that current model only relies on models that came earlier in the iteration.
for model_name, model_expr in provider.get_dbt_models():
print(f"-- Model: {model_name}")
print(model_expr.sql(pretty=True))
Output:
-- Model: cte1
SELECT
id,
name
FROM {{ source('my_source', 'customers') }} AS customers
-- Model: cte2
SELECT
cte1.id,
orders.amount
FROM {{ ref('cte1') }} AS cte1
JOIN {{ source('my_source', 'orders') }} AS orders
ON cte1.id = orders.customer_id
-- Model: final_model_name
SELECT
*
FROM {{ ref('cte2') }} AS cte2
print(provider.model_dependencies())
# Output:
# {'customers_cte': {'customers'},
# 'orders_cte': {'customers_cte', 'prod.retails.orders'},
# 'final_model': {'orders_cte'}}
Install cte2dbt using Poetry:
poetry add cte2dbt
Beyond just transforming SQL queries into dbt models, cte2dbt
provides an iterable interface that unlocks multiple possibilities:
Iterate over models and execute each as a temporary table:
for model_name, model_expr in provider.iter_dbt_models():
conn.execute(f"CREATE TEMPORARY TABLE {model_name} AS {model_expr.sql()}")
Use DESCRIBE TABLE
or DESCRIBE RESULT LAST_QUERY_ID()
to log or
visualise schema changes:
for model_name, _ in provider.iter_dbt_models():
print(conn.execute(f"DESCRIBE TABLE {model_name}").fetchall())
Compare model output with existing tables to measure similarity:
for model_name, model_expr in provider.iter_dbt_models():
similarity_score = compare_with_reference(conn, model_expr.sql(), reference_table)
print(f"{model_name}: similarity {similarity_score}%")
Understand how changes in one CTE ripple through your final model:
import networkx as nx
import matplotlib.pyplot as plt
graph = nx.DiGraph(provider.model_dependencies())
nx.draw(graph, with_labels=True)
plt.show()
Save each model as a .sql
file with a structured naming strategy:
for model_name, model_expr in provider.iter_dbt_models():
with open(f"models/{model_name}.sql", "w") as f:
f.write(model_expr.sql(pretty=True))
Extract column lists and generate corresponding .yml files:
for model_name, model_expr in provider.iter_dbt_models():
columns = extract_columns(model_expr)
write_yaml(f"models/{model_name}.yml", {"columns": columns})
cte2dbt
leverages sqlglot to
parse SQL queries at the token level. It then identifies CTE, source
tables, and their dependencies.
- CTE are transformed into separate dbt models;
- Source tables are converted into source() blocks;
- CTE references are replaced with ref() calls;
- Execution order is preserved to guarantee correctness.
Rather than enforcing a rigid output format, cte2dbt gives you full control over how models are transformed, stored, or executed.
- Uses sqlglot for robust SQL parsing;
- Implements dependency resolution to determine the correct order of model execution;
- Supports fully customisable transformation functions;
- Can be used programmatically or integrated into dbt workflows.
cte2dbt
makes SQL-to-dbt migrations effortless, ensuring cleaner,
modular, and testable SQL transformations. Whether you want to
refactor legacy queries, improve maintainability, or generate
structured dbt models automatically, this tool provides an elegant and
flexible solution.
Give it a try and simplify your dbt workflow today! π―