Overview
I’d like to propose that we allow users to define and run YAML-only models in dbt. While SQL is excellent for data transformations, many useful database objects—such as external tables, UDFs, or other engine-specific constructs—are often better described via configuration rather than handwritten SQL. By supporting YAML-only models, dbt could natively materialize these objects based solely on properties defined in YAML.
Related existing discussions
Motivation
Cleaner, More Intuitive Configurations:
Currently, if you want to materialize a source or engine-specific object (like an external table), you might need to rely on a “fake” SQL model file with a Jinja comment to trigger SQL generation. This hack is neither user-friendly nor elegant. A YAML-only model would allow users to simply declare object properties without any placeholder SQL.
For instance, the placeholder SQL file might contain nothing but `{# Placeholder for the model #}`.
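To make that hack concrete, here is a sketch of what the workaround can look like today (the model name and the `external_table` materialization are illustrative, not any adapter's actual API):

```sql
-- models/raw_events.sql
-- The body is only a Jinja comment; a custom materialization ignores it
-- and builds the object from the model's config instead.
{{ config(materialized='external_table', location='s3://my-bucket/raw/events/') }}

{# Placeholder for the model #}
```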
Managing Non-Transformation Objects:
Many of our community members are already experimenting with packages like dbt-external-tables and dbt-flink-adapter to manage objects that aren’t transformations. YAML-only models would extend this capability in a way that feels native to dbt.
Engine-Specific Resource Configuration:
Some data warehouses expose resources (e.g., UDFs, file formats, streams, warehouses) where the “logic” isn’t expressed as a SQL transformation but rather as configuration parameters. For these cases, YAML is a natural language for defining the necessary properties, and a YAML-only model could then translate them into the appropriate DDL commands.
Proposed Approach
YAML-Only Model Syntax:
Extend the current YAML model configuration so that a model can be defined entirely in YAML. Instead of embedding SQL, users would specify the properties of the target object (such as columns, table parameters, or file format options). A corresponding macro would then generate the necessary SQL (or use engine-specific APIs) to create or update the object.
Here is a practical example:
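This is a sketch under assumed syntax: the `materialized: external_table` config and the property names are hypothetical, meant only to show the shape a YAML-only model could take:

```yaml
# models/external/raw_events.yml -- hypothetical YAML-only model
models:
  - name: raw_events
    description: "External table over raw event files"
    config:
      materialized: external_table      # provided by the adapter
      location: "s3://my-bucket/raw/events/"
      file_format: parquet
      partition_by: [event_date]
    columns:
      - name: event_id
        data_type: string
      - name: event_date
        data_type: date
```

No SQL file accompanies this definition; the adapter's materialization would turn the declared properties into the right `CREATE EXTERNAL TABLE` statement.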
On the adapter developer's end, a dedicated materialization would be provided, just as for any other type of model.
Integration into the DAG:
Although these objects may not embody transformation logic, they can still be referenced via `ref()` in downstream models. This allows dbt to manage dependency order and ensures that all necessary scaffolding exists before running transformation models. I think that's one of the top advantages of this approach: the resources get created through a regular `dbt run` / `dbt build` and become visible in dbt docs.
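For example, a downstream staging model could depend on the hypothetical `raw_events` model sketched above:

```sql
-- models/staging/stg_events.sql
-- ref() makes dbt create the external table before this model runs.
select
    event_id,
    event_date
from {{ ref('raw_events') }}
```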
Use Cases
External Tables & Managed Sources:
Materialize external tables with engine-specific configurations directly from YAML. This would simplify the integration of external data sources and align with the practices already seen in community packages.
UDF Management:
For organizations that rely on user-defined functions (especially in environments where procedural SQL is common), YAML-only models could allow UDFs to be defined, versioned, and managed alongside transformation logic.
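A UDF needs different properties than a table (arguments, a return type, a body), which YAML can carry naturally. Purely as an illustration, assuming a hypothetical `function` materialization:

```yaml
models:
  - name: parse_browser
    config:
      materialized: function          # hypothetical adapter materialization
      language: sql
      arguments:
        - name: user_agent
          data_type: string
      returns: string
      definition: "regexp_substr(user_agent, '^[^/]+')"
```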
Other Engine-Specific Constructs:
Objects like file formats, streams, or even certain warehouse properties could be managed in a similar way: defined in YAML and materialized via custom macros or adapter logic. Even rights management could be on the table, as long as it fits a dbt adapter workflow.
Alternatives
One alternative I'm thinking about is to leverage sources: add a new command that materializes the sources defined that way. It could also be hooked into `dbt build`, like seeds.
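In the spirit of what dbt-external-tables does today, that could look something like this (the `external` properties shown are illustrative):

```yaml
sources:
  - name: raw
    tables:
      - name: events
        external:
          location: "s3://my-bucket/raw/events/"
          file_format: parquet
```

The new command (or a `dbt build` hook) would then create or refresh every source that carries such a definition.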
How to implement it
I'm not very knowledgeable about the internals here, but it should presumably work like snapshot and seed support, which can also be defined through YAML.
Practically, it would mean modifying `ModelParser` and/or `SchemaParser` to allow such a workflow. It would also likely require a way for a materialization to declare that it accepts YAML-only models, such as:
```jinja
{% materialization stream, adapter='my_adapter', yaml_only=True %}
```
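A minimal sketch of what such a materialization's body could then do, reusing the hypothetical `yaml_only` flag and the illustrative `location` / `file_format` configs from the earlier example (the DDL itself is engine-specific and made up here):

```jinja
{% materialization external_table, adapter='my_adapter', yaml_only=True %}
  {# There is no model SQL to compile: the DDL is derived from config alone. #}
  {%- set location = config.require('location') -%}
  {%- set file_format = config.get('file_format', 'parquet') -%}

  {% call statement('main') -%}
    create or replace external table {{ this }}
      location = '{{ location }}'
      file_format = (type = {{ file_format }})
  {%- endcall %}

  {{ return({'relations': [this]}) }}
{% endmaterialization %}
```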
Am I missing something?