Overview
I’d like to propose that we allow users to define and run YAML-only models in dbt. While SQL is excellent for data transformations, many useful database objects—such as external tables, UDFs, or other engine-specific constructs—are often better described via configuration rather than handwritten SQL. By supporting YAML-only models, dbt could natively materialize these objects based solely on properties defined in YAML.
Related existing discussions
Motivation
Cleaner, More Intuitive Configurations:
Currently, if you want to materialize a source or engine-specific object (like an external table), you might need to rely on a “fake” SQL model file with a Jinja comment to trigger SQL generation. This hack is neither user-friendly nor elegant. A YAML-only model would allow users to simply declare object properties without any placeholder SQL.
For instance, the placeholder SQL file might contain nothing but `{# Placeholder for the model #}`.
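To make that hack concrete, here is a sketch of what the workaround can look like today (the model name and the `external_table` materialization are illustrative, not any adapter's actual API):

```sql
-- models/raw_events.sql
-- The body is only a Jinja comment; a custom materialization ignores it
-- and builds the object from the model's config instead.
{{ config(materialized='external_table', location='s3://my-bucket/raw/events/') }}

{# Placeholder for the model #}
```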
Managing Non-Transformation Objects:
Many of our community members are already experimenting with packages like dbt-external-tables and dbt-flink-adapter to manage objects that aren’t transformations. YAML-only models would extend this capability in a way that feels native to dbt.
Engine-Specific Resource Configuration:
Some data warehouses expose resources (e.g., UDFs, file formats, streams, warehouses) where the “logic” isn’t expressed as a SQL transformation but rather as configuration parameters. For these cases, YAML is a natural language for defining the necessary properties, and a YAML-only model could then translate them into the appropriate DDL commands.
Proposed Approach
YAML-Only Model Syntax:
Extend the current YAML model configuration so that a model can be defined entirely in YAML. Instead of embedding SQL, users would specify the properties of the target object (such as columns, table parameters, or file format options). A corresponding macro would then generate the necessary SQL (or use engine-specific APIs) to create or update the object.
Here is a practical example:
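This is a sketch under assumed syntax: the `materialized: external_table` config and the property names are hypothetical, meant only to show the shape a YAML-only model could take:

```yaml
# models/external/raw_events.yml -- hypothetical YAML-only model
models:
  - name: raw_events
    description: "External table over raw event files"
    config:
      materialized: external_table      # provided by the adapter
      location: "s3://my-bucket/raw/events/"
      file_format: parquet
      partition_by: [event_date]
    columns:
      - name: event_id
        data_type: string
      - name: event_date
        data_type: date
```

No SQL file accompanies this definition; the adapter's materialization would turn the declared properties into the right `CREATE EXTERNAL TABLE` statement.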
On the adapter developer's end, a dedicated materialization would be provided, just as for any other type of model.
Integration into the DAG:
Although these objects may not embody transformation logic, they can still be referenced via `ref()` in downstream models. This allows dbt to manage dependency order and ensures that all necessary scaffolding exists before running transformation models. I think that's one of the top advantages of this approach: the resources get created through a regular `dbt run` / `dbt build` and become visible in dbt docs.
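For example, a downstream staging model could depend on the hypothetical `raw_events` model sketched above:

```sql
-- models/staging/stg_events.sql
-- ref() makes dbt create the external table before this model runs.
select
    event_id,
    event_date
from {{ ref('raw_events') }}
```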
Use Cases
External Tables & Managed Sources:
Materialize external tables with engine-specific configurations directly from YAML. This would simplify the integration of external data sources and align with the practices already seen in community packages.
UDF Management:
For organizations that rely on user-defined functions (especially in environments where procedural SQL is common), YAML-only models could allow UDFs to be defined, versioned, and managed alongside transformation logic.
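A UDF needs different properties than a table (arguments, a return type, a body), which YAML can carry naturally. Purely as an illustration, assuming a hypothetical `function` materialization:

```yaml
models:
  - name: parse_browser
    config:
      materialized: function          # hypothetical adapter materialization
      language: sql
      arguments:
        - name: user_agent
          data_type: string
      returns: string
      definition: "regexp_substr(user_agent, '^[^/]+')"
```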
Other Engine-Specific Constructs:
Objects like file formats, streams, or even certain warehouse properties could be managed in a similar way: defined in YAML and materialized via custom macros or adapter logic. Even rights management could be on the table, as long as it fits a dbt adapter workflow.
Alternatives
One alternative I'm thinking about is to leverage sources: add a new command that materializes the sources defined that way. It could also be hooked into `dbt build`, like seeds.
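In the spirit of what dbt-external-tables does today, that could look something like this (the `external` properties shown are illustrative):

```yaml
sources:
  - name: raw
    tables:
      - name: events
        external:
          location: "s3://my-bucket/raw/events/"
          file_format: parquet
```

The new command (or a `dbt build` hook) would then create or refresh every source that carries such a definition.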
How to implement it
I'm not very knowledgeable about the internals here, but it should presumably work like snapshot and seed support, which can also be defined through YAML.
Practically, it would mean modifying `ModelParser` and/or `SchemaParser` to allow such a workflow. It would also likely require a way for a materialization to declare that it accepts YAML-only models, such as:
```jinja
{% materialization stream, adapter='my_adapter', yaml_only=True %}
```
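A minimal sketch of what such a materialization's body could then do, reusing the hypothetical `yaml_only` flag and the illustrative `location` / `file_format` configs from the earlier example (the DDL itself is engine-specific and made up here):

```jinja
{% materialization external_table, adapter='my_adapter', yaml_only=True %}
  {# There is no model SQL to compile: the DDL is derived from config alone. #}
  {%- set location = config.require('location') -%}
  {%- set file_format = config.get('file_format', 'parquet') -%}

  {% call statement('main') -%}
    create or replace external table {{ this }}
      location = '{{ location }}'
      file_format = (type = {{ file_format }})
  {%- endcall %}

  {{ return({'relations': [this]}) }}
{% endmaterialization %}
```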
Am I missing something?