[Feature] Enable Dynamic Target Schema Inference Based on Model Build State #11377

Open
3 tasks done
vbgcwood opened this issue Mar 8, 2025 · 1 comment
Labels
enhancement New feature or request triage

vbgcwood commented Mar 8, 2025

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Summary:

It would be beneficial if dbt core could automatically differentiate between models that are being built (e.g., new or modified) and those that are assumed to pre-exist when determining which target schema to use. This feature request asks that dbt dynamically infer the target schema at runtime from the model's build state, rather than requiring explicit static configuration in every case.

Background & Problem Statement:

Currently, when using state-based selection flags such as --select state:new state:modified, dbt builds the selected models and assumes the remaining models already exist. However, every model reference resolves to the statically configured {target_schema}_{custom_schema} for that model. This means there is no way to dynamically configure dbt to deploy newly built models into a different environment.

The purpose of allowing separate, dynamically configured schemas for "models-to-be-built" and "models-expected-to-pre-exist" during dbt build --select ... is to let users isolate new builds in an environment that can be easily tested and destroyed, without requiring that every model be built in that environment and without requiring schemas to be declared explicitly for every model.

I use a setup where my dbt_profile.yml file is dynamically generated within CI/CD pipelines so that feature branch deployments can isolate their work and run a full suite of tests. In this pipeline I have no way to avoid building all upstream dependencies of the models under test, because they are dependencies of the models I am deploying. This becomes expensive when there are many dependencies.
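As a rough illustration of that kind of setup, here is a minimal sketch of a CI step that derives a schema name from the branch name and renders a profile. The helper names (`schema_from_branch`, `render_profile`), the template, and the sanitization rule are assumptions for illustration, not my actual pipeline; the connection values are placeholders taken from the Athena example in this issue.

```python
import re

# Hypothetical profile template; only the schema changes per branch.
PROFILE_TEMPLATE = """\
default:
  outputs:
    athena:
      database: AwsDataCatalog
      schema: {schema}
      table_type: iceberg
      threads: 10
      type: athena
  target: athena
"""


def schema_from_branch(branch: str) -> str:
    """Turn a git branch name into a schema-safe identifier (assumed rule)."""
    # Lowercase, then collapse every run of non-alphanumerics into one underscore.
    return re.sub(r"[^a-z0-9]+", "_", branch.lower()).strip("_")


def render_profile(branch: str) -> str:
    """Render the profile text for a feature branch deployment."""
    return PROFILE_TEMPLATE.format(schema=schema_from_branch(branch))


if __name__ == "__main__":
    print(render_profile("feature/test-market-model-changes"))
```

Every model in the project resolves against the single generated schema, which is why all upstream dependencies have to be rebuilt there.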

Proposed Feature:

Introduce a dynamic schema resolution mechanism in dbt core that:

  • Infers Model State: Automatically identifies whether a model is being built or assumed to pre-exist, based on state-based selection criteria.
  • Dynamic Schema Assignment: Directs models identified as "to-be-built" into a designated new schema ({target_schema}_{custom_schema}, as is currently the case), while referencing pre-existing models from a separate schema that is explicitly configured for models that won't be built (also {target_schema}_{custom_schema}, but with a different target_schema).
  • Configuration Simplicity: Allows a second target_schema to be defined for "pre-existing-models".
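A minimal sketch of the naming rule this would imply, written in Python only to pin down the intended behavior. The function `resolve_schema` and its parameters are hypothetical and are not part of dbt; the suffix handling follows dbt's default `{target_schema}_{custom_schema}` convention.

```python
from typing import Optional


def resolve_schema(
    target_schema: str,
    dependency_schema: Optional[str],
    custom_schema: Optional[str],
    is_being_built: bool,
) -> str:
    """Return the schema a model would resolve to under this proposal."""
    # Models that are not being built read from the dependency schema,
    # but only when one has been configured; otherwise nothing changes.
    if is_being_built or dependency_schema is None:
        base = target_schema
    else:
        base = dependency_schema

    # Same composition as today: no custom schema means the base schema alone.
    if custom_schema is None:
        return base
    return f"{base}_{custom_schema.strip()}"
```

With a target schema of `feature_x`, a dependency schema of `dev`, and a custom schema of `marts`, a model being built would land in `feature_x_marts`, while a pre-existing model would be referenced from `dev_marts`.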

Benefits:

  • Optimized Build Times: By avoiding the unnecessary rebuild of models that are assumed to pre-exist, users can save time and computational resources.
  • Enhanced Testing Flexibility: New and modified models can be isolated in a dedicated schema for testing without impacting the stable, production-level schema.
  • Streamlined Workflow: This dynamic approach removes the need for manual schema configuration on a per-model basis, reducing potential errors and maintenance overhead.

Use Case Example:

Assume the following profiles configuration, where dependency_schema is the proposed new key:

# dbt_profiles.yml
default:
  outputs:
    athena:
      database: AwsDataCatalog
      schema: feature_test_market_model_changes
      dependency_schema: dev
      table_type: iceberg
      threads: 10
      type: athena
  target: athena

When running a command like dbt run --select state:new state:modified, dbt would:

  1. Check if the following conditions are met:
    • A dependency_schema is defined for the target connector.
    • At least one dependency model is referenced in the build but is not itself being built.
  2. If both conditions are true, build all selected models with the feature_test_market_model_changes target schema, while referencing all dependency models with the dev target schema.
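The two steps above could be sketched as follows. This is only an illustration of the proposed decision logic: `Target`, `plan_schemas`, and the model names `orders` and `stg_orders` are hypothetical, and `dependency_schema` is the proposed key, which dbt does not support today.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Target:
    """The two profile fields that matter here; dependency_schema is proposed."""
    schema: str
    dependency_schema: Optional[str] = None


def plan_schemas(
    target: Target, referenced: Iterable[str], selected: Iterable[str]
) -> Dict[str, str]:
    """Map every model in the invocation to the target schema it would use."""
    referenced, selected = set(referenced), set(selected)
    # Dependency models: referenced by the build, but not being built.
    unbuilt = referenced - selected

    # Step 1: both conditions must hold, otherwise keep today's behavior.
    split = target.dependency_schema is not None and bool(unbuilt)

    # Step 2: selected models go to the target schema, dependencies to the other.
    plan = {}
    for model in sorted(referenced | selected):
        if split and model in unbuilt:
            plan[model] = target.dependency_schema
        else:
            plan[model] = target.schema
    return plan


if __name__ == "__main__":
    target = Target(schema="feature_test_market_model_changes", dependency_schema="dev")
    # Hypothetical project: orders is modified and depends on stg_orders.
    print(plan_schemas(target, referenced={"orders", "stg_orders"}, selected={"orders"}))
```

If dependency_schema is not set, every model maps to the single target schema, so existing projects would behave exactly as they do now.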

Describe alternatives you've considered

Manually configure all target schemas for every CI/CD deployment, based on which models are being built.

Who will this benefit?

All dbt users who would like to isolate their new models better. This provides an intuitive and streamlined way to build and destroy models efficiently, even on top of an existing warehouse. You can simply drop the schema.

Are you interested in contributing this feature?

I'd love to, but have never contributed to OSS before. Not sure where to start.

Anything else?

No response

@vbgcwood vbgcwood added enhancement New feature or request triage labels Mar 8, 2025

vbgcwood commented Mar 9, 2025

Something like selected_resources could work for this. https://docs.getdbt.com/reference/dbt-jinja-functions/selected_resources

However, it would need to list every resource that is in scope for the current dbt invocation. Omitting --select should result in all models being listed. Selecting a particular directory, like --select marts in this repo, should populate the list with all the marts models and omit all the staging models.

Then something like this in the generate_schema_name macro:

{% if node.unique_id not in selected_resources %}
  {# Model is not being built - use the dependency schema #}
  {% set schema_prefix = var('dependency_schema', default_schema_prefix) %}
{% else %}
  {# Model is being built - use the default schema #}
  {% set schema_prefix = default_schema_prefix %}
{% endif %}

I ran some tests with the selected_resources variable but wasn't able to get it to work. I'm not sure it meets the use case.
