Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Unexpected ref behaviour when model name also exists in an imported package #11351

Open
2 tasks done
TomSteenbergen opened this issue Mar 4, 2025 · 5 comments
Open
2 tasks done
Labels
bug Something isn't working Refinement Maintainer input needed

Comments

@TomSteenbergen
Copy link

TomSteenbergen commented Mar 4, 2025

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

In our organization we have multiple dbt projects. The dbt projects of our central data warehouse team is oftentimes imported as a package in downstream dbt projects. These downstream dbt projects build their dbt models on top of their dbt models.

When the downstream dbt project contains a dbt model with a name that is also used by the upstream dbt project for another model, dbt can incorrectly ref to the wrong model.

For example:

  • We have two projects: project_upstream and project_downstream
    • project_upstream is imported by project_downstream
  • project_upstream has two models: model_x and model_y
    • model_y references model_x
  • project_downstream has one model, also called model_x
    • This model_x references the upstream model_y

When running dbt in project_downstream, e.g. dbt compile, dbt incorrectly thinks that project_upstream.model_y is dependent on project_downstream.model_x instead of project_upstream.model_x, resulting in a cyclic dependency since project_downstream.model_x references project_upstream.model_y.

Expected Behavior

I would have expected that dbt, when using ref with a single argument in project_upstream.model_y, references the dbt model of the same project, i.e. project_upstream.model_x, regardless of whether we are running dbt from project_upstream or project_downstream. Instead, it seems to default to the project from which you are running dbt, even though project_downstream is not listed in packages.yml as a dependency of project_upstream.

Steps To Reproduce

I created a minimal example that you can easily run locally here: https://github.com/TomSteenbergen/dbt-ref-issue

Please clone that repo and refer to that repo's README for the steps to reproduce the issue.

Relevant log output

RuntimeError: Found a cycle: model.project_downstream.model_x --> model.project_upstream.model_y

Environment

- OS: macOS Sequoia 15.3.1
- Python: 3.12.1
- dbt: 1.9.2

Which database adapter are you using with dbt?

I encountered this error both using Snowflake and Postgres.

Additional Context

In the repo with the minimal example (https://github.com/TomSteenbergen/dbt-ref-issue) I also mention some workarounds I tried. One involves overriding the ref macro. While this should solve it, this yields another unexpected error:

Runtime Error
  Compilation Error in test unique_model_x_a (models/model_x/model_x.yml)
    dbt was unable to infer all dependencies for the model "unique_model_x_a".
    This typically happens when ref() is placed within a conditional block.
    
    To fix this, add the following hint to the top of the model "unique_model_x_a":
    
    -- depends_on: {{ ref('project_upstream', 'model_x') }}
    
    > in macro ref (macros/ref.sql)
    > called by test unique_model_x_a (models/model_x/model_x.yml)

Please see the README of the above linked repo for the exact steps to reproduce.

@TomSteenbergen TomSteenbergen added bug Something isn't working triage labels Mar 4, 2025
@dbeatty10
Copy link
Contributor

Thanks for providing the full details of a simple reproducible example @TomSteenbergen !

Did you try a workaround of the following?

  1. Renaming model_x within project_downstream to model_xy
  2. Alias model_xy as model_x

Details

Here's how to do them with your reprex:

mv project_downstream/models/model_x/model_x.sql project_downstream/models/model_x/model_xy.sql
mv project_downstream/models/model_x/model_x.yml project_downstream/models/model_x/model_xy.yml

Make the following changes within project_downstream/models/model_x/model_xy.yml:

models:
  - name: model_xy
    config:
      schema: downstream
      alias: model_x

Result

$ cd project_downstream
$ dbt list -q
project_downstream.model_x.model_xy
project_upstream.model_x.model_x
project_upstream.model_y.model_y
dbt docs generate
dbt docs serve
Image
Image
dbt build
Image

@TomSteenbergen
Copy link
Author

Hi @dbeatty10 ! Thanks for the prompt reply 🙏

Ah yes, alias could be a workaround indeed! Besides needing to make some changes in our downstream queries, a downside I see is that it might be confusing for our engineers that prototyped a SQL query using this table and then want to translate it to a dbt model. In our dbt project the model name is always equal to the table name, so having to refer to it differently does introduce an opportunity of bugs to slip into our dbt models, which is not ideal.

Perhaps this is something we will have to accept, but I do wonder if there is perhaps a way to solve this issue in dbt-core, so that ref does correctly refer to project_upstream.model_x (and without running into this issue). Or is there are good reason for ref to behave like this?

@dbeatty10
Copy link
Contributor

dbeatty10 commented Mar 5, 2025

Can the behavior be changed in dbt-core?

I do wonder if there is perhaps a way to solve this issue in dbt-core, so that ref does correctly refer to project_upstream.model_x (and without running into this issue). Or is there are good reason for ref to behave like this?

There's a couple bits of relevant documentation about what dbt will do when there is a duplicated name [1][2]:

If you ref a resource with a duplicated name, it will resolve to the resource within the same namespace (package or project), or raise an error because of an ambiguous reference. Use two-argument ref to disambiguate references by specifying the namespace.

If you use one-argument ref (just the model_name), dbt will look for a model by that name in the same namespace (package or project); if it finds none, it will raise an error.

A bit ironically, it's ambiguous what "same namespace" means:

  1. Is it the namespace of the root project being executed?
  2. Or the namespace of the project where the reference occurs?

The current behavior appears to align with the 1st possibility rather than the 2nd. I can see that the 2nd is more desirable in your situation (and maybe most/all situations).

I want to run this by some of my colleagues to see what they think. We wouldn't want to make a change that would break other projects.

To be transparent, since it's possible to workaround this, it does lower the chances that we'd make a change in dbt-core.

In the meantime, see below for the workarounds that worked for me (and some that didn't).

Workarounds that worked

Rename the model in the downstream project

Rename project_downstream/models/model_x/model_x.sql to project_downstream/models/model_xy/model_xy.sql and add an alias to the desired relation name:

model_xy.sql

{{ config(schema="downstream", alias="model_x") }}

SELECT 'b' AS col_b
FROM {{ ref("project_upstream", "model_y") }}

Pros: This one always works, even when project_downstream is not editable because it is owned by a different group/organization.

Cons: The downstream author can't simply use their desired relation name as the model name and needs the extra steps above.

Add the package name as the 1st argument of ref in the upstream project

Change this:

SELECT 'a' AS col_a
FROM {{ ref("model_x") }}

To this:

SELECT 'a' AS col_a
FROM {{ ref("project_upstream", "model_x") }}

Pros: downstream users don't need to worry about model name conflicts.

Cons: I know you mentioned this one within your tried workarounds. It will not work when project_downstream is not editable because it is owned by a different group/organization.

Workaround that didn't work

Disabling model_x within project_upstream like this:

models:
  project_upstream:
    model_x:
      +enabled: false

This is because both project_upstream.model_x and project_downstream.model_x need to exist and be enabled.

@dbeatty10 dbeatty10 added Refinement Maintainer input needed and removed triage labels Mar 5, 2025
@TomSteenbergen
Copy link
Author

TomSteenbergen commented Mar 5, 2025

A bit ironically, it's ambiguous what "same namespace" means:

  1. Is it the namespace of the root project being executed?
  2. Or the namespace of the project where the reference occurs?

Agree with the ambiguity here. I have a strong preference for number 2. Two major reasons:

  • In these scenarios dbt can now compile ref calls in an upstream project to a dbt package that is not even included in its packages.yml file. In the minimal example, project_upstream.model_y is now compiled to ref project_downstream.model_x, even though project_downstream is not included in project_upstream's packages.yml.
  • I would expect the models of an upstream dbt project to compile the same regardless of whether or not dbt compile is run in that project or in a downstream project. Now, running dbt compile in project_upstream yields different SQL code for project_upstream.model_y than running the same command in project_downstream.

Regardless of the option we choose, I think it'd be great if we can also make the docs a bit less ambiguous on this case once we verify the current implementation indeed behaves as number 1.

Thanks a lot @dbeatty10 for running this by your colleagues and taking the time to respond with some workarounds. 🙇 Let me know once you have an update!

@dbeatty10
Copy link
Contributor

Thank you for providing those two reasons for your strong preference! Great insights 💡

it'd be great if we can also make the docs a bit less ambiguous on this case once we verify the current implementation indeed behaves as number 1

💯 Opened this docs issue as a result: dbt-labs/docs.getdbt.com#6982

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Refinement Maintainer input needed
Projects
None yet
Development

No branches or pull requests

2 participants