[Bug] partial parse fails when schema file is updated #11363

Open
2 tasks done
yakovlevvs opened this issue Mar 5, 2025 · 0 comments
Labels
bug Something isn't working triage

Comments

yakovlevvs commented Mar 5, 2025

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When I add a new model to my dbt project and run dbt parse, everything works fine and the model appears in the manifest.json file. If I then make any change to the YAML file where the model is described (for example, adding a new column) and run dbt parse again, I get a DuplicatePatchPathError:

Compilation Error
  dbt found two schema.yml entries for the same resource named test_catalog.test_schema.test_model. Resources and their associated columns may only be described a single time. To fix this, remove one of the resource entries for test_catalog.test_schema.test_model in this file:
   - models\marts\dbt_test_marts\test_catalog.test_schema.test_model.yml

When I change any other file related to this model and leave the YAML untouched, partial parsing works fine.
I know that this issue can be avoided by using full parsing instead of partial parsing, but dbt projects can be very large, and parsing the whole project every time a single model changes can be time consuming and costly.
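
For reference, the full-parse workaround can be applied to a single run. This is a minimal sketch, assuming the default layout in which dbt keeps its partial-parse cache in a file named partial_parse.msgpack under the target path; the helper name force_full_parse is hypothetical. Passing --no-partial-parse to dbt parse should have the same effect without deleting anything.

```python
from pathlib import Path


def force_full_parse(target_path: str) -> bool:
    """Delete dbt's partial-parse cache so the next run parses everything.

    Returns True if a cache file was removed, False if none existed.
    """
    # Assumption: the cache lives directly under the target path.
    cache = Path(target_path) / "partial_parse.msgpack"
    if cache.exists():
        cache.unlink()
        return True
    return False
```

This only sidesteps the error; it pays the full-parse cost that partial parsing is meant to avoid.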

Expected Behavior

dbt partial parsing succeeds when a schema file is changed.

Steps To Reproduce

Add the following test_catalog.test_schema.test_model.yml and test_catalog.test_schema.test_model.sql files to your project:

version: 2

models:
  - name: test_catalog.test_schema.test_model
    config:
      alias: test_model
      schema: test_schema
      materialized: table
    columns:
      - name: col1

SELECT 1 as col1

Then parse the project with this command: dbt parse --profiles-dir . --project-dir ./dbt --log-path ./dbt/logs --target-path ./target --debug

Then change the test_catalog.test_schema.test_model.yml file by adding a new column:

version: 2

models:
  - name: test_catalog.test_schema.test_model
    config:
      alias: test_model
      schema: test_schema
      materialized: table
    columns:
      - name: col1
      - name: col2

Then run the same command again: dbt parse --profiles-dir . --project-dir ./dbt --log-path ./dbt/logs --target-path ./target --debug
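
The two file states above can also be generated by a script. This is a sketch, assuming an existing dbt project whose models directory is passed in; the names schema_yaml and write_model_files are hypothetical. It only writes the two files, so dbt parse still has to be run after each call.

```python
from pathlib import Path

MODEL_NAME = "test_catalog.test_schema.test_model"


def schema_yaml(columns: list[str]) -> str:
    """Render the schema file from this report with the given column names."""
    lines = [
        "version: 2",
        "",
        "models:",
        f"  - name: {MODEL_NAME}",
        "    config:",
        "      alias: test_model",
        "      schema: test_schema",
        "      materialized: table",
        "    columns:",
    ]
    lines += [f"      - name: {col}" for col in columns]
    return "\n".join(lines) + "\n"


def write_model_files(models_dir: str, columns: list[str]) -> None:
    """Write the .sql and .yml files for the test model into models_dir."""
    directory = Path(models_dir)
    directory.mkdir(parents=True, exist_ok=True)
    (directory / f"{MODEL_NAME}.sql").write_text("SELECT 1 as col1\n")
    (directory / f"{MODEL_NAME}.yml").write_text(schema_yaml(columns))
```

Call write_model_files(models_dir, ["col1"]), run dbt parse, then call it again with ["col1", "col2"] and run dbt parse a second time.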

Relevant log output

> dbt parse --profiles-dir ./dags --project-dir ./dags/dbt --log-path ./dags/dbt/logs --target-path ./target --debug
15:39:43  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E91A223E0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E93B239A0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E93B236A0>]}
15:39:43  Running with dbt=1.9.2
15:39:43  running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'log_cache_events': 'False', 'write_json': 'True', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'version_check': 'True', 'debug': 'True', 'log_path': 'C:\\Users\\60098727\\dp--batch-proc-dlh-dbt-loader\\dags\\dbt\\logs', 'profiles_dir': './dags', 'fail_fast': 'False', 'use_colors': 'True', 'use_experimental_parser': 'False', 'empty': 'None', 'quiet': 'False', 'no_print': 'None', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'invocation_command': 'dbt parse --profiles-dir ./dags --project-dir ./dags/dbt --log-path ./dags/dbt/logs --target-path ./target --debug', 'introspect': 'True', 'log_format': 'default', 'target_path': './target', 'static_parser': 'True', 'send_anonymous_usage_stats': 'True'}  
15:39:43  Sending event: {'category': 'dbt', 'action': 'project_id', 'label': 'f5857103-b702-4f4f-82e5-7333ff40212b', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E934612D0>]}
15:39:43  Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': 'f5857103-b702-4f4f-82e5-7333ff40212b', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E93C24AC0>]}
15:39:43  Registered adapter: trino=1.8.1
15:39:43  checksum: 12b12750b70de726cfd89136b8e24afc3f3e77597a97bff40ab7e5f9b39d5e18, vars: {}, profile: , target: , version: 1.9.2
15:39:44  Partial parsing enabled: 0 files deleted, 0 files added, 1 files changed.
15:39:44  Partial parsing: updated file: dlh://models\marts\dbt_test_marts\test_catalog.test_schema.test_model.yml
ParsedNodePatch(original_file_path='models\\marts\\dbt_test_marts\\test_catalog.test_schema.test_model.yml', yaml_key='models', package_name='dlh', name='test_catalog.test_schema.test_model', description='', meta={}, docs=Docs(show=True, node_color=None), config={'alias': 'test_model', 'schema': 'test_schema', 'materialized': 'table'}, columns={'col1': ColumnInfo(name='col1', description='', meta={}, data_type=None, constraints=[], quote=None, tags=[], _extra={}, granularity=None), 'col2': ColumnInfo(name='col2', description='', meta={}, data_type=None, constraints=[], quote=None, tags=[], _extra={}, granularity=None)}, access=None, version=None, latest_version=None, constraints=[], deprecation_date=None, time_spine=None)

15:39:44  Encountered an error:
Compilation Error
  dbt found two schema.yml entries for the same resource named test_catalog.test_schema.test_model. Resources and their associated columns may only be described a single time. To fix this, remove one of the resource entries for test_catalog.test_schema.test_model in this file:
   - models\marts\dbt_test_marts\test_catalog.test_schema.test_model.yml

15:39:44  Command `dbt parse` failed at 18:39:44.217854 after 1.15 seconds
15:39:44  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E91A223E0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E9440AEF0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x0000027E9440AF80>]}
15:39:44  Flushing usage events
15:39:44  An error was encountered while trying to flush usage events

Environment

- OS: Debian GNU/Linux 12 (bookworm)
- Python: 3.10.15
- dbt: 1.9.2

Which database adapter are you using with dbt?

other (mention it in "Additional Context")

Additional Context

I did some research and found that if a model node in the manifest already has the patch_path property set, the parser raises DuplicatePatchPathError:

if node.patch_path:

But when we add a new model, the parser sets this property from the file id, which prevents the same node from being patched again in the future:
node.patch_path = patch.file_id

Maybe a partial-parsing flag could be passed to NodePatchParser so that it conditionally skips the patch_path check and lets partial parsing proceed? If I remove the patch_path check at lines 848-850 of core/dbt/parser/schemas.py, partial parsing works fine for me.
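
To make the suggestion concrete, here is a toy model of the behaviour described above. It is not dbt-core's code: Node, Patch, apply_patch, and the allow_repatch_from_same_file flag are hypothetical stand-ins. The only things carried over from the report are that a node with patch_path already set is rejected and that patch_path is set from the patch's file id.

```python
from dataclasses import dataclass, field
from typing import Optional


class DuplicatePatchPathError(Exception):
    """Toy stand-in for the error named in this report."""


@dataclass
class Patch:
    file_id: str
    columns: list[str]


@dataclass
class Node:
    name: str
    patch_path: Optional[str] = None
    columns: list[str] = field(default_factory=list)


def apply_patch(
    node: Node, patch: Patch, allow_repatch_from_same_file: bool = False
) -> None:
    """Apply a schema patch to a node.

    With the flag off, any node that already has patch_path set is rejected,
    which mirrors the reported behaviour. With the flag on, a patch coming
    from the same file that patched the node before is accepted.
    """
    if node.patch_path:
        same_file = node.patch_path == patch.file_id
        if not (allow_repatch_from_same_file and same_file):
            raise DuplicatePatchPathError(node.name)
    node.patch_path = patch.file_id
    node.columns = list(patch.columns)
```

Comparing file ids keeps the check active for entries that come from two different files, which removing the check outright would not. A real fix would also need to keep rejecting two entries for the same model inside one file, which this sketch does not distinguish.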

I use dbt-trino adapter but I believe the problem comes from dbt-core.

yakovlevvs added the bug and triage labels Mar 5, 2025