dbt is known for taking software development life cycle (SDLC) principles and applying them to data analytics as the Analytics Development Life Cycle. However, the one area dbt does not borrow from software engineering is one of the most important paradigms, if not the most important: object-oriented programming. That is, dbt objects cannot really inherit from or override each other. Sure, models can have versions, but those are really different objects, not true inheritance. You could argue that dbt models are themselves objects that can be reused and built upon, which is somewhat true, but that also falls well short.
Let's take a fairly simple example, the dbt_hubspot package. Say this package meets 95% of my HubSpot reporting needs, but I want to make a few changes. I have a decision to make: 1) fork the repo and make my changes, or 2) import the package and modify it to meet my needs. In the long term, option 2 could be great for a few reasons. The parent project might be actively maintained and keep adding new features, or fixes to models, that are automatically brought into my project. This approach also keeps my child project clean, making it easy to see where it diverges from the parent. To any software developer, this is obvious and fundamental.
Here's where dbt falls short and what I should be able to do:
If I define model YAML in the child project for a model that already has YAML defined in the parent, then rather than throwing an error, dbt should let the child's definitions override the parent's. That could be model or column attributes such as description or, more importantly, meta.
Imagine extending a model's SQL to add a few new metrics while keeping (that is, overriding) the same model name, then adding just those new metrics to the model YAML without having to redefine the parent model's YAML, building on the point above.
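To make the two points above concrete, here is a rough sketch of what the child project's YAML might look like if this kind of override were supported. This is proposed behavior only, not something dbt accepts today according to this post. The file path, the `meta` key, and the `weighted_amount` column are hypothetical, and `hubspot__deals` is assumed to be a model that already has YAML in the parent package.

```yaml
# models/hubspot_overrides.yml in the child project (hypothetical path)
# PROPOSED behavior only: per this post, dbt raises an error today when a
# model's YAML is defined in both the parent and the child.
version: 2

models:
  - name: hubspot__deals          # assumed: a model with YAML already defined in the parent
    description: "Deals, extended with our own pipeline metrics."  # would override the parent's description
    meta:
      owner: revops               # hypothetical meta key, merged over the parent's meta
    columns:
      - name: weighted_amount     # hypothetical new metric added by the child's SQL
        description: "Deal amount multiplied by stage probability."
      # Columns not listed here would be inherited from the parent YAML unchanged.
```

The idea is that the child file lists only what differs, so the divergence from the parent is visible at a glance.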
Perhaps the parent project's sources.yml definition is near-perfect, but you want to set an identifier on a couple of the tables because they have a different name in your database (e.g. hubspot_customers instead of customers). Or you simply want to change the freshness config but keep everything else.
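For sources specifically, dbt has documented a source-level `overrides` property that comes close to this. It may be deprecated in recent releases, so check the docs for your dbt version before relying on it. A minimal sketch, assuming the parent's source is named `hubspot` and is defined in a package called `hubspot_source` (both names are assumptions, as are the file path and the freshness thresholds):

```yaml
# models/src_hubspot_overrides.yml in the child project (hypothetical path)
version: 2

sources:
  - name: hubspot                 # assumed: the source name used by the parent
    overrides: hubspot_source     # assumed: the package that defines that source
    freshness:                    # illustrative thresholds; relies on the parent's loaded_at_field
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: customers
        identifier: hubspot_customers   # the renamed table from the example above
```

Anything not restated here is meant to come from the parent's definition. Note that this only covers sources, not models or snapshots.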
Let's say the parent project defines a lot of useful snapshots, but in my child project I only want a few of them. I should be able to add a simple entry with the snapshot name and enabled: false in the child project, or even override other YAML attributes. I suppose this may be possible in the dbt_project.yml config, but not elsewhere.
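For reference, the dbt_project.yml route would look roughly like the fragment below. It assumes the parent package's project name is `hubspot`, and `contacts_snapshot` is a hypothetical snapshot name used only for illustration.

```yaml
# dbt_project.yml in the child project (fragment)
snapshots:
  hubspot:                        # assumed: the parent package's project name
    contacts_snapshot:            # hypothetical snapshot name
      +enabled: false             # turn off just this snapshot from the parent
```

This works at the project-config level, but it is not the same as overriding the snapshot's own YAML from the child project.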
There are many more examples of how dbt could benefit from this, but I'll leave it here to keep things simple. My company is already doing a lot of this to build dbt data products, but we have to rely on lots of hacks and ugly workarounds. I'm looking forward to the ensuing discussion.