How to efficiently implement WAP (Write-Audit-Publish) with Delta? #4135
dfustes asked this question in Ask the Authors
Dear Delta Lake authors,

First of all, thank you for your great work on this project.
We are introducing Data Quality checks in our pipelines, and we don't want to publish changes to a given table until those checks have passed. To do so, we have a WAP process in place, but it is not very efficient: we need to maintain a "public" replica of each table, into which we merge the changes once the checks pass.
We are wondering if we could leverage the shallow clone feature in Delta Lake. We could, for example, clone the original table, apply the changes, run the checks, and then merge the changes into the clone. However, this operation is still not very fast. In addition, our consumers read the public tables incrementally using the Change Data Feed, which would not be supported if they consumed the cloned tables.
After some experiments, we think we could build our own way of "cloning" the original tables: copy the `_delta_log` folder incrementally as new versions are validated in the original table, and rewrite the file paths so they point to the original table. Do you think this approach is feasible? Would it be compliant with the Delta protocol now and in the future?

Kind regards
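To make the question more concrete, here is a minimal sketch of the path rewriting we have in mind. It is only an illustration under our own assumptions: the function name `rewrite_commit` and the `s3://bucket/tables/orders` location are hypothetical, it handles only the `path` field of `add`, `remove`, and `cdc` actions in a single JSON commit file, and it ignores anything else that may reference data files (for example checkpoints or deletion vectors).

```python
import json
from urllib.parse import urlparse


def rewrite_commit(commit_text: str, source_root: str) -> str:
    """Rewrite relative file paths in one Delta commit file.

    commit_text is the content of a single _delta_log/<version>.json file
    (one JSON action per line). Relative paths in add/remove/cdc actions are
    turned into absolute URIs under source_root (hypothetical location of the
    original table), so a copied log can point back at the original files.
    """
    root = source_root.rstrip("/")
    rewritten = []
    for line in commit_text.splitlines():
        if not line.strip():
            continue
        action = json.loads(line)
        for kind in ("add", "remove", "cdc"):
            body = action.get(kind)
            if not body or "path" not in body:
                continue
            path = body["path"]
            # Leave paths that are already absolute (URI scheme or leading "/").
            if urlparse(path).scheme or path.startswith("/"):
                continue
            body["path"] = f"{root}/{path}"
        rewritten.append(json.dumps(action, separators=(",", ":")))
    return "\n".join(rewritten) + "\n"


if __name__ == "__main__":
    commit = (
        '{"commitInfo":{"operation":"WRITE"}}\n'
        '{"add":{"path":"part-0.parquet","size":1,"dataChange":true}}\n'
    )
    print(rewrite_commit(commit, "s3://bucket/tables/orders"))
```

The idea would be to run something like this only for versions that have passed the Data Quality checks, and to write the result under the same version number in the public table's `_delta_log`.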