Add DeltaCAT Storage API to Automatically Return an Input Table's Compaction High Watermark #77

Open
pdames opened this issue Feb 9, 2023 · 0 comments
pdames commented Feb 9, 2023

One use case for this API is to simplify operations when the compactor can automatically determine the old "high watermark" (i.e., the latest inclusive source delta stream position that has been merged into the compacted result set currently written to the catalog) from a previously compacted partition used as a rebase source. A rebase like this is typically the first step in migrating from an existing copy-on-write compactor to the DeltaCAT compactor.
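To make the definition concrete: because the high watermark is inclusive, every source delta at or below it is already reflected in the compacted result, and only deltas strictly above it still need to be merged after the rebase. The sketch below illustrates that rule using a hypothetical `Delta` record that holds only a stream position; it is not DeltaCAT's actual delta model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Delta:
    """Hypothetical stand-in for a source delta; only its stream position matters here."""
    stream_position: int


def deltas_to_merge(source_deltas: List[Delta], high_watermark: Optional[int]) -> List[Delta]:
    """Return the source deltas not yet merged into the compacted result.

    The high watermark is inclusive: a delta whose stream position is <= the
    high watermark is already reflected in the compacted result. A high
    watermark of None means nothing has been compacted yet.
    """
    if high_watermark is None:
        pending = list(source_deltas)
    else:
        pending = [d for d in source_deltas if d.stream_position > high_watermark]
    # Merge pending deltas in stream order.
    return sorted(pending, key=lambda d: d.stream_position)


if __name__ == "__main__":
    deltas = [Delta(10), Delta(20), Delta(30)]
    # Positions 10 and 20 were merged by the previous compactor; only 30 remains.
    print([d.stream_position for d in deltas_to_merge(deltas, 20)])  # [30]
```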

Right now we lack any official or consistent specification of how to do this across data catalogs. We should consider adding a storage API that simply agrees to return the latest copy-on-write compaction high watermark for any partition locator, and leave it up to each catalog to decide whether it has a supported implementation of this API.
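One possible shape for such an API, as a minimal sketch assuming each catalog plugs in its own implementation. The function name `get_compaction_high_watermark`, the simplified `PartitionLocator` fields, and the `InMemoryCatalog` below are all hypothetical and are not existing DeltaCAT APIs. The default implementation raises `NotImplementedError`, which leaves it up to each catalog to opt in.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PartitionLocator:
    """Hypothetical, simplified partition locator (real locators carry more fields)."""
    namespace: str
    table_name: str
    partition_values: tuple = ()


class CatalogStorage:
    """Hypothetical storage interface that a catalog may choose to implement."""

    def get_compaction_high_watermark(self, locator: PartitionLocator) -> Optional[int]:
        """Return the latest inclusive source stream position merged into the
        copy-on-write compacted partition at `locator`, or None if the
        partition has never been compacted.

        Catalogs that cannot determine this raise NotImplementedError.
        """
        raise NotImplementedError(
            f"{type(self).__name__} does not support get_compaction_high_watermark"
        )


class InMemoryCatalog(CatalogStorage):
    """Hypothetical catalog that records a high watermark with each compaction."""

    def __init__(self) -> None:
        self._high_watermarks: Dict[PartitionLocator, int] = {}

    def record_compaction(self, locator: PartitionLocator, high_watermark: int) -> None:
        self._high_watermarks[locator] = high_watermark

    def get_compaction_high_watermark(self, locator: PartitionLocator) -> Optional[int]:
        return self._high_watermarks.get(locator)


if __name__ == "__main__":
    locator = PartitionLocator("sales", "orders", ("2023-02-09",))
    catalog = InMemoryCatalog()
    catalog.record_compaction(locator, 42)
    print(catalog.get_compaction_high_watermark(locator))  # 42
```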

Our current compaction rebase API settles for the simple answer, "the caller will figure out the high watermark of the previously compacted partition somehow," instead of the more user-friendly ideal, "the compactor will figure this out for you as long as the catalog has implemented an API to automatically retrieve the high watermark."
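A minimal sketch of how the compactor could combine both behaviors: an explicitly supplied high watermark still wins, and otherwise the compactor asks the catalog. The `resolve_rebase_high_watermark` helper and the `get_compaction_high_watermark` callable are hypothetical names used only for illustration; the callable stands in for whatever catalog-specific API is eventually adopted.

```python
from typing import Callable, Optional, TypeVar

Locator = TypeVar("Locator")


def resolve_rebase_high_watermark(
    rebase_source: Locator,
    get_compaction_high_watermark: Callable[[Locator], Optional[int]],
    caller_high_watermark: Optional[int] = None,
) -> Optional[int]:
    """Determine the high watermark of a previously compacted rebase source.

    1. An explicit caller-supplied value always wins (today's behavior).
    2. Otherwise, ask the catalog (the proposed behavior).
    3. If the catalog has no implementation, fail with an actionable message.
    """
    if caller_high_watermark is not None:
        return caller_high_watermark
    try:
        return get_compaction_high_watermark(rebase_source)
    except NotImplementedError as e:
        raise ValueError(
            "The catalog cannot determine the rebase source's high watermark; "
            "pass it explicitly instead."
        ) from e


if __name__ == "__main__":
    known = {"orders/2023-02-09": 42}
    print(resolve_rebase_high_watermark("orders/2023-02-09", known.get))  # 42
    print(resolve_rebase_high_watermark("orders/2023-02-09", known.get, 7))  # 7
```

Keeping the caller-supplied value as an override preserves today's behavior for catalogs that never implement the new API.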

Originally posted by @pdames in #70 (comment)
