One use case for this API is to simplify operations when the compactor can automatically determine the old "high watermark" (i.e., the latest inclusive source delta stream position that has been merged into the compacted result set currently written to the catalog) from a previously compacted partition used as a rebase source. A rebase like this is typically the first step in migrating from an existing copy-on-write compactor to the DeltaCAT compactor.
Right now we lack any official or consistent specification of how to do this across data catalogs, but we should consider adding a storage API that simply agrees to return the latest copy-on-write compaction high watermark for any partition locator, leaving it up to each catalog to decide whether it provides a supported implementation of this API.
Our current compaction rebase API just defers to the simple answer of "the caller will figure out the high watermark of the previously compacted partition somehow" rather than the more user-friendly ideal of "the compactor will figure this out for you as long as the catalog has implemented an API to automatically retrieve the high watermark."
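To make the proposal concrete, here is a rough sketch of what such a storage API and its use during a rebase might look like. Everything in it is hypothetical and for illustration only: `PartitionLocator`, `CatalogStorage`, `get_compaction_high_watermark`, `InMemoryCatalog`, and `resolve_rebase_high_watermark` are assumed names, not existing DeltaCAT interfaces, and the high watermark is assumed to be an integer stream position.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PartitionLocator:
    """Hypothetical stand-in for a catalog partition locator."""
    table_name: str
    partition_id: str


class CatalogStorage:
    """Hypothetical storage interface; catalogs opt in by overriding."""

    def get_compaction_high_watermark(self, locator: PartitionLocator) -> int:
        # Default: the catalog has no supported implementation.
        raise NotImplementedError(
            "This catalog cannot retrieve the compaction high watermark."
        )


class InMemoryCatalog(CatalogStorage):
    """Toy catalog that tracks the high watermark per compacted partition."""

    def __init__(self, watermarks: Dict[PartitionLocator, int]) -> None:
        self._watermarks = dict(watermarks)

    def get_compaction_high_watermark(self, locator: PartitionLocator) -> int:
        return self._watermarks[locator]


def resolve_rebase_high_watermark(
    storage: CatalogStorage,
    rebase_source: PartitionLocator,
    caller_high_watermark: Optional[int] = None,
) -> int:
    """Prefer an explicit caller value; otherwise ask the catalog."""
    if caller_high_watermark is not None:
        # Current behavior: the caller figured it out somehow.
        return caller_high_watermark
    try:
        # Proposed behavior: the catalog returns it automatically.
        return storage.get_compaction_high_watermark(rebase_source)
    except NotImplementedError as err:
        raise ValueError(
            "Catalog does not support high watermark retrieval; "
            "pass caller_high_watermark explicitly."
        ) from err
```

With something like this, a caller whose catalog implements the method could omit the high watermark when rebasing, while callers on other catalogs would keep passing it explicitly as they do today.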
Originally posted by @pdames in #70 (comment)