[HUDI-8939] Fixing concurrency handling during upgrade #12737

Merged. 3 commits merged into apache:master on Jan 30, 2025.

Conversation

@nsivabalan (Contributor) commented on Jan 30, 2025

Change Logs

  • Fixing concurrency handling during upgrade and downgrade.

Problem scenario:

L1 df.write.format(hudi).save(path) -> say user configured zookeeper based lock provider. 
   .
L10   . writeClient.upsert 
L11         doInitTable
L12             lock using zookeeper based lock provider
L13                   upgrade
L14                            
L15                                 rollback failed writes
L16                                 full table compaction 
L17                                 both of the above operations will try to re-acquire the lock using the user configured LP (i.e. zookeeper based lock provider). This contends w/ the same lock taken at L12 and eventually times out.

L18               unlock
L19        continue w/ upsert. 
L20       .
.

The root cause is re-entrant locking: the nested operations attempt to acquire a lock the same writer already holds, and the lock provider is not re-entrant.
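For illustration only, here is a minimal, Hudi-independent sketch of the failure mode, modeling the external lock with a binary semaphore (which, like a distributed lock, is not re-entrant); the class is a hypothetical demo, not Hudi code:

// Hudi-independent illustration of the failure mode: the same writer tries to
// acquire a non-reentrant lock it already holds, so the nested attempt times out.
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class ReentrancyDemo {
  public static void main(String[] args) throws InterruptedException {
    Semaphore lock = new Semaphore(1);  // stands in for the zookeeper-based lock

    lock.acquire();                     // outer acquisition (L12 above)
    // nested acquisition during upgrade (L15/L16 above): same writer, same lock
    boolean acquired = lock.tryAcquire(2, TimeUnit.SECONDS);
    System.out.println("nested acquire succeeded? " + acquired);  // prints false
    lock.release();
  }
}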

We are making 3 fixes in this patch. With all of the fixes in place, the control flow is as follows:

L1 df.write.format(hudi).save(path) -> say user configured zookeeper based lock provider. 
   .
L10   . writeClient.upsert 
L11         doInitTable
L12             lock using zookeeper based lock provider
L13                   upgrade
L14                            override lock provider to NoopLockProvider, disable auto adjust lock configs, disable reuse time generator
L15                                 rollback failed writes
L16                                 full table compaction 
L17                                 both of the above operations will not try to re-acquire the lock again using user configured LP(i.e zookeeper based lock provider). here we use NoopLockProvider. 
L18               unlock
L19        continue w/ upsert. 
L20       .
.

Dissecting each fix:

  1. Removing the lock provider from the upgrade code path.

After this fix, the control flow is as follows:

L1 df.write.format(hudi).save(path) -> say user configured zookeeper based lock provider. 
   .
L10   . writeClient.upsert 
L11         doInitTable
L12             lock using zookeeper based lock provider
L13                   upgrade
L14                            remove lock provider configs. // fix.
L15                                 rollback failed writes
L16                                 full table compaction 
L17                                 // but we noticed that re-entrancy was still happening
L18               unlock
L19        continue w/ upsert. 
L20       .
.
  2. We were automatically overriding the LockProvider to InProcessLockProvider if it is a single writer with all inline table services (at L15). So, essentially, if InProcessLockProvider is what the end user configured, we were hitting re-entrant locks. Fix: this logic is now guarded by the auto-adjust lock config (hoodie.auto.adjust.lock.configs), so no auto adjustment of lock configs takes place for the straightforward out-of-the-box use case. A sketch of the guard is shown right after this item.
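A rough sketch of the guard; the class, method, lock-provider key, and default value below are illustrative assumptions, only hoodie.auto.adjust.lock.configs comes from this patch:

// Illustrative guard around lock-config auto-adjustment; names and the
// lock-provider property key are hypothetical stand-ins, not actual Hudi code.
import java.util.Properties;

class LockConfigAdjuster {
  static void maybeAutoAdjust(Properties props, boolean singleWriter, boolean allTableServicesInline) {
    boolean autoAdjust = Boolean.parseBoolean(
        props.getProperty("hoodie.auto.adjust.lock.configs", "false"));
    // Only swap in InProcessLockProvider when the user opted into auto adjustment;
    // otherwise the user-configured lock provider is left untouched.
    if (autoAdjust && singleWriter && allTableServicesInline) {
      props.setProperty("hoodie.write.lock.provider",
          "org.apache.hudi.client.transaction.lock.InProcessLockProvider");
    }
  }
}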

Again, we were still hitting the exception.

  3. TransactionManager is the entity used for the regular locking mechanism in Hudi (locking while scheduling compaction, locking while writing to the MDT, etc.). But the new instant time generation takes a different route and has its own lock provider. So, the two have different ways to deduce the default.

TransactionManager was explicitly setting the lock provider to InProcessLockProvider if no LockProvider was configured. If the user configures one explicitly, the txnManager reuses it.

We introduce NoopLockProvider, which simply lets any caller acquire the lock (synonymous with single-writer). So, for the UpgradeHandler code blocks, we override the lock provider to NoopLockProvider; a sketch of the idea follows.
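A minimal sketch of the idea, using an illustrative lock-provider interface rather than the exact Hudi LockProvider API:

// Minimal sketch of a no-op lock provider; the interface here is illustrative
// and not the exact org.apache.hudi LockProvider API.
import java.util.concurrent.TimeUnit;

interface SimpleLockProvider {
  boolean tryLock(long time, TimeUnit unit);
  void unlock();
}

// Every caller "acquires" the lock immediately, so nested acquisitions inside
// the upgrade block can never contend with the outer, already-held lock.
class NoopLockProvider implements SimpleLockProvider {
  @Override
  public boolean tryLock(long time, TimeUnit unit) {
    return true;  // always succeeds: effectively single-writer semantics
  }

  @Override
  public void unlock() {
    // nothing to release
  }
}

Since the upgrade block already runs under the outer zookeeper lock, handing the inner operations a lock that always succeeds preserves safety while avoiding the self-contention.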

After this fix, the control flow is as follows:

L1 df.write.format(hudi).save(path) -> say user configured zookeeper based lock provider. 
   .
L10   . writeClient.upsert 
L11         doInitTable
L12             lock using zookeeper based lock provider
L13                   upgrade
L14                            override lock provider to NoopLockProvider
L15                                 rollback failed writes
L16                                 full table compaction 
L17                                 // but we were still hitting the issue.
L18               unlock
L19        continue w/ upsert. 
L20       .
.

Even with the above fix, we were still hitting re-entrant locks.

  4. We use a cache of TimeGenerator instances, one per table path. In the above flow of events, even after overriding the LP to NoopLockProvider, new instant time generation was still using the user-configured lock provider, because we reuse the TimeGenerator that was created before the upgrade call. So, we introduced an internal config named "_hoodie.time.generator.reuse.enable". It is enabled by default; in the upgrade flows we override it to false, so a new TimeGenerator instance is created. This means the lock provider used by the TimeGenerator will be NoopLockProvider, as configured. A sketch of the cache lookup follows.
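A rough sketch of the reuse switch on the per-table-path cache; the class and method names are hypothetical, only the config key comes from this patch:

// Illustrative per-table-path TimeGenerator cache with a reuse switch; the
// classes and methods are hypothetical, not the actual Hudi implementation.
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

class TimeGenerators {
  private static final Map<String, TimeGenerator> CACHE = new ConcurrentHashMap<>();

  static TimeGenerator get(String tablePath, Properties props) {
    boolean reuse = Boolean.parseBoolean(
        props.getProperty("_hoodie.time.generator.reuse.enable", "true"));
    if (!reuse) {
      // Upgrade flow: build a fresh instance so it picks up the overridden
      // lock provider (NoopLockProvider) instead of the one cached earlier.
      return new TimeGenerator(props);
    }
    // Default: reuse one instance per table path.
    return CACHE.computeIfAbsent(tablePath, p -> new TimeGenerator(props));
  }
}

class TimeGenerator {
  private final Properties props;  // carries the lock-provider config
  TimeGenerator(Properties props) { this.props = props; }
}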

So, with all of the above fixes in place, our solution is as follows:

L1 df.write.format(hudi).save(path) -> say user configured zookeeper based lock provider. 
   .
L10   . writeClient.upsert 
L11         doInitTable
L12             lock using zookeeper based lock provider
L13                   upgrade
L14                            override lock provider to NoopLockProvider, disable auto adjust lock configs, disable reuse time generator
L15                                 rollback failed writes
L16                                 full table compaction 
L17                                 both of the above operations will not try to re-acquire the lock again using user configured LP(i.e zookeeper based lock provider). here we use NoopLockProvider. 
L18               unlock
L19        continue w/ upsert. 
L20       .
.
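Taken together, the upgrade block effectively overrides three writer properties before running the rollback and compaction. A rough sketch of the overrides; the lock-provider key and class path are assumptions for illustration, the other two keys are named in this patch:

// Rough sketch of the property overrides applied only for the upgrade block.
// The lock-provider key and class path are illustrative assumptions.
import java.util.Properties;

class UpgradeConfigOverrides {
  static Properties forUpgrade(Properties userProps) {
    Properties upgradeProps = new Properties();
    upgradeProps.putAll(userProps);
    // 1. Hand the inner operations a lock that always succeeds.
    upgradeProps.setProperty("hoodie.write.lock.provider",
        "org.apache.hudi.client.transaction.lock.NoopLockProvider");
    // 2. Do not auto-adjust lock configs for the inner write clients.
    upgradeProps.setProperty("hoodie.auto.adjust.lock.configs", "false");
    // 3. Build a fresh TimeGenerator so it also sees the no-op lock provider.
    upgradeProps.setProperty("_hoodie.time.generator.reuse.enable", "false");
    return upgradeProps;
  }
}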

Impact

Seamless upgrade irrespective of lock provider used.

Risk level (write none, low, medium or high below)

medium

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the instruction to make changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@nsivabalan force-pushed the concurrencyFixUpgrade7To8 branch from 032cd44 to 0093e06 on January 30, 2025 01:40
@nsivabalan force-pushed the concurrencyFixUpgrade7To8 branch from 0093e06 to c0344ee on January 30, 2025 01:41
@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Jan 30, 2025
@codope (Member) left a comment:

Yet to review the code, but I'm wondering whether we need the added complexity of a new lock provider and new configs. I have a few high-level questions:

  1. Why do we necessarily need a new NoopLockProvider if we are removing the lock configs during upgrade? Shouldn't the txn manager for the upgrade write client understand based on its write config that lock is not required? Conceptually, just removing lock configs and disabling auto adjustment should be enough.
  2. Why do we need a new config to decide whether or not to reuse time generator w/ or w/o lock? TimeGenerator API takes a flag to indicate whether locking is required or not. So, if the existing configs are being propagated properly and all callers of TimeGenerator API are passing the flag based on the config, then I don't think there is a need for another config.
  3. I think the goal was to identify the malicious caller, as we discussed, but we still don't know that right?

@codope (Member) left a comment:

Discussed offline, and as such, the patch is good to unblock 1.0.1. But, there are still some open questions:

  1. Does TimeGenerator API always need a lock provider, even when there is no real lock requirement (say single writer, all inline table service)?
  2. Is upgrade (esp. rollbackFailedWritesAndCompact) the only path where this issue happens? For COW tables with explicit InProcessLockProvider configured, I have noticed that testPartitionFieldsWithUpgrade fails due to an NPE after upgrade. This patch has somehow fixed it, but I don't have a good understanding of what exactly was causing that NPE. For reference, a draft patch based off of current master that reproduces the COW issue: [DO NOT MERGE] Investigate COW failure for null lock provider #12739

Let's revisit the above soon.

@hudi-bot

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@codope codope merged commit b44e19c into apache:master Jan 30, 2025
43 checks passed
@yihua yihua changed the title [HUDI-8930] Fixing concurrency handling during upgrade [HUDI-8939] Fixing concurrency handling during upgrade Jan 31, 2025
linliu-code pushed a commit to linliu-code/hudi that referenced this pull request Jan 31, 2025
* minor fixes to upgrade path

* Fixes for concurrency handling during upgrade

* fix build failure

---------

Co-authored-by: Sagar Sumit <[email protected]>