KAFKA-17109: Move lock backoff retry to streams TaskManager #17209

aliehsaeedii · 2024-09-16T12:44:17Z

This PR aims to resolve the issue introduced by #17116.

mumrah · 2024-09-16T13:45:59Z

@aliehsaeedii please update the PR title to have a description of the patch. Thanks!

cadonna

Thanks for the PR, @aliehsaeedii !

Here is my feedback.

I am missing unit tests.

cadonna · 2024-09-16T14:26:33Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java

+        }
+
+        public boolean canAttempt(final long nowMs) {
+            return  nowMs - lastAttemptMs >= EXPONENTIAL_BACKOFF.backoff(attempts);


nit:

Suggested change

-            return  nowMs - lastAttemptMs >= EXPONENTIAL_BACKOFF.backoff(attempts);
+            return nowMs - lastAttemptMs >= EXPONENTIAL_BACKOFF.backoff(attempts);

cadonna · 2024-09-16T14:31:22Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java

+                stateUpdater.add(task);
+                taskIdToBackoffRecord.remove(task.id());


Minor:
I would swap those two lines. Once the task is initialized, the backoff can be removed.

cadonna · 2024-09-16T15:04:25Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java

+                taskIdToBackoffRecord.remove(task.id());
+            } else {
+                log.trace("Task {} is still not allowed to retry acquiring the state directory lock", task.id());
+                handleUnsuccessfulLockAcquiring(task, nowMs);


Is this correct?
Every time initialization is attempted before the back-off has elapsed, the time of the last attempt is updated to the current time. If we assume an attempt in every poll iteration and the poll interval is shorter than the back-off, the task will never be initialized.
Assume the last unsuccessful attempt occurred at time 200 and the current call to canTryLock() happens 100ms later, at time 300. Furthermore, assume the current back-off is 250. canTryLock() should return false, because 300 - 200 >= 250 is not true. However, the last attempt is then updated to 300, and the back-off grows exponentially with the increased number of attempts (let's say to 500). If you try again 100ms later, at 400, canTryLock() will again return false, because 400 - 300 >= 500 is still not true, and it will not be true at any later call either. You should only update the back-off record if you actually attempted to initialize the task and the attempt was unsuccessful, not when you skipped the attempt due to the back-off.
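To make the starvation argument concrete, here is a minimal, hypothetical simulation (not the PR's code). It assumes a jitter-free back-off of 250ms that doubles per attempt and is capped at 10000ms, a poll every 100ms, and a last failed attempt at time 200, matching the numbers above. The `updateOnSkip` flag switches between bumping the record on skipped polls (the behavior questioned above) and bumping it only on real failures.

```java
// Hypothetical simulation of two ways to maintain a back-off record (not the PR's code).
final class BackoffPolicySimulation {

    static final long INITIAL_BACKOFF_MS = 250;
    static final long MAX_BACKOFF_MS = 10_000;

    // Jitter-free exponential back-off: 250, 500, 1000, ... capped at 10000 ms.
    static long backoffMs(final long attempts) {
        return Math.min(MAX_BACKOFF_MS, INITIAL_BACKOFF_MS << Math.min(attempts, 30));
    }

    /**
     * Polls every pollMs after a failed attempt at lastFailureMs and returns the time of
     * the first poll at which the back-off has elapsed, or -1 if none occurs up to horizonMs.
     * If updateOnSkip is true, each skipped poll also updates the record (the questioned behavior).
     */
    static long firstAllowedAttempt(final boolean updateOnSkip,
                                    final long lastFailureMs,
                                    final long pollMs,
                                    final long horizonMs) {
        long attempts = 0;               // current back-off is backoffMs(0) = 250 ms
        long lastAttemptMs = lastFailureMs;
        for (long nowMs = lastFailureMs + pollMs; nowMs <= horizonMs; nowMs += pollMs) {
            if (nowMs - lastAttemptMs >= backoffMs(attempts)) {
                return nowMs;            // a real initialization attempt would happen here
            }
            if (updateOnSkip) {
                lastAttemptMs = nowMs;   // record bumped although nothing was attempted
                attempts++;
            }
        }
        return -1;
    }

    public static void main(final String[] args) {
        System.out.println(firstAllowedAttempt(false, 200, 100, 60_000)); // 500
        System.out.println(firstAllowedAttempt(true, 200, 100, 60_000));  // -1
    }
}
```

With the record updated only on real failures, the first allowed attempt happens at time 500; when skipped polls also bump the record, no attempt is allowed within the simulated minute, because the gap between polls (100ms) never catches up with the growing back-off.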

cadonna · 2024-09-16T15:14:30Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java

+    public static class BackoffRecord {
+        private long attempts;
+        private long lastAttemptMs;
+        private static final ExponentialBackoff EXPONENTIAL_BACKOFF = new ExponentialBackoff(1, 2, 10000, 0.5);


Should the exponential back-off be specified in terms of poll time? Something like

new ExponentialBackoff(pollTime, 2, 10000, 0.5);

If it is too much trouble to get that config into the task manager, just choose something larger than 1ms; 1ms sounds really small. The sequence of back-offs would be 1ms, 2ms, 4ms, 8ms, 16ms, 32ms, 64ms, 128ms. At the same time, with default configs, task initialization is attempted every 100ms. So it seems there will not be much improvement over the current situation, because initialization is still attempted in each of the first 7 poll iterations.
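The arithmetic above can be checked with a small sketch. It assumes a jitter-free doubling back-off (Kafka's `ExponentialBackoff` additionally applies jitter, so real values vary around these numbers) and a task that fails on every attempt; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical, jitter-free approximation of a doubling back-off (the real ExponentialBackoff adds jitter).
final class BackoffSequence {

    // initialMs * 2^attempts, capped at maxMs
    static long backoffMs(final long initialMs, final long attempts, final long maxMs) {
        return Math.min(maxMs, initialMs << Math.min(attempts, 40));
    }

    // The first n back-offs of a task that keeps failing.
    static long[] firstBackoffs(final long initialMs, final int n, final long maxMs) {
        final long[] result = new long[n];
        for (int i = 0; i < n; i++) {
            result[i] = backoffMs(initialMs, i, maxMs);
        }
        return result;
    }

    // Number of consecutive failed attempts whose back-off has already elapsed by the
    // next poll (back-off <= poll interval), i.e. attempts that are retried on every poll.
    static int attemptsRetriedEveryPoll(final long initialMs, final long pollMs, final long maxMs) {
        int count = 0;
        while (count < 64 && backoffMs(initialMs, count, maxMs) <= pollMs) {
            count++;
        }
        return count;
    }

    public static void main(final String[] args) {
        System.out.println(Arrays.toString(firstBackoffs(1, 8, 10_000)));
        // [1, 2, 4, 8, 16, 32, 64, 128]
        System.out.println(attemptsRetriedEveryPoll(1, 100, 10_000));   // 7
        System.out.println(attemptsRetriedEveryPoll(100, 100, 10_000)); // 1
    }
}
```

Starting the back-off at the poll interval, as suggested with `pollTime`, means only the first retry coincides with the next poll; every later retry skips at least one poll iteration.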

aliehsaeedii · 2024-09-16T18:30:13Z

Thanks @cadonna. I added a unit test and addressed the review comments.

cadonna · 2024-09-16T19:00:30Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java

@@ -2116,4 +2132,37 @@ boolean needsInitializationOrRestoration() {
    void addTask(final Task task) {
        tasks.addTask(task);
    }
+
+    private boolean canTryLock(final TaskId taskId, final long nowMs) {


Sorry, I forgot to add this comment before in my review. Could you please rename this method to canTryInitializeTask()? I think that makes more sense.

@cadonna makes sense!
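A minimal sketch of the per-task bookkeeping behind a method named `canTryInitializeTask()`. It assumes task IDs are plain strings, a jitter-free back-off starting at 1000ms and capped at 10000ms, and hypothetical callbacks `onInitializationFailed()` and `onInitializationSucceeded()`. It follows the rule from the earlier review: the record is only created or bumped when an attempt was actually made and failed, and it is dropped once initialization succeeds.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: String task IDs, jitter-free back-off (1000 ms doubling, capped at 10000 ms).
final class TaskInitBackoff {
    private static final long INITIAL_BACKOFF_MS = 1_000;
    private static final long MAX_BACKOFF_MS = 10_000;

    static long backoffMs(final long attempts) {
        return Math.min(MAX_BACKOFF_MS, INITIAL_BACKOFF_MS << Math.min(attempts, 30));
    }

    static final class BackoffRecord {
        private long attempts = 0;      // 0 after the first failure, incremented on each further failure
        private long lastAttemptMs;

        BackoffRecord(final long nowMs) {
            this.lastAttemptMs = nowMs;
        }

        void recordFailedAttempt(final long nowMs) {
            attempts++;
            lastAttemptMs = nowMs;
        }

        boolean canAttempt(final long nowMs) {
            return nowMs - lastAttemptMs >= backoffMs(attempts);
        }
    }

    private final Map<String, BackoffRecord> taskIdToBackoffRecord = new HashMap<>();

    // No record means the task never failed (or already succeeded): try right away.
    boolean canTryInitializeTask(final String taskId, final long nowMs) {
        final BackoffRecord record = taskIdToBackoffRecord.get(taskId);
        return record == null || record.canAttempt(nowMs);
    }

    // Call only after an initialization attempt was actually made and failed.
    void onInitializationFailed(final String taskId, final long nowMs) {
        final BackoffRecord record = taskIdToBackoffRecord.get(taskId);
        if (record == null) {
            taskIdToBackoffRecord.put(taskId, new BackoffRecord(nowMs));
        } else {
            record.recordFailedAttempt(nowMs);
        }
    }

    // Initialization succeeded: the back-off record is no longer needed.
    void onInitializationSucceeded(final String taskId) {
        taskIdToBackoffRecord.remove(taskId);
    }

    boolean hasBackoffRecord(final String taskId) {
        return taskIdToBackoffRecord.containsKey(taskId);
    }
}
```

Skipped polls never touch the map, so the window computed from the last real failure stays fixed until it elapses.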

cadonna

Thanks for the updates!

Here are my comments.

cadonna · 2024-09-18T22:44:29Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java

@@ -1006,14 +1014,22 @@ private void addTasksToStateUpdater() {
    }

    private void addTaskToStateUpdater(final Task task) {
+        final long nowMs = System.currentTimeMillis();


Here, you need to use

Suggested change

-        final long nowMs = System.currentTimeMillis();
+        final long nowMs = time.milliseconds();

We inject the time object at creation, so that we can control time for example in tests.
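Why injecting the clock matters for tests: a minimal sketch with a hypothetical `TimeSource` interface standing in for Kafka's `Time` (whose `milliseconds()` the suggestion uses). Production code would pass a wall-clock implementation; a test passes a manually advanced one, so back-off windows can be crossed deterministically without real sleeping. `RetryGate` is likewise a hypothetical component, not part of the PR.

```java
// Hypothetical clock abstraction standing in for Kafka's Time; not Kafka's API.
interface TimeSource {
    long milliseconds();
}

// Production: wall-clock time.
final class SystemTimeSource implements TimeSource {
    @Override
    public long milliseconds() {
        return System.currentTimeMillis();
    }
}

// Tests: time only moves when sleep() is called.
final class ManualTimeSource implements TimeSource {
    private long nowMs;

    ManualTimeSource(final long startMs) {
        this.nowMs = startMs;
    }

    @Override
    public long milliseconds() {
        return nowMs;
    }

    void sleep(final long ms) {
        nowMs += ms;
    }
}

// Reads time from the injected source instead of calling System.currentTimeMillis() directly.
final class RetryGate {
    private final TimeSource time;
    private final long backoffMs;
    private Long lastFailureMs = null; // null: no failure recorded yet

    RetryGate(final TimeSource time, final long backoffMs) {
        this.time = time;
        this.backoffMs = backoffMs;
    }

    void recordFailure() {
        lastFailureMs = time.milliseconds();
    }

    boolean canRetry() {
        return lastFailureMs == null || time.milliseconds() - lastFailureMs >= backoffMs;
    }
}
```

In a test, `new RetryGate(new ManualTimeSource(0), 1000)` only allows a retry after `sleep()` has advanced the manual clock by at least 1000ms since the recorded failure.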

cadonna · 2024-09-18T22:46:16Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java

+    /* For testing */
+    void setTaskIdToBackoffRecord(final Map<TaskId, BackoffRecord> taskIdToBackoffRecord) {
+        this.taskIdToBackoffRecord = taskIdToBackoffRecord;
+    }


I do not think you need this method if you control time as I described on line 1017.

cadonna · 2024-09-18T22:46:53Z

streams/src/test/java/org/apache/kafka/streams/processor/internals/TaskManagerTest.java

+                .inState(State.RUNNING).build();
+        final TasksRegistry tasks = mock(TasksRegistry.class);
+        when(tasks.drainPendingTasksToInit()).thenReturn(mkSet(task00, task01));
+        final TaskManager.BackoffRecord backoffRecord = mock(TaskManager.BackoffRecord.class);


You do not need this mock. You can advance time with the time object.

Advancing time alone won't help, since the backoff record for task00 does not exist in the map.

cadonna

Thanks @aliehsaeedii !

The test looks great now! That is how I envisioned the test!

I had some minor formatting comments. Sorry for all the comments, but since this is a rather complicated test, I think it makes sense to structure it well to make it more readable.

cadonna · 2024-09-25T07:12:40Z

streams/src/test/java/org/apache/kafka/streams/processor/internals/TaskManagerTest.java

@@ -1243,6 +1245,50 @@ public void shouldRetryInitializationWhenLockExceptionInStateUpdater() {
        verify(stateUpdater).add(task01);
    }

+    @Test
+    public void shouldRetryInitializationWhenCanNotInitializeTask() {


The test is very good!
I just have some formatting comments.

Could you please rename the test to shouldRetryInitializationWithBackoffWhenInitializationFails?

cadonna · 2024-09-25T07:13:08Z

streams/src/test/java/org/apache/kafka/streams/processor/internals/TaskManagerTest.java

+                .withInputPartitions(taskId00Partitions)
+                .inState(State.RESTORING).build();


We use 4 spaces and not 8 for indentation.

Suggested change

-                .withInputPartitions(taskId00Partitions)
-                .inState(State.RESTORING).build();
+            .withInputPartitions(taskId00Partitions)
+            .inState(State.RESTORING).build();

cadonna · 2024-09-25T07:13:30Z

streams/src/test/java/org/apache/kafka/streams/processor/internals/TaskManagerTest.java

+                .withInputPartitions(taskId01Partitions)
+                .inState(State.RUNNING).build();


Please fix the indentation also here.

cadonna · 2024-09-25T07:13:50Z

streams/src/test/java/org/apache/kafka/streams/processor/internals/TaskManagerTest.java

+        verify(task00).initializeIfNeeded();
+        verify(task01).initializeIfNeeded();
+        verify(tasks).addPendingTasksToInit(
+                argThat(tasksToInit -> tasksToInit.contains(task00) && !tasksToInit.contains(task01))


Please fix the indentation.

cadonna · 2024-09-25T07:13:54Z

streams/src/test/java/org/apache/kafka/streams/processor/internals/TaskManagerTest.java

+        // initializeIfNeeded() has NOT been called this time
+        verify(task00, Mockito.times(1)).initializeIfNeeded();
+        verify(tasks, Mockito.times(2)).addPendingTasksToInit(
+                argThat(tasksToInit -> tasksToInit.contains(task00))


Please fix the indentation

cadonna · 2024-09-25T07:20:02Z

streams/src/test/java/org/apache/kafka/streams/processor/internals/TaskManagerTest.java

+
+        taskManager.checkStateUpdater(time.milliseconds(), noOpResetter);
+
+        verify(task00).initializeIfNeeded();


Could you add an inline comment here stating:

// task00 should not be initialized due to LockException, task01 should be initialized

cadonna · 2024-09-25T07:23:18Z

streams/src/test/java/org/apache/kafka/streams/processor/internals/TaskManagerTest.java

+
+        taskManager.checkStateUpdater(time.milliseconds(), noOpResetter);
+
+        // initializeIfNeeded() has NOT been called this time


This inline comment is not really clear. Could you please change it to something like:

// task00 should not be initialized since the backoff period has not passed.

cadonna · 2024-09-25T07:25:20Z

streams/src/test/java/org/apache/kafka/streams/processor/internals/TaskManagerTest.java

+        verify(stateUpdater, never()).add(task00);
+        verify(stateUpdater).add(task01);
+
+        taskManager.checkStateUpdater(time.milliseconds(), noOpResetter);


Could you add a time.sleep(5000) before this call, please?
Please add a new line between time.sleep() and checkStateUpdater().

For the second try, 5000 does not work, but anything less than 1000 does!

Of course, you are right! My bad!
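Why the exact sleep matters: with jitter, a back-off is a range rather than a single value. The sketch below assumes the back-off is computed as `min(maxMs, initialMs * multiplier^attempts * f)` with a random factor `f` in `[1 - jitter, 1 + jitter]`; the concrete numbers in the usage note (2000ms initial, multiplier 2, 10000ms cap, jitter 0.5) are illustrative assumptions, not values taken from the PR. A sleep below the lower bound is always skipped, a sleep at or above the upper bound always passes, and anything in between depends on the random draw.

```java
// Hypothetical model of a jittered exponential back-off:
// backoff = min(maxMs, initialMs * multiplier^attempts * f), with f in [1 - jitter, 1 + jitter].
final class JitteredBackoffBounds {

    static long minBackoffMs(final long initialMs, final int multiplier, final long maxMs,
                             final double jitter, final long attempts) {
        return bound(initialMs, multiplier, maxMs, 1 - jitter, attempts);
    }

    static long maxBackoffMs(final long initialMs, final int multiplier, final long maxMs,
                             final double jitter, final long attempts) {
        return bound(initialMs, multiplier, maxMs, 1 + jitter, attempts);
    }

    private static long bound(final long initialMs, final int multiplier, final long maxMs,
                              final double factor, final long attempts) {
        final double term = initialMs * Math.pow(multiplier, attempts);
        return Math.min(maxMs, (long) (term * factor));
    }

    // Outcome of a retry made sleepMs after the last failed attempt.
    static String classify(final long sleepMs, final long initialMs, final int multiplier,
                           final long maxMs, final double jitter, final long attempts) {
        if (sleepMs < minBackoffMs(initialMs, multiplier, maxMs, jitter, attempts)) {
            return "SKIPPED";            // shorter than every possible back-off
        }
        if (sleepMs >= maxBackoffMs(initialMs, multiplier, maxMs, jitter, attempts)) {
            return "ALLOWED";            // at least as long as every possible back-off
        }
        return "DEPENDS_ON_JITTER";
    }
}
```

With these assumed values and `attempts = 0`, the back-off lies between 1000ms and 3000ms, so a 999ms sleep is reliably skipped, a 5000ms sleep reliably passes, and a 10000ms sleep passes for any number of attempts because it reaches the cap.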

cadonna · 2024-09-25T07:27:49Z

streams/src/test/java/org/apache/kafka/streams/processor/internals/TaskManagerTest.java

+        verify(stateUpdater, never()).add(task00);
+
+        time.sleep(10000);
+        // do not throw lock exception this time


Could you change this comment to something like:

// task00 should call initialize since the backoff period has passed

cadonna · 2024-09-25T07:28:31Z

streams/src/test/java/org/apache/kafka/streams/processor/internals/TaskManagerTest.java

+        );
+        verify(stateUpdater, never()).add(task00);
+
+        time.sleep(10000);


I would add a new line after this line to highlight that time passed between the two initialization attempts.
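A dependency-free replay of the three phases the test checks, assuming a fixed 1000ms back-off, a fake task that throws a hypothetical `LockUnavailableException` on its first initialization, and a manually advanced clock. It mirrors the test's structure (the first attempt fails for task00 but succeeds for task01, a short sleep keeps task00 backing off, a long sleep lets it retry) but is not the actual `TaskManagerTest` code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, dependency-free replay of the test's phases (not the real TaskManagerTest).
final class RetryScenario {

    // Stand-in for the lock failure thrown during initialization.
    static final class LockUnavailableException extends RuntimeException {
    }

    static final class FakeTask {
        final String id;
        private int failuresLeft;
        int initCalls = 0;

        FakeTask(final String id, final int failuresLeft) {
            this.id = id;
            this.failuresLeft = failuresLeft;
        }

        void initializeIfNeeded() {
            initCalls++;
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new LockUnavailableException();
            }
        }
    }

    private final long backoffMs;                       // fixed back-off for simplicity
    private final Map<String, Long> lastFailureMs = new HashMap<>();
    final List<String> addedToStateUpdater = new ArrayList<>();
    private long nowMs = 0;

    RetryScenario(final long backoffMs) {
        this.backoffMs = backoffMs;
    }

    void sleep(final long ms) {
        nowMs += ms;
    }

    void checkStateUpdater(final List<FakeTask> pending) {
        for (final FakeTask task : pending) {
            if (addedToStateUpdater.contains(task.id)) {
                continue;                               // already handed over
            }
            final Long last = lastFailureMs.get(task.id);
            if (last != null && nowMs - last < backoffMs) {
                continue;                               // still backing off: no attempt, no record update
            }
            try {
                task.initializeIfNeeded();
                lastFailureMs.remove(task.id);          // initialized: drop the back-off record
                addedToStateUpdater.add(task.id);
            } catch (final LockUnavailableException e) {
                lastFailureMs.put(task.id, nowMs);      // real failed attempt: start/refresh back-off
            }
        }
    }
}
```

After the first call, task00 has been attempted once and task01 is handed over; after `sleep(500)` task00 is not attempted again; after a further `sleep(10_000)` task00 is initialized on its second attempt.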

cadonna

@aliehsaeedii Thanks for the updates!

LGTM!

cadonna · 2024-09-26T07:14:13Z

@aliehsaeedii the following test fails consistently with an NPE with this PR:
StreamThreadTest.shouldRecordCommitLatency().
See https://github.com/apache/kafka/actions/runs/11030675352?pr=17209
Could you have a look?

mumrah · 2024-09-29T14:12:17Z

@aliehsaeedii can you merge in latest trunk to pick up the fix for the failing FeatureCommandTest?

mumrah · 2024-09-30T14:42:22Z

@cadonna I was a little puzzled by the FeatureCommandTest failure, so I dug into this a bit. A few things I learned:

By default, the checkout action checks out the PR's merge commit rather than its head commit (similar to what Jenkins did).
Re-running a workflow does not "pull in" new changes from trunk for the merge commit.

The commit which fixed the test failure on trunk was

commit cd4d6ce9d576b170c10f81f3081529885abb933c
Author: PoAn Yang <[email protected]>
Date:   Thu Sep 26 23:24:09 2024 +0800

    MINOR: fix failed cases in FeatureCommandTest (#17287)
    
    Reviewers: David Arthur <[email protected]>

That is Thu Sep 26 15:24:09 2024 UTC.

The first workflow run for this PR (prior to merging in trunk) was at 2024-09-26T11:45:55Z, so a few hours before the fix was committed. The merge commit was up to bd94a73 at that point (logs). The subsequent re-runs of the workflow did not advance the merge commit since a re-run of a workflow is meant to be deterministic.

cadonna · 2024-10-01T08:03:22Z

@mumrah Thanks for the explanation!

lucasbru · 2024-11-05T15:31:58Z

@cadonna @aliehsaeedii Should we not backport these fixes to 3.9 and 3.8?

cadonna · 2024-11-06T10:22:04Z

Yeah, that is a good idea! As far as I understand, the change seems to be well tested, right? @aliehsaeedii

aliehsaeedii added 2 commits September 16, 2024 13:31

kafka17109: lock backoff-retry

50aa532

remove runtime exception

cacfa1d

aliehsaeedii mentioned this pull request Sep 16, 2024

KAFKA-17109: implement exponential backoff for state directory lock #17116

Merged

aliehsaeedii changed the title from KAFKA-17109 to KAFKA-17109: Move lock backoff retry to streams TaskManager Sep 16, 2024

cadonna reviewed Sep 16, 2024

add utest + address reviews

36af73b

cadonna reviewed Sep 16, 2024

aliehsaeedii added 2 commits September 17, 2024 13:58

rename method

6999dee

fix failing test

fa1c872

cadonna reviewed Sep 18, 2024

github-actions bot added the streams label Sep 24, 2024

merge with trunk

abc96fc

aliehsaeedii force-pushed the streams-backoff-retry-fix2 branch from 12ebb1e to abc96fc September 24, 2024 14:21

aliehsaeedii added 2 commits September 24, 2024 18:15

address reviews

345e2b5

update unit test

a16bacd

cadonna reviewed Sep 25, 2024

address reviews

aa8abfc

cadonna approved these changes Sep 25, 2024

fix failing utest

e6e716f

Merge branch 'trunk' into streams-backoff-retry-fix2

e160f98

cadonna merged commit bb11257 into apache:trunk Sep 30, 2024
9 checks passed
