
KAFKA-17510: Exception handling and purgatory completion on initialization delay #17709

Open · wants to merge 11 commits into trunk
Conversation

@apoorvmittal10 apoorvmittal10 (Collaborator) commented Nov 6, 2024

  • Continued the SharePartition-level exception handling from PR KAFKA-17002: Integrated partition leader epoch for Persister APIs (KIP-932) #16842 and added handling for delayed share partition initialization.
  • Renamed the class ShareFetchData to ShareFetch.
  • Removed direct access to the share fetch future and added methods so the future is completed in a single place.
  • Moved erroneous partitions to a central place so that partition-level failures in the request can be handled there (a rough sketch of the resulting shape follows below).
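
For orientation, the sketch below shows roughly what the centralized ShareFetch shape looks like after this change. Only the names discussed in the review on this page (the erroneous map, partitionMaxBytes, filterErroneousTopicPartitions, and the errorInAllPartitions check) come from the PR; the method bodies, the constructor, and the result type parameter T are assumptions.

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

import org.apache.kafka.common.TopicIdPartition;

// Hypothetical sketch of the refactored ShareFetch; bodies are illustrative only.
public class ShareFetchSketch<T> {
    private final Map<TopicIdPartition, Integer> partitionMaxBytes;
    private final CompletableFuture<Map<TopicIdPartition, T>> future;
    // Central place for partition-level failures in the request.
    private final Map<TopicIdPartition, Throwable> erroneous = new HashMap<>();

    public ShareFetchSketch(Map<TopicIdPartition, Integer> partitionMaxBytes,
                            CompletableFuture<Map<TopicIdPartition, T>> future) {
        this.partitionMaxBytes = new LinkedHashMap<>(partitionMaxBytes);
        this.future = future;
    }

    public synchronized void addErroneous(TopicIdPartition topicIdPartition, Throwable throwable) {
        erroneous.put(topicIdPartition, throwable);
    }

    // Drops partitions that already failed so later stages (e.g. readFromLog) can skip them.
    public synchronized Set<TopicIdPartition> filterErroneousTopicPartitions(Set<TopicIdPartition> topicIdPartitions) {
        Set<TopicIdPartition> remaining = new LinkedHashSet<>(topicIdPartitions); // keep the caller's ordering
        remaining.removeAll(erroneous.keySet());
        return remaining;
    }

    // True when every partition in the request has failed.
    public synchronized boolean errorInAllPartitions() {
        return erroneous.keySet().containsAll(partitionMaxBytes.keySet());
    }

    // The single place where the future is completed; callers no longer touch it directly.
    public synchronized void maybeComplete(Map<TopicIdPartition, T> results) {
        if (!future.isDone()) {
            future.complete(results);
        }
    }

    public Map<TopicIdPartition, Integer> partitionMaxBytes() {
        return partitionMaxBytes;
    }
}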

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@github-actions github-actions bot added core Kafka Broker KIP-932 Queues for Kafka labels Nov 6, 2024
@junrao junrao (Contributor) left a comment

@apoorvmittal10 : Thanks for the PR. Left a few comments.

// for respective share partition as completing the full request might result in
// some acquired records to not being sent: https://issues.apache.org/jira/browse/KAFKA-17510
maybeCompleteInitializationWithException(sharePartitionKey, shareFetchData.future(), throwable);
handleInitializationException(sharePartitionKey, shareFetch, throwable);
Contributor:

Since we are triggering delayedShareFetch below, do we need to handle the error for shareFetch here?

Collaborator Author:

I think we should. If the code reaches here, the SharePartition is either not yet initialized or in an error state, which means no fetch lock will be acquired on the respective SharePartition in the delayed share fetch, and hence there is no further handling in DelayedShareFetch. This code handles that error appropriately.

@@ -198,7 +198,7 @@ Map<TopicIdPartition, FetchRequest.PartitionData> acquirablePartitions() {
Map<TopicIdPartition, FetchRequest.PartitionData> topicPartitionData = new LinkedHashMap<>();

sharePartitions.forEach((topicIdPartition, sharePartition) -> {
int partitionMaxBytes = shareFetchData.partitionMaxBytes().getOrDefault(topicIdPartition, 0);
int partitionMaxBytes = shareFetch.partitionMaxBytes().getOrDefault(topicIdPartition, 0);
Contributor:

Should we skip erroneous partitions in shareFetch? Also, when calling sharePartition.maybeAcquireFetchLock(), if the partition is in ERROR or FENCED state, should we add the partition to erroneous partitions in shareFetch too?

Collaborator Author:

Should we skip erroneous partitions in shareFetch?

Here the method iterates over sharePartitions, which were populated in SharePartitionManager, so some of the errored share partitions never reach this point because they were already added to erroneous. You are right that a SharePartition could still error out here, so rather than skipping it from the fetch we should follow your second suggestion and add it to erroneous when maybeAcquireFetchLock raises an exception.

There is already a Jira for this from the previous PR's comments, which I plan to pick up next: https://issues.apache.org/jira/browse/KAFKA-17901
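
As a rough illustration of what that follow-up could look like inside acquirablePartitions() (purely hypothetical: KAFKA-17901 is not part of this PR, addErroneous follows the sketch near the top of this page, and whether maybeAcquireFetchLock signals the ERROR/FENCED case via an exception or a separate state check is an assumption):

// Hypothetical follow-up shape for KAFKA-17901, not code from this PR.
sharePartitions.forEach((topicIdPartition, sharePartition) -> {
    int partitionMaxBytes = shareFetch.partitionMaxBytes().getOrDefault(topicIdPartition, 0);
    try {
        if (sharePartition.maybeAcquireFetchLock()) {
            // ... build the FetchRequest.PartitionData for this partition using partitionMaxBytes, as today ...
        }
    } catch (RuntimeException e) {
        // Partition in ERROR/FENCED state: record it centrally instead of silently skipping it,
        // so the request can later be completed with a per-partition error.
        shareFetch.addErroneous(topicIdPartition, e);
    }
});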

Contributor:

If we know there is a partition with an error, we can skip the readFromLog call, which can be a bit expensive.

Collaborator Author:

Makes sense. I added a filter check before the replica manager read that filters out any erroneous topic partitions that are present.

}

private LogOffsetMetadata endOffsetMetadataForTopicPartition(TopicIdPartition topicIdPartition) {
Partition partition = replicaManager.getPartitionOrException(topicIdPartition.topicPartition());
LogOffsetSnapshot offsetSnapshot = partition.fetchOffsetSnapshot(Optional.empty(), true);
// The FetchIsolation type that we use for share fetch is FetchIsolation.HIGH_WATERMARK. In the future, we can
// extend it to support other FetchIsolation types.
FetchIsolation isolationType = shareFetchData.fetchParams().isolation;
FetchIsolation isolationType = shareFetch.fetchParams().isolation;
Contributor:

replicaManager.getPartitionOrException above throws an exception. Should we handle that and add it to shareFetch.erroneous?

Collaborator Author:

Done.

// Do not process the fetch request for this partition as the leader is not initialized yet.
// The fetch request will be retried in the next poll.
// TODO: Add the request to delayed fetch purgatory.
// Skip any handling for this error as the share partition is still loading. The request
Contributor:

When do we get a LeaderNotAvailableException? My understanding is that the throwable is based on the error code from ReadShareGroupStateResponse and it doesn't seem to return LeaderNotAvailableException.

Collaborator Author:

We can only get LeaderNotAvailableException when the SharePartition is in the INITIALIZING state, which means the request will be re-triggered once initialization completes. The exception is only used internally to know that the SharePartition is still initializing and is never returned to the client.
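
For context, a minimal sketch of how the branching described here could look inside handleInitializationException (the method name and arguments appear in the diff above; the body, the topicIdPartition() accessor on the key, and addErroneous from the earlier sketch are assumptions):

// Illustrative body only; the method name and parameters come from the diff above.
private void handleInitializationException(SharePartitionKey sharePartitionKey,
                                           ShareFetch shareFetch,
                                           Throwable throwable) {
    if (throwable instanceof LeaderNotAvailableException) {
        // The share partition is still loading (INITIALIZING). Skip any handling here: the
        // pending request sits in the delayed share fetch purgatory and is completed once
        // initialization finishes, so this exception is never returned to the client.
        return;
    }
    // Any other error is a real partition-level failure: record it centrally so the request
    // can complete with an error code for this partition.
    shareFetch.addErroneous(sharePartitionKey.topicIdPartition(), throwable);
}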

}

/**
* May be complete the share fetch request with the given exception for the topicIdPartitions.
Contributor:

May be => Maybe

Collaborator Author:

Done.

* @param throwable The exception that occurred while fetching messages.
*/
public void handleFetchException(
String groupId,
ShareFetch shareFetch,
Contributor:

Could we move this method to DelayedShareFetch and make it private since it's only called there?

Collaborator Author (@apoorvmittal10, Nov 8, 2024):

Done.

}
})
}
responses.size == 3
Contributor:

Should we reset responses during retry? Ditto below.

Collaborator Author:

We shouldn't, because the problem we are solving here is that enabling DefaultStatePersister introduces a delay before a SharePartition gets initialized, which is expected. With a share fetch for multiple topic-partitions, say tp0 and tp1, there can be a scenario where only tp0 is initialized and triggers the purgatory's checkAndComplete, so the share fetch responds with acquired records for tp0 only.

I have added the retries so that the test case is considered successful only when all topic-partitions, tp0 and tp1 in this case, have responded with acquired records.

Before adding a topic-partition to the responses array, I check whether the share fetch response actually contains acquired records.
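
The retry pattern in the test boils down to something like this (a Java-flavoured sketch of the idea only; the actual test is written in Scala, expectedPartitions (tp0 and tp1) and maxRetries are set up by the test, and sendShareFetch and the AcquiredRecords type are hypothetical stand-ins):

// Illustrative retry pattern, not the PR's Scala test.
Set<TopicIdPartition> responded = new HashSet<>();
int attempts = 0;
while (responded.size() < expectedPartitions.size() && attempts < maxRetries) {
    // sendShareFetch is a hypothetical helper issuing one share fetch for all partitions.
    Map<TopicIdPartition, List<AcquiredRecords>> response = sendShareFetch(expectedPartitions);
    response.forEach((topicIdPartition, acquiredRecords) -> {
        // Only count a partition once its response actually carries acquired records; a
        // partition whose SharePartition is still initializing may come back empty.
        if (!acquiredRecords.isEmpty()) {
            responded.add(topicIdPartition);
        }
    });
    attempts++;
}
// The test succeeds only when every topic-partition (tp0 and tp1 here) has responded.
assertEquals(expectedPartitions, responded);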

@junrao junrao (Contributor) left a comment

@apoorvmittal10 : Thanks for the updated PR. A few more comments.

/**
* The partitions that had an error during the fetch.
*/
private volatile Map<TopicIdPartition, Throwable> erroneous;
Contributor:

volatile guarantees that a subsequent reader will pick up the latest reference to the map, but not necessarily the latest content in the map.

Collaborator Author:

I have moved erroneous under synchronized access, thanks for pointing that out.
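
As background for the point above, a small standalone illustration (not PR code; imports of java.util.* and org.apache.kafka.common.TopicIdPartition assumed): a volatile field only publishes the latest reference to the map, whereas guarding every access with the same monitor also makes the map's contents visible across threads.

// Standalone illustration, not code from this PR.
class ErroneousTracker {
    // With `private volatile Map<TopicIdPartition, Throwable> erroneous = new HashMap<>();`
    // a put() by one thread may not be visible to another thread reading the same map,
    // because the map's contents are not themselves synchronized.

    // Synchronizing both writers and readers on the same monitor fixes that.
    private final Map<TopicIdPartition, Throwable> erroneous = new HashMap<>();

    synchronized void addErroneous(TopicIdPartition topicIdPartition, Throwable throwable) {
        erroneous.put(topicIdPartition, throwable);
    }

    synchronized Map<TopicIdPartition, Throwable> erroneousSnapshot() {
        return new HashMap<>(erroneous);
    }
}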

assertEquals(Errors.NONE.code, shareFetchResponseData.errorCode)
val partitionsCount = shareFetchResponseData.responses().get(0).partitions().size()
if (partitionsCount > 0) {
assertEquals(1, shareFetchResponseData.responses().size())
Contributor:

How do we guarantee that only 1 partition is included in the response?

Collaborator Author:

This is actually the number of topics in the response, which can only be 1 since we are fetching a single topic with multiple partitions. However, I realized the assertion on the topic count should come before the get(0); I fixed that.


@apoorvmittal10 (Collaborator Author):

@junrao Thanks for the review. I have addressed the comments.

@apoorvmittal10 (Collaborator Author):

@junrao Please re-review when you get a chance. The build passes on Java 11, and the Java 23 failures are in unrelated tests.

@junrao junrao (Contributor) left a comment

@apoorvmittal10 : Thanks for the updated PR. A few more comments.

* @param topicIdPartitions The topic id partitions to filter.
* @return The topic id partitions without the erroneous partitions.
*/
public synchronized Set<TopicIdPartition> filterErroneousTopicPartitions(Set<TopicIdPartition> topicIdPartitions) {
Contributor:

The input ordering is important. Could we make that explicit?

Collaborator Author:

Yes, that makes sense, but the problem is that even LinkedHashMap's keySet method returns the Set interface (backed internally by a LinkedKeySet). Hence I kept the method signature as Set because of that constraint.
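
For illustration (assuming the filter copies into an ordered set, as in the sketch near the top of this page): the static type is Set, but iteration still follows the insertion order of the LinkedHashMap the keys came from.

// Illustrative: ordering survives even though the static type is Set.
Map<TopicIdPartition, FetchRequest.PartitionData> topicPartitionData = new LinkedHashMap<>();
// ... populated in request order ...
Set<TopicIdPartition> ordered = topicPartitionData.keySet(); // a LinkedKeySet, typed as Set
Set<TopicIdPartition> partitionsToFetch = shareFetch.filterErroneousTopicPartitions(ordered);
for (TopicIdPartition topicIdPartition : partitionsToFetch) {
    // Iterates in the original request order, provided the filter preserves insertion order
    // (e.g. by copying into a LinkedHashSet).
}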

* Check if all the partitions in the request have errored.
* @return true if all the partitions in the request have errored, false otherwise.
*/
public synchronized boolean isErrored() {
Contributor:

isErrored => errorInAllPartitions ?

Collaborator Author:

Done.

@@ -315,11 +325,17 @@ else if (isolationType == FetchIsolation.HIGH_WATERMARK)
}

private Map<TopicIdPartition, LogReadResult> readFromLog(Map<TopicIdPartition, FetchRequest.PartitionData> topicPartitionData) {
// Filter if there already exists any erroneous topic partition.
Set<TopicIdPartition> partitionsToFetch = shareFetch.filterErroneousTopicPartitions(topicPartitionData.keySet());
Contributor:

It's probably better to do this in acquirablePartitions() since it avoids acquiring the locks for error partitions.

Collaborator Author:

The getPartitionOrException API is called after the share partitions are acquired, so if a share partition errors out after acquisition, readFromLog would still be triggered for that topic partition. Hence it is best to filter before the expensive readFromLog call.
Also, if a share partition has errored out in a non-recoverable way, the fetch lock API should return false in acquirablePartitions anyway.
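
A tiny sketch of the resulting guard at the top of readFromLog (illustrative only; the early return on an empty set is an assumption beyond the filter line shown in the diff above):

// Illustrative guard, not necessarily the PR's exact code beyond the filter line.
Set<TopicIdPartition> partitionsToFetch =
    shareFetch.filterErroneousTopicPartitions(topicPartitionData.keySet());
if (partitionsToFetch.isEmpty()) {
    // Every requested partition has already failed; skip the expensive replica manager read
    // and let the request complete with the recorded per-partition errors.
    return Collections.emptyMap();
}
// ... otherwise read from the log only for partitionsToFetch ...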
