KAFKA-16505: Fix lost source raw key and value in store caches and buffers #18739
base: trunk
Conversation
Thanks @loicgreffier for the PR!
You did not consider buffers in this PR as I described in my comment. Could you come up with a test that confirms that we also have the same issue with buffers and then provide a fix?
@@ -259,6 +259,10 @@ public <K, V> void send(final String topic,

final ProducerRecord<byte[], byte[]> serializedRecord = new ProducerRecord<>(topic, partition, timestamp, keyBytes, valBytes, headers);

// As many records could be in-flight,
// freeing raw records in the context to reduce memory pressure
freeContext(context);
The name of this method is a bit misleading. It basically frees the raw record within the context, not the whole context. What about calling it freeRawInputRecordFromContext()?
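As a rough illustration of what the renamed helper does, the sketch below clears only the raw key/value bytes held by the context. The Context stub and its field names are illustrative stand-ins, not the real ProcessorRecordContext API:

```java
// Minimal stub standing in for ProcessorRecordContext; field names are
// illustrative, not the actual Kafka Streams API.
class Context {
    byte[] sourceRawKey;
    byte[] sourceRawValue;
}

public class FreeRawInputRecordExample {
    // Frees only the raw input record held by the context, not the whole
    // context, so in-flight byte arrays can be garbage-collected early.
    static void freeRawInputRecordFromContext(final Context context) {
        context.sourceRawKey = null;
        context.sourceRawValue = null;
    }

    public static void main(String[] args) {
        Context ctx = new Context();
        ctx.sourceRawKey = new byte[] {1, 2};
        ctx.sourceRawValue = new byte[] {3, 4};
        freeRawInputRecordFromContext(ctx);
        System.out.println(ctx.sourceRawKey == null && ctx.sourceRawValue == null); // true
    }
}
```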
Fixed
@Override
public boolean equals(final Object other) {
    return super.equals(other);
}

@Override
public int hashCode() {
    return super.hashCode();
}
Why are those needed?
Since we added the new rawKey and rawValue attributes, SpotBugs requires us to define the equals function (https://spotbugs.readthedocs.io/en/stable/bugDescriptions.html#eq-class-doesn-t-override-equals-in-superclass-eq-doesnt-override-equals).
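For background, Java arrays do not get value-based equality for free, which is why SpotBugs flags subclasses that add array fields without overriding equals. The PR simply delegates to super; the sketch below is a hypothetical alternative showing what an equals/hashCode that actually includes the new byte[] fields would look like (class and field names are illustrative, not the PR's code):

```java
import java.util.Arrays;

// Illustrative sketch only: if rawKey/rawValue should participate in
// equality, Arrays.equals/Arrays.hashCode must be used, because
// byte[].equals() compares references, not contents.
public class RawRecordFields {
    final byte[] rawKey;
    final byte[] rawValue;

    RawRecordFields(final byte[] rawKey, final byte[] rawValue) {
        this.rawKey = rawKey;
        this.rawValue = rawValue;
    }

    @Override
    public boolean equals(final Object other) {
        if (this == other) return true;
        if (!(other instanceof RawRecordFields)) return false;
        final RawRecordFields that = (RawRecordFields) other;
        return Arrays.equals(rawKey, that.rawKey)
            && Arrays.equals(rawValue, that.rawValue);
    }

    @Override
    public int hashCode() {
        return 31 * Arrays.hashCode(rawKey) + Arrays.hashCode(rawValue);
    }
}
```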
this.sourceRawKey = null;
this.sourceRawValue = null;
You also need to add this info to serialize() and deserialize() so that the buffer values also get the source record. Here it gets a bit tricky, because you need to consider the case where a serialized record context does not contain the source record, because it was written by a version of Streams that did not yet have the source record in the context.
Indeed, having "optional" raw key and value makes the deserialization tricky.

Let's say we serialize the ProcessorRecordContext in this order: timestamp, offset, topic, partition, headers, rawKey, rawValue. After deserializing the headers, the next bytes can be rawKey and rawValue, or can be something else (e.g., priorValue):

kafka/streams/src/main/java/org/apache/kafka/streams/state/internals/BufferValue.java, line 73 in d7a5b87:

    final byte[] priorValue = getNullableSizePrefixedArray(buffer);

Right now I'm considering serializing the rawKey and rawValue at the very end of the ByteBuffer, i.e., right after here:

kafka/streams/src/main/java/org/apache/kafka/streams/state/internals/BufferValue.java, line 119 in d7a5b87:

    addValue(buffer, newValue);

followed by rawKey and rawValue.
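The tail-of-buffer idea can be sketched as follows. The putNullableArray/getNullableArray helpers are simplified stand-ins for the real size-prefixed-array helpers, and the single payload byte is a placeholder for the existing serialized fields; the point is that a reader can branch on buffer.remaining() to stay compatible with old-format records that have no trailing raw key/value:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class OptionalTailFieldsExample {
    // Writes a nullable array as a size prefix (-1 == null) plus the bytes.
    static void putNullableArray(ByteBuffer buf, byte[] a) {
        if (a == null) { buf.putInt(-1); } else { buf.putInt(a.length); buf.put(a); }
    }

    static byte[] getNullableArray(ByteBuffer buf) {
        int len = buf.getInt();
        if (len < 0) return null;
        byte[] a = new byte[len];
        buf.get(a);
        return a;
    }

    public static void main(String[] args) {
        byte[] rawKey = {1}, rawValue = {2, 3};
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.put((byte) 42); // placeholder for the existing serialized payload
        putNullableArray(buf, rawKey);   // rawKey appended at the very end
        putNullableArray(buf, rawValue); // rawValue appended last
        buf.flip();

        buf.get(); // existing payload is consumed first
        // Old-format records would have no bytes left at this point, so the
        // reader can detect the optional tail via remaining().
        byte[] k = buf.remaining() > 0 ? getNullableArray(buf) : null;
        byte[] v = buf.remaining() > 0 ? getNullableArray(buf) : null;
        System.out.println(Arrays.toString(k) + " " + Arrays.toString(v)); // [1] [2, 3]
    }
}
```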
@cadonna Changes about buffers will be added to this PR. However, despite my tests using […]
I believe your test needs to flush the buffer so that the records are written to the changelog topic, and then restore the buffer from the changelog topic by stopping and re-starting the app. Useful code: […]
Force-pushed from f14bd2e to 79f6395.
@cadonna Thank you for the guidance. I could trigger the […]. Please ignore my previous comment #18739 (comment) that brings too many changes. I've updated the serialization and deserialization. To take the optional rawKey and rawValue into consideration, I've added a char "marker" (i.e., […]). Let me know your thoughts about this approach.
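The marker idea can be sketched generically. Note that the actual marker character chosen in the PR is not visible in this thread, so 'r' below is purely an assumed placeholder, and the helpers are illustrative rather than the PR's real methods:

```java
import java.nio.ByteBuffer;

public class MarkerByteExample {
    // Placeholder marker value; the char actually chosen in the PR is not
    // shown in this thread.
    static final byte RAW_RECORD_MARKER = 'r';

    // Appends the marker followed by size-prefixed rawKey and rawValue.
    static void writeRawTail(final ByteBuffer buf, final byte[] rawKey, final byte[] rawValue) {
        buf.put(RAW_RECORD_MARKER);
        buf.putInt(rawKey.length);
        buf.put(rawKey);
        buf.putInt(rawValue.length);
        buf.put(rawValue);
    }

    // Records written by an older Streams version have no trailing marker,
    // so the reader can detect its absence and skip the optional fields.
    static boolean hasRawTail(final ByteBuffer buf) {
        return buf.remaining() > 0 && buf.get(buf.position()) == RAW_RECORD_MARKER;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        writeRawTail(buf, new byte[] {1}, new byte[] {2});
        buf.flip();
        System.out.println(hasRawTail(buf)); // true
    }
}
```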
Force-pushed from 53bd6b9 to 425e638.
Force-pushed from 630b392 to 93cdf62.
@loicgreffier I discussed with @mjsax the need of writing the raw key and raw value of the input record to the changelog topic for buffers, and we had some concerns. Writing the input record to the changelog topic might significantly increase the storage requirements, because we would need to write two records for each record in the buffer: the record itself and the corresponding input record.
@cadonna I understand the concern. Should we restart the thread and discuss the possible alternatives?
@loicgreffier Sorry for the late reply! The issue with the raw input record described above only applies to Streams applications that use suppress or stream-table joins with versioned tables and grace periods. IMO, we should not exclude the raw input record from the error handlers because of some applications. One option could be to enable writing the raw input record to the changelog topic via a configuration, as you propose above. That means that error handlers need to gracefully handle the situation where the raw input record is not available. WDYT?
@cadonna Agreed to introduce a new parameter. I'm wondering if the concern is the same for store caches. Should the parameter apply to store caches as well ([…])?
The first question we need to answer is the following: […]

If we decide for (a), we need to define the input record for the DSL operation. For example, what is the input record that should be written to the dead letter queue in case an aggregation result triggers the error handler? Intuitively, I would say it is the last record that contributed to the aggregation. However, there are cases where this might not be true, for example, if the aggregate is a list of accumulated records and the last record is not responsible for the error downstream. With (a), ideally, we need to carry the raw input record everywhere: into state stores, changelogs, caches, and buffers.

kafka/streams/src/main/java/org/apache/kafka/streams/state/internals/CachingKeyValueStore.java, line 372 in f13a22a
If we decide for (b), maintaining the raw input record might be a bit simpler, but I am not sure how useful the raw input record is in this case. You cannot really understand why something went wrong and replay. Let me know what you think.
@cadonna KIP-1034 suggests that the input record should be the one that triggers the sub-topology: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1034%3A+Dead+letter+queue+in+Kafka+Streams#KIP1034:DeadletterqueueinKafkaStreams-DefaultDeadletterqueuerecord
(a) aligns with the definition in KIP-1034, doesn't it? It makes it easier to reprocess records.
Does this mean that in the following example:

.stream()
.selectKey()
.mapValues() <----------- Exception here
...

the record sent to the DLQ would be the input record of the mapValues processor (i.e., the one passed to the processing error handler)? And in this second example:
.stream(INPUT_TOPIC)
.selectKey()
.groupByKey()
.aggregate(
initializer,
aggregator, <----------- Exception here
materialized
)
...

With the current implementation: […]

I will check and reproduce this case. @sebastienviale Feel free to add anything if I missed a point.
As @loicgreffier said, it is mentioned in the KIP that the raw record value, "If available, contains the value of the input message that triggered the sub-topology, null if triggered by punctuate", which seems to correspond to solution (a). It also seems logical to store the last record before the exception occurred as the raw record.