swift-kafka-client consumer is very slow (was: Provide bulk messages from KafkaConsumer) #132
I am curious where this slowdown comes from. Is it from how we are polling Kafka, or just the fact that every `KafkaConsumer` message is delivered through the async sequence individually?
Hi @FranzBusch! Thank you for your thoughts. My assumption is that it is a coincidence of two factors (though I still have to profile to prove it):
However, there is something else here: I think I tried making a sequence that returns messages one by one but internally operates on arrays, and it was not as good as providing arrays directly.
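For illustration, the "one-by-one outside, arrays inside" variant described above might look roughly like the sketch below: an iterator refilled with whole arrays that drains them one element at a time. All names here (`ArrayBackedIterator`, `refill`) are illustrative, not the library's actual types.

```swift
// Sketch only: a hand-rolled iterator that is refilled with whole arrays
// (e.g. one array per poll) but hands elements to the caller one by one.
struct ArrayBackedIterator<Element>: IteratorProtocol {
    private var buffer: [Element] = []
    private var index = 0
    private let refill: () -> [Element]

    init(refill: @escaping () -> [Element]) {
        self.refill = refill
    }

    mutating func next() -> Element? {
        if index == buffer.count {
            buffer = refill()                 // fetch the next whole batch
            index = 0
            if buffer.isEmpty { return nil }  // source exhausted
        }
        defer { index += 1 }
        return buffer[index]
    }
}
```

Even though the batching happens internally, the downstream code still crosses the sequence boundary once per element, which may explain why this variant did not help as much as returning arrays to the caller.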
I've made two examples in the fork, for single and for multiple messages transferred through an async sequence. Note that I could not see the difference in Docker because the overall execution was too slow, so the tests were performed on bare metal with Redpanda (simply because it was already installed).
The slight difference between 1 and 2 is, I guess, related to rate calculation, while for the 3rd the async-sequence overhead seems to have the most impact. Even though we know the performance problem is somewhere in AsyncSequence, which might require better synchronisation primitives or other improvements in the stdlib, it would still be nice to have bulk updates. @FranzBusch, could you suggest an approach if you have something in mind, please?
Hi @felixschlegel, @FranzBusch! I think I need to take a step back and elaborate a bit on why @blindspotbounty suggests implementing reading in batches.

We did some measurements internally with a small app which just reads data from Kafka and does nothing else, simply to understand the maximum reading throughput we can get from Kafka. At the moment the Swift client is slower (@blindspotbounty will provide exact numbers). @blindspotbounty, maybe it's even better to add a benchmark test to the swift-kafka-client library so that @felixschlegel and @FranzBusch have a reproducible test at hand.

So we are thinking about what kind of improvements we can suggest for the Swift client to reach parity with native librdkafka reading throughput. One possible solution, as @FranzBusch mentioned, is to optimise AsyncStream to make it much more efficient for reading/writing; another, much more extreme and maybe not desired, is for the Swift client to expose the native … So, to summarise, I would rename the case to …
I do understand the necessity to improve performance, and it is something we haven't focused on yet. I agree that the first step would be to set up a benchmarking target similar to what we have done in … On the different approaches, I think the … Though first things first: let's add a benchmark and then actually look at the Instruments traces to see where we are spending our time and allocations.
@mr-swifter thank you for your input! Bulks are not the target in themselves but rather one possible way to speed up the current implementation and make it comparable with librdkafka. Sorry that I didn't provide this before. But to show the difference, I've made two tests in one executable (available at https://github.com/ordo-one/swift-kafka-client/tree/pure-librdkafka-poll). The branch is based on swift-kafka-client/main. The test creates a unique topic (similar to the tests) and produces …
To reproduce locally, they can be run with the following command:
Results for (1):
Results for (2):
I've also added
Results for (1) in docker:
Results for (2) in docker:
Finally, the difference between running with native librdkafka and with the swift-kafka-client interface is about 65x. I believe there are two places where swift-kafka-client spends a lot of time:
The above branch is main with slight modifications made exclusively for this test, so it is possible to test/profile the swift-kafka-client code directly using that test executable. @FranzBusch, @felixschlegel, maybe that code will help with benchmarking and with comparison against librdkafka as a baseline.
@blindspotbounty we should set up an embedded benchmark target as suggested. Please coordinate with @mr-swifter, as I'll be away for a couple of days.
@hassila, @mr-swifter: I've added a draft PR with both tests. Hope that will be helpful :)
@blindspotbounty are we happy with the performance now?
@FranzBusch yeah, after moving to direct access to librdkafka it is not a problem anymore: #158. So we should consider this case closed!
Currently, `KafkaConsumer` provides messages one by one. That is convenient; however, it is not efficient for reading big topics, especially on service recovery.
I've made a small experiment: I changed the `consumerMessages` enum case to accept an array instead of a single message and packed all messages from a single poll into one event. I also changed `messages` and `KafkaConsumerMessages` to provide bulks, then tested with simple consumer applications.
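As a minimal sketch of the change described above, with all names (`KafkaMessage`, `ConsumerEvent`, `packPoll`) hypothetical rather than the library's actual API:

```swift
// Hypothetical stand-in for a consumed Kafka message.
struct KafkaMessage {
    let offset: Int
    let value: [UInt8]
}

// The event enum carries an array of messages instead of a single one,
// so one poll produces one event.
enum ConsumerEvent {
    // Before (single-message variant): case consumerMessage(KafkaMessage)
    case consumerMessages([KafkaMessage])   // bulk variant
}

// Pack everything returned by a single poll into one event.
func packPoll(_ polled: [KafkaMessage]) -> ConsumerEvent? {
    polled.isEmpty ? nil : .consumerMessages(polled)
}
```

With this shape, the async sequence yields once per poll rather than once per message, which is where the throughput gain in the experiment would come from.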
For single messages:
With results:
For bulk:
Results:
The latter shows results close to the 1 Gbps network limit.
This is interesting because the batching is mostly done inside the library already and is very natural with the current poll implementation (as in some librdkafka examples), but it is not exposed to the end user.
From our perspective, this is especially useful when an application requires recovery from huge topic(s) and needs to cache data, e.g. in a database, so it can receive and use data in bulk.
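That recovery pattern can be sketched as follows, assuming a hypothetical `cacheBatch` callback; `chunks(of:)` is a small helper defined here, not a standard-library API.

```swift
// Helper: split an array into consecutive chunks of at most `size` elements.
extension Array {
    func chunks(of size: Int) -> [[Element]] {
        stride(from: 0, to: count, by: size).map {
            Array(self[$0 ..< Swift.min($0 + size, count)])
        }
    }
}

// Hypothetical recovery loop: one cacheBatch call (e.g. a single bulk
// INSERT) per batch of recovered records, instead of one write per message.
func recover(offsets: [Int], batchSize: Int, cacheBatch: ([Int]) -> Void) {
    for batch in offsets.chunks(of: batchSize) {
        cacheBatch(batch)
    }
}
```

One database round-trip per batch rather than per message is what makes bulk delivery attractive for recovery from large topics.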