Performance Testing of Fluent-bit with several filters shows log processing falling < 5mb/s #9399
This is more of a Question/Issue, but I created an example test that can be used, so I created this as a PR.
Background
I have been running into a few performance bottlenecks in my fluent-bit setup in kubernetes, so I created a k8s_perf_test that can be used to (hopefully) show the issue. Hopefully this will result in some discussion about performance tuning, fluent-bit defaults, or possibly even pointing out some flaws with my own setup 😄.
Test Setup
- `examples/k8s_perf_test/run-test.sh` and `examples/k8s_perf_test/values.yaml` are set up to use the standard fluent-bit helm chart:
  - `extraContainers` to create a python/ubuntu container (called `logwriter`) that is sidecar'd with fluent-bit
  - `extraFiles` to store my container startup script and `test_runner.py`
  - an `emptyDir` (perftest-volume) for ephemeral shared storage between `fluent-bit` and `logwriter`, mounted in both at `/app/perftest/containers`; the `/fluent-bit` configmap is also mounted in both containers
- `run-log-writer-test.sh` passes configuration to `test_runner.py`; specifically, it builds a logfile name that "impersonates" a log filename that would be created by `containerd`. `test_runner.py` then creates the logfile in `/app/perftest/containers/`, which is watched by the fluent-bit `tail` input.
- `test_runner.py` has a small bit of logic, but has been performant enough on a MacBook Pro 2019 and a GCP (GKE) n2-standard-8 w/ SSD boot disks to write >50Mb/s to a file. It writes in the `containerd` (CRI) format, and it also does file renames to mimic logrotate.
- Throughput is read from the `/api/v1/metrics/` endpoint, using the `proc_records` counter of the `null.0` output (a `null` output, so no real output plugin is in the path).
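As a rough illustration of what the log writer does, here is a hedged sketch of a CRI-style writer with rename-based rotation. This is not the actual `test_runner.py`; the function names, payload size, and rotation scheme here are my own for illustration:

```python
import os
from datetime import datetime, timezone


def cri_line(message: str, stream: str = "stdout", partial: bool = False) -> str:
    """Format one line roughly the way containerd's CRI logging does:
    RFC3339 timestamp, stream name, 'P' (partial) or 'F' (full) tag, payload."""
    ts = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
    tag = "P" if partial else "F"
    return f"{ts} {stream} {tag} {message}\n"


def write_and_rotate(path: str, lines: int, rotate_every: int) -> None:
    """Append CRI-formatted lines to `path`, renaming the file to `path + '.1'`
    every `rotate_every` lines to mimic logrotate's rename-and-recreate cycle."""
    f = open(path, "a")
    for i in range(lines):
        f.write(cri_line(f"log line {i} " + "x" * 200))
        if (i + 1) % rotate_every == 0:
            f.close()
            os.replace(path, path + ".1")  # rename; tail must notice via inotify
            f = open(path, "a")           # recreate the "live" log file
    f.close()
```

The rename-and-recreate step is the interesting part for fluent-bit: it is exactly the event the `tail` input has to catch to avoid losing data during rotation.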
Results
I ran this on both a gcp n2-standard-8 host that used SSD for its boot disk as well as a macbook. The results were similar in both cases in terms of fluent-bit throughput. The numbers below are from a macbook pro 2019 2.3Ghz i7 running a single-node kind (k8s) on docker.
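For context on how Mb/s numbers like these can be derived: fluent-bit's built-in HTTP server (when enabled in `[SERVICE]`) exposes cumulative per-plugin counters at `/api/v1/metrics`, and diffing the `null.0` output's `proc_bytes` over a time window gives throughput. A hedged sketch (the helper names are my own, not part of the test scripts):

```python
import json
import time
import urllib.request


def fetch_metrics(host: str = "http://127.0.0.1:2020") -> dict:
    """Fetch fluent-bit's built-in JSON metrics (requires HTTP_Server On)."""
    with urllib.request.urlopen(host + "/api/v1/metrics") as resp:
        return json.load(resp)


def output_proc_bytes(metrics: dict, output_name: str = "null.0") -> int:
    """Cumulative processed-bytes counter for one output plugin instance."""
    return metrics["output"][output_name]["proc_bytes"]


def mb_per_s(bytes_before: int, bytes_after: int, interval_s: float) -> float:
    """Convert a counter delta over an interval into MB/s."""
    return (bytes_after - bytes_before) / interval_s / 1_000_000


# Sampling loop (sketch): diff proc_bytes over a 10s window.
# b0 = output_proc_bytes(fetch_metrics()); time.sleep(10)
# b1 = output_proc_bytes(fetch_metrics()); print(mb_per_s(b0, b1, 10))
```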
1. `tail` input defaults do not seem optimal; setting larger input buffers is more performant, but can then result in downstream issues.

A `tail` input that uses no `multiline.parser` and no filters ingests slower with the default buffers than when higher buffers are defined. However, defining higher buffers can lead to output errors like: out_stackdriver: does not batch output records properly if passed a large chunk of records and can drop a majority of records #9374 & Allow output plugins to configure a max chunk size #1938, as it tends to create larger chunks. Initial input config:
1a. A fluent-bit config that only reads and doesn't parse anything isn't super useful, so I re-tested the above config with `multiline.parser cri`, changing the buffer settings as follows for varying results:
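For example (the values here are illustrative; for the `tail` input both buffers default to 32k):

```
[INPUT]
    Name              tail
    Path              /app/perftest/containers/*.log
    multiline.parser  cri
    Buffer_Chunk_Size 512k   # default 32k
    Buffer_Max_Size   1M     # default 32k
```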
This looks like we could add a few Mb/s to fluent-bit throughput just by increasing these buffer sizes by default (the default is only 32k). However, this seems to create oversized chunks, and output plugins cannot handle that well (#1938). Is there any other suggestion for improving the initial parsing speed?
NOTE: for the setup above I used `filters: ""` in values.yaml.

2. Adding common processing filters quickly slows fluent-bit down to a crawl
See the `filters-simple` and `filters-extended` sections in values.yaml. When testing with those you will need to rename the section to just `filters` for it to be activated. For these changes I kept the larger buffers; my input section was:
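A sketch of such an input section (illustrative; the one actually used is in values.yaml):

```
[INPUT]
    Name              tail
    Path              /app/perftest/containers/*.log
    Tag               kube.*
    multiline.parser  cri
    Buffer_Chunk_Size 2M
    Buffer_Max_Size   4M
    Mem_Buf_Limit     100M
    Skip_Long_Lines   On
```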
2a. Please review the `filters-simple` section in values.yaml. I started by adding just a filter that removes the `_p` artifact that comes from CRI parsing; this lowered processing to 18.065Mb/s (down from the 22.78Mb/s w/ higher buffers and no filter).

2b. Adding the `kubernetes` filter for namespace labels & annotations, and pod labels & annotations. This also used `Merge_Log` to move the `log` field to `message`.
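A sketch of what those two stages can look like (illustrative only; parameter values are assumptions, the real section is `filters-simple` in values.yaml):

```
[FILTER]
    Name    modify
    Match   kube.*
    Remove  _p

[FILTER]
    Name                  kubernetes
    Match                 kube.*
    Merge_Log             On
    Merge_Log_Key         message
    Labels                On
    Annotations           On
    Namespace_labels      On
    Namespace_annotations On
```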
2c. Please look at `filters-extended` in values.yaml. It has what is in `filters-simple`, plus a nest/lift to move the kubernetes meta fields and a modify filter; it moves k8s and other fields around, and potentially removes other fields, before records are sent to an output.

After using the `filters-extended` config, I ran into several issues with fluent-bit being able to keep up with log rotation, something I have also seen in my production setups. It potentially misses logrotates without realizing it (switching `Inotify_watcher false` does not seem to be an improvement), and it's hard to tell, because this is also not reflected in fluent-bit metrics (it doesn't know it missed a rotation, so how can it record it). To address this for the test only, you can change `Rotate_Wait` in the input to an extremely high number like 300. In standard k8s setups you will miss data, as kubelet generally rotates a container log when it reaches 10mb (usually checking at 10s intervals). So as fluent-bit backs up and a container writes faster than fluent-bit can process, logs are missed, with no metrics available to know they have been missed.

The input pauses constantly because the engine thread is backed up, since all filters are executed single-threaded in the engine thread (iirc), and fluent-bit sits at a processing rate of 4.9Mb/s. (In my actual prod setup I have another lua script that runs between the last two filters, and that loses another 1.5Mb/s of throughput, to the point that the fluent-bit pipeline can only process 3.5Mb/s.)
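For reference, the kind of nest/lift plus modify stage described in 2c can be sketched as follows (the field names and renames here are illustrative assumptions; the real section is `filters-extended` in values.yaml):

```
[FILTER]
    Name         nest
    Match        kube.*
    Operation    lift
    Nested_under kubernetes
    Add_prefix   k8s_

[FILTER]
    Name    modify
    Match   kube.*
    Rename  k8s_namespace_name namespace
    Remove  k8s_docker_id
```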
Questions
(`filters-extended` version)