Is your feature request related to a problem? Please describe.
When you configure a large out-of-order window, even a single sample in a 2h window will lead to a block, which the compactor then has to discover, compact, and clean up. This becomes a problem when there are thousands of such blocks, making the block cleanup and bucket index update take forever. While there are ways to make the block cleanup and bucket index update faster, why not fix it at the source, where we create the blocks?
Describe the solution you'd like
The solution I am proposing is for this part of the code.
Instead of always creating 2h-range blocks for out-of-order data, compact data belonging to days before the current day into 24h blocks, and only create 2h blocks for the current day.
This way, in the best case, there will be a 12x reduction in out-of-order blocks if a large out-of-order window spans multiple days and we get unlucky with OOO samples in all of the 2h windows.
Even if OOO samples span only two 2h windows on a previous day, there will still be a 2x reduction in blocks for that range.
In the worst case, there is no change, i.e. just 1 block.
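The range-selection rule above can be sketched as follows. This is a hypothetical helper, not the actual code at the linked location: the function names, the day-boundary convention (UTC day aligned to 24h multiples of the epoch), and the millisecond timestamps are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"time"
)

const (
	hour              = int64(time.Hour / time.Millisecond) // 1h in ms
	defaultBlockRange = 2 * hour                            // current 2h OOO block range
	dayRange          = 24 * hour                           // proposed range for past days
)

// blockRangeFor picks the compaction range for an out-of-order sample at
// timestamp t (ms since epoch): samples from days before currentDayStart
// get the 24h range, samples from the current day keep the 2h range.
func blockRangeFor(t, currentDayStart int64) int64 {
	if t < currentDayStart {
		return dayRange
	}
	return defaultBlockRange
}

// blockBounds aligns t to the [minT, maxT) boundaries of its block.
func blockBounds(t, rng int64) (minT, maxT int64) {
	minT = (t / rng) * rng
	return minT, minT + rng
}

func main() {
	currentDayStart := 10 * dayRange // start of "today" in ms since epoch

	// A sample from yesterday lands in a single 24h block...
	prev := 9*dayRange + 3*hour
	minT, maxT := blockBounds(prev, blockRangeFor(prev, currentDayStart))
	fmt.Println(maxT-minT == dayRange) // true

	// ...while a sample from today keeps the 2h block range.
	cur := currentDayStart + hour
	minT, maxT = blockBounds(cur, blockRangeFor(cur, currentDayStart))
	fmt.Println(maxT-minT == defaultBlockRange) // true
}
```

With this rule, 12 sparsely populated 2h windows on a past day collapse into one 24h block (the 12x best case), while the current day's behavior is unchanged.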
Describe alternatives you've considered
Additional context