Bug: Samples loss in ingester after rebalance/exit read-only mode #10582
My current understanding of the Prometheus bug:
Proposals for fixing it:
Alternatively, we could avoid the GC by introducing a way to forcibly prevent head GCs for a while when rebalancing. This does not fix the core TSDB bug; it only sidesteps the problem under the condition in which we detected it. The downside is that we don't know whether this bug is reachable through other conditions, so doing this might leave other vectors for the problem lying around. Since it's less ideal, I did less research down this path for now.
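A minimal sketch of what that sidestep could look like, assuming a hypothetical gcHold helper (none of these names exist in Mimir or Prometheus; a real change would hook into the ingester's head-compaction loop):

```go
// Hypothetical sketch: a reference-counted "hold" that the head-compaction
// path could check before truncating in-memory series, so GC is postponed
// while a rebalance / exit from read-only mode is in progress.
package main

import (
	"fmt"
	"sync"
)

// gcHold counts outstanding reasons to postpone head GC.
type gcHold struct {
	mu    sync.Mutex
	holds int
}

func (h *gcHold) Acquire() { h.mu.Lock(); h.holds++; h.mu.Unlock() }
func (h *gcHold) Release() { h.mu.Lock(); h.holds--; h.mu.Unlock() }

func (h *gcHold) Held() bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.holds > 0
}

// maybeRunHeadGC stands in for the periodic compaction/GC trigger.
func maybeRunHeadGC(h *gcHold) {
	if h.Held() {
		fmt.Println("head GC postponed: rebalance in progress")
		return
	}
	fmt.Println("head GC runs as usual")
}

func main() {
	var h gcHold
	h.Acquire() // taken while leaving read-only mode / rebalancing
	maybeRunHeadGC(&h)
	h.Release()
	maybeRunHeadGC(&h)
}
```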
I've written a test, prometheus/prometheus@6ea3c2d, that seems to reproduce the TSDB bug.
Uh, this is an interesting one. I think this part of your test isn't right:

There's one thing missing in your dissection, the statement:

```go
s.pendingCommit = true
```

This is the statement that should prevent it from being gc-ed:

```go
if len(series.mmappedChunks) > 0 || series.headChunks != nil || series.pendingCommit ||
	(series.ooo != nil && (len(series.ooo.oooMmappedChunks) > 0 || series.ooo.oooHeadChunk != nil)) {
	seriesMint := series.minTime()
	if seriesMint < actualMint {
		actualMint = seriesMint
	}
	return
}
```

But still, good catch 👏, I understand that the race is happening here:

```go
s := a.head.series.getByID(chunks.HeadSeriesRef(ref))
if s == nil {
	var err error
	s, _, err = a.getOrCreate(lset)
	if err != nil {
		return 0, err
	}
}
s.Lock()
```

As you mentioned, we unlock the stripe and then we lock the series. I think this could be solved by adding an extra arg:

```go
func (s *stripeSeries) getByID(id chunks.HeadSeriesRef, locked bool) *memSeries {
```

WDYT?
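For illustration, a simplified, self-contained sketch of the shape of that suggestion. The stand-in types and the lockSeries parameter below are assumptions for this sketch, not the real Prometheus stripeSeries code; the point is only that taking the series lock while the stripe lock is still held closes the window in which the head GC can delete the series.

```go
// Hypothetical sketch of the proposed getByID change, using simplified types.
package main

import (
	"fmt"
	"sync"
)

type memSeries struct {
	sync.Mutex
	pendingCommit bool
}

type stripeSeries struct {
	stripeMu sync.Mutex
	byID     map[uint64]*memSeries
}

// getByID returns the series for id. If lockSeries is true, the series lock
// is acquired while the stripe lock is still held, so a concurrent gc()
// cannot observe the series unlocked between the two locks.
func (s *stripeSeries) getByID(id uint64, lockSeries bool) *memSeries {
	s.stripeMu.Lock()
	defer s.stripeMu.Unlock()
	ms := s.byID[id]
	if ms != nil && lockSeries {
		ms.Lock() // caller must Unlock()
	}
	return ms
}

// gc deletes series with nothing pending; it takes the stripe lock and then
// each series lock, mirroring the lock ordering in the head GC.
func (s *stripeSeries) gc() {
	s.stripeMu.Lock()
	defer s.stripeMu.Unlock()
	for id, ms := range s.byID {
		ms.Lock()
		if !ms.pendingCommit {
			delete(s.byID, id)
		}
		ms.Unlock()
	}
}

func main() {
	s := &stripeSeries{byID: map[uint64]*memSeries{1: {}}}
	ms := s.getByID(1, true) // series is returned already locked
	ms.pendingCommit = true  // gc() cannot delete it before this is set
	ms.Unlock()
	s.gc()
	fmt.Println("series remaining after gc:", len(s.byID)) // 1
}
```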
@alexweav do you want to take care of this or would you like me to send a fix for the bug?
@colega I appreciate the second pair of eyes! I will try to re-work the test taking that into account. This issue was a bit on the backburner last week, but I'm still interested in submitting the fix; if anything, to gain more familiarity with this component. I should be able to devote some more time to this issue this week.
I'll let you send the fix then. I wouldn't worry about the test so much though. It's really hard to test this, especially in a way that doesn't leave the test quickly outdated. I think the important thing is that you found it and you can explain it in words. Concurrency is hard.
During ingester scale-up, two things are likely to happen:

- The series churn on a TSDB instance that has existed for a while, coupled with leaving read-only mode, is extremely likely to trigger multiple head GCs + compactions in a row (logs).
- This seems to trigger a known race condition in TSDB between writes and the head GC: prometheus/prometheus#8365

This manifests as samples going missing in an ingester during scale-up. The trace below, for a request affected by this bug, showed a full success for an expected 9 samples:

But when the ingester was queried directly (verifiable with the grpcurl-query-ingesters tool), only 8 samples were present for the expected time range.

If this happens in a situation where we only barely have quorum, it can result in sample loss in queries. For example, with RF=3 but only 2 available replicas (due to restarts or read-only mode), if the bug occurs on one of the 2 remaining ingesters, a sample will be present on only 1 of 3 ingesters. The roughly 1/3 of queries that don't manage to hit that ingester will not include a sample which the database accepted.
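As a quick sanity check of the "roughly 1/3 of queries" figure, here is a small sketch that enumerates the possible quorum reads, assuming each query ends up using a uniformly chosen quorum of 2 out of the 3 ingesters (a simplification of the real read path):

```go
// With RF=3 and a read quorum of 2, count how many 2-ingester query sets
// miss a sample that only landed on a single replica.
package main

import "fmt"

func main() {
	ingesters := []string{"A", "B", "C"}
	hasSample := map[string]bool{"A": true} // sample landed on only one replica

	total, missing := 0, 0
	for i := 0; i < len(ingesters); i++ {
		for j := i + 1; j < len(ingesters); j++ {
			total++
			if !hasSample[ingesters[i]] && !hasSample[ingesters[j]] {
				missing++
			}
		}
	}
	fmt.Printf("%d of %d possible quorum query sets miss the sample\n", missing, total)
	// Output: 1 of 3 possible quorum query sets miss the sample
}
```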