Bug: Increased latency when losing an ingester since 2.15.0 #10764

raspbeguy · 2025-02-28T09:30:04Z

What is the bug?

Since we upgraded to 2.15.0, losing an ingester causes a large increase in latency on the distributors. Removing the ingester from the ring brings latency back to normal.

How to reproduce it?

On a distributed Mimir cluster with:
- multiple instances of ingesters
- distributors handling substantial traffic
- ingester.ring.zone_awareness_enabled=true
- ingester.ring.replication_factor=3
- ingester.ring.unregister_on_shutdown=false

shut down an ingester
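
For reference, here is a minimal sketch of how the ring settings listed above would look in the Mimir YAML config file. It assumes the standard `ingester.ring` block layout and the key names from the Mimir configuration reference. It is not a copy of our actual config, and everything not listed above is left at its default.

```yaml
# Sketch only: the three ring settings from the list above, nothing else.
ingester:
  ring:
    # Spread replicas of each series across different zones.
    zone_awareness_enabled: true
    # Each series is written to 3 ingesters.
    replication_factor: 3
    # Keep the ingester in the ring when it shuts down,
    # so a stopped ingester still appears in the ring.
    unregister_on_shutdown: false
```

With `unregister_on_shutdown: false`, a stopped ingester stays in the ring until it comes back or is removed manually, which is the state in which we see the latency increase.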

What did you think would happen?

Shutting down an ingester shouldn't make latency go up, as was the case in previous releases.

What was your environment?

Debian VMs with Mimir 2.15.0 deployed from the APT package. 6 distributors and 12 ingesters, spread over 3 geographical zones.

Any additional context to share?

We found no config difference between a cluster running 2.15 and a cluster running 2.14.3 in the ingester, distributor or ingester_client sections.
When the problem is occurring, traces show that distributor requests take a lot more time than usual.

Some metrics when shutting down an ingester

Trace of a distributor request to ingesters when latency is healthy

Trace of a distributor request to ingesters when latency is degraded

raspbeguy added the bug label Feb 28, 2025
