Bug: Increased Distributor latency in 2.15 #10717
Comments
It's not expected, no. What was typical p99 latency in your environment before? Do you notice increased CPU or memory usage in the distributors or ingesters? Do you have distributed tracing, so you can see which component(s) have increased latency?
Hey @bboreham, thanks for the reply.
The list below shows mean latency on 2.14 for the two-week period prior to the upgrade against the last two weeks on 2.15.
Distributor
Ingester
The list below shows mean CPU and memory usage on 2.14 for the two-week period prior to the upgrade against the last two weeks on 2.15.
Distributor
Ingester
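For anyone wanting to reproduce the comparison, a rough sketch of PromQL that produces a comparable per-component view, assuming standard cAdvisor container metrics (the container label values here are hypothetical):
# Mean CPU (cores) across distributor pods; the "distributor" container label is an assumption.
avg(rate(container_cpu_usage_seconds_total{container="distributor"}[5m]))
# Mean memory working set across distributor pods; same assumption on the container label.
avg(container_memory_working_set_bytes{container="distributor"})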
No, we don't have distributed tracing enabled.
Hey @bboreham, after a bit more digging we found a correlation between the latency increase and our instance type. Before the 2.15 upgrade we migrated from Intel-based (m7i) to Graviton-based (m7g & m8g) nodes. We were only on Graviton and 2.14 for a couple of days, but during that time we did not see any increase in latency in either the Ingester or the Distributor.
What is the bug?
Hey Grafana team,
We have an alert that fires when we see distributor latency above 1 second, using the query below:
histogram_quantile(0.99, avg by (le) (rate(cortex_request_duration_seconds_bucket{job=~"(cortex)/((distributor|cortex|mimir|mimir-write))",route=~"/distributor.Distributor/Push|/httpgrpc.*|api_(v1|prom)_push|otlp_v1_metrics"}[5m]))) > 1
Since upgrading to 2.15 this alert has been firing consistently. We have noticed a large improvement in overall performance in Mimir, but we're wondering if increased latency is to be expected as a possible side effect?
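As a sketch for narrowing this down (same job and route selectors as the alert, but grouped by route as well as le), a per-route breakdown of the same histogram can show which endpoint is driving the p99 increase:
# p99 per route, reusing the selectors from the alert query above.
histogram_quantile(0.99, sum by (le, route) (rate(cortex_request_duration_seconds_bucket{job=~"(cortex)/((distributor|cortex|mimir|mimir-write))",route=~"/distributor.Distributor/Push|/httpgrpc.*|api_(v1|prom)_push|otlp_v1_metrics"}[5m])))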
Side note: thanks for all the great work; our team was delighted with the overall performance improvements we've seen in 2.15, aside from this latency jump.
How to reproduce it?
Mimir 2.15
What did you think would happen?
N/A
What was your environment?
Kubernetes
Any additional context to share?
No response