Idea: rate-limit message should let you know whether it was rate or burst that was exceeded. #10833

Open
bboreham opened this issue Mar 8, 2025 · 0 comments · May be fixed by #10835

bboreham commented Mar 8, 2025

What is the problem you are trying to solve?

It's not possible to tell whether a rate-limit error is caused by instantaneously exceeding the burst limit, or by exceeding the rate over a period of time. This leads people to claim the rate limit is not working, since on average they are below the limit.

Example message (OTLP):

ts=2025-03-07T16:50:02.639131325Z caller=otel.go:284 level=error user=<redacted> msg="detected an error while ingesting OTLP metrics request (the request may have been partially ingested)" httpCode=429 err="the request has been rejected because the tenant exceeded the ingestion rate limit, set to 10000 items/s with a maximum allowed burst of 25000. This limit is applied on the total number of samples, exemplars and metadata received across all distributors (err-mimir-tenant-max-ingestion-rate). To adjust the related per-tenant limits, configure -distributor.ingestion-rate-limit and -distributor.ingestion-burst-size, or contact your service administrator." insight=true

Example message (remote-write):

ts=2025-03-08T09:37:54.185819695Z caller=grpc_logging.go:76 level=warn method=/httpgrpc.HTTP/Handle duration=8.342773ms msg=gRPC err="rpc error: code = Code(429) desc = the request has been rejected because the tenant exceeded the ingestion rate limit, set to 1500 items/s with a maximum allowed burst of 15000. This limit is applied on the total number of samples, exemplars and metadata received across all distributors (err-mimir-tenant-max-ingestion-rate). To adjust the related per-tenant limits, configure -distributor.ingestion-rate-limit and -distributor.ingestion-burst-size, or contact your service administrator.\n"

Prior to #2104 the message included "while adding %d samples, %d exemplars and %d metadata", which meant it was possible to see whether the burst limit was exceeded by this one request.

Which solution do you envision (roughly)?

If the burst limit was exceeded by this one request, state this fact and report how many items were in the request.

Otherwise we can conclude that the rate limit was exceeded.
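
To make the proposed distinction concrete, here is a minimal sketch of the check, not Mimir's actual code. The names `exceedsBurstAlone` and `describeRejection` and the message wording are hypothetical, and the sketch assumes the number of items in the rejected request is available at the point where the error is built.

```go
package main

import "fmt"

// exceedsBurstAlone reports whether a single request is larger than the
// burst size, in which case it could never be admitted on its own.
// Hypothetical helper for illustration only.
func exceedsBurstAlone(items, burst int) bool {
	return items > burst
}

// describeRejection builds an illustrative message for a request that has
// already been rejected by the limiter. The wording is an assumption, not
// the text used by Mimir.
func describeRejection(items, rateLimit, burst int) string {
	if exceedsBurstAlone(items, burst) {
		return fmt.Sprintf(
			"burst limit exceeded by this request alone: %d items > maximum burst of %d",
			items, burst)
	}
	// The request fits within the burst, so the rejection must come from
	// sustained traffic above the configured rate.
	return fmt.Sprintf(
		"rate limit of %d items/s exceeded over time: this request had %d items, within the maximum burst of %d",
		rateLimit, items, burst)
}

func main() {
	// Limits taken from the two example log lines above; the item counts
	// are made up for the demonstration.
	fmt.Println(describeRejection(30000, 10000, 25000))
	fmt.Println(describeRejection(500, 1500, 15000))
}
```

Including the item count in both branches would restore the information that the message carried prior to #2104.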

Have you considered any alternatives?

Not really.

Any additional context to share?

No response

How long do you think this would take to be developed?

Small (<= 1 month dev)

What are the documentation dependencies?

No response

Proposer?

No response
