Failure to create an index with ingest v2 returns 429 #5719
Here we add the capability to write errors back to the barrier so that control plane errors can be shared back with all ingest routing requests that are waiting for shards.
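To illustrate the idea of a barrier that carries an outcome back to every waiter, here is a minimal sketch using only the standard library. `ResultBarrier`, `complete`, and `wait` are hypothetical names, and the `String` error stands in for a control plane error; the actual code in this PR may be structured quite differently (for example with async primitives).

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical barrier: waiters block until an outcome is published,
/// then every waiter receives a copy of that same outcome.
#[derive(Clone, Default)]
pub struct ResultBarrier {
    inner: Arc<(Mutex<Option<Result<(), String>>>, Condvar)>,
}

impl ResultBarrier {
    /// Publishes the outcome (success or error) and wakes all waiters.
    pub fn complete(&self, outcome: Result<(), String>) {
        let (slot, condvar) = &*self.inner;
        *slot.lock().unwrap() = Some(outcome);
        condvar.notify_all();
    }

    /// Blocks until an outcome is available and returns a copy of it.
    pub fn wait(&self) -> Result<(), String> {
        let (slot, condvar) = &*self.inner;
        let mut guard = slot.lock().unwrap();
        while guard.is_none() {
            guard = condvar.wait(guard).unwrap();
        }
        guard.clone().unwrap()
    }
}
```

With this shape, a request waiting for shards sees the real control plane error instead of only learning that the wait ended.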
let last_failure = match open_shard_error {
    ControlPlaneError::Internal(_) => SubworkbenchFailure::Internal,
    ControlPlaneError::Timeout(_) => SubworkbenchFailure::ControlPlaneUnavailable,
    ControlPlaneError::TooManyRequests => SubworkbenchFailure::ControlPlaneUnavailable,
    ControlPlaneError::Unavailable(_) => SubworkbenchFailure::ControlPlaneUnavailable,
    ControlPlaneError::Metastore(metastore_error) => match metastore_error {
        MetastoreError::Timeout(_) => SubworkbenchFailure::ControlPlaneUnavailable,
        MetastoreError::TooManyRequests => SubworkbenchFailure::ControlPlaneUnavailable,
        MetastoreError::Unavailable(_) => SubworkbenchFailure::ControlPlaneUnavailable,
        // TODO: are there other metastore errors that can be considered temporary?
        _ => SubworkbenchFailure::Internal,
This mapping determines which requests are going to be 500 or 503. In either case they will be retried internally (is_pending() returns true for both).
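As a rough sketch of what this means for a client, the two failure kinds can be modeled as below. The enum is a simplified stand-in for the one in the diff, `http_status` is a hypothetical helper, and the assignment of `Internal` to 500 and `ControlPlaneUnavailable` to 503 is an assumption read off the comment above, not the actual conversion code.

```rust
/// Simplified stand-in for the failure type used in the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SubworkbenchFailure {
    Internal,
    ControlPlaneUnavailable,
}

impl SubworkbenchFailure {
    /// Assumed status code returned if the failure survives all retries.
    pub fn http_status(self) -> u16 {
        match self {
            SubworkbenchFailure::Internal => 500,
            SubworkbenchFailure::ControlPlaneUnavailable => 503,
        }
    }

    /// Both failures are treated as transient, so the subrequest
    /// remains pending and is retried internally.
    pub fn is_pending(self) -> bool {
        match self {
            SubworkbenchFailure::Internal => true,
            SubworkbenchFailure::ControlPlaneUnavailable => true,
        }
    }
}
```

The distinction only becomes visible to the caller once the internal retries are exhausted.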
@@ -68,6 +68,7 @@ enum IngestFailureReason {
   INGEST_FAILURE_REASON_ROUTER_LOAD_SHEDDING = 8;
   INGEST_FAILURE_REASON_LOAD_SHEDDING = 9;
   INGEST_FAILURE_REASON_CIRCUIT_BREAKER = 10;
+  INGEST_FAILURE_REASON_UNAVAILABLE = 11;
Until all servers are upgraded, this will be converted to IngestServiceError::Internal.
- for subrequest in pending_subrequests(&workbench.subworkbenches) {
+ for subrequest in
+     pending_subrequests_for_attempt(&workbench.subworkbenches, workbench.num_attempts)
Add logic so that, during a given retry attempt, subrequests for which we already observed an error while creating shards are not processed further. Otherwise, all control plane errors are overridden as "no shard available".
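One way such a filter could look is sketched below. The `Subworkbench` struct and its fields are hypothetical simplifications, and the body of `pending_subrequests_for_attempt` is a guess at the behavior described above, not the code from this PR.

```rust
/// Hypothetical, simplified per-subrequest state.
#[derive(Debug)]
pub struct Subworkbench {
    pub subrequest_id: u32,
    /// Attempt number at which the last failure was recorded, if any.
    pub last_failure_attempt: Option<usize>,
    /// True once the subrequest has been persisted successfully.
    pub done: bool,
}

/// Yields subrequests that are still pending and that have not already
/// failed during the current attempt, so an error recorded earlier in
/// this attempt is not overwritten by a later, less specific one.
pub fn pending_subrequests_for_attempt(
    subworkbenches: &[Subworkbench],
    num_attempts: usize,
) -> impl Iterator<Item = &Subworkbench> {
    subworkbenches
        .iter()
        .filter(move |s| !s.done && s.last_failure_attempt != Some(num_attempts))
}
```

A subrequest that failed in a previous attempt is still returned, since it should be retried; only failures from the current attempt are skipped.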
Description of the issue
When using index templates, specifying an index name that matches the pattern but contains illegal characters results in a 429 response code instead of a 400.
Description of the problem:
GetOrCreateOpenShards with an invalid index_id on the metastore fails with MetastoreError::JsonDeserializeError.
Proposed solution
This PR solves the problem in two places:
quickwit-serve, before calling the ingest router (ES bulk and native APIs)
How was this PR tested?
Added unit and integration (Python) tests.
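For the quickwit-serve part of the proposed solution, rejecting a malformed index ID before it reaches the ingest router could look roughly like the sketch below. `validate_index_id` and `BadRequest` are hypothetical names, and the rule used here (an ASCII letter first, then letters, digits, `-`, `_`, or `.`, 3 to 255 characters in total) is an assumption for illustration; the real validation rule may differ.

```rust
/// Hypothetical error type carrying the HTTP status to return.
#[derive(Debug, PartialEq, Eq)]
pub struct BadRequest {
    pub status: u16,
    pub message: String,
}

/// Checks an index ID against an assumed naming rule and returns a
/// 400-style error instead of letting the request fail further down.
pub fn validate_index_id(index_id: &str) -> Result<(), BadRequest> {
    let mut chars = index_id.chars();
    // The first character must be an ASCII letter (assumed rule).
    let starts_with_letter = chars.next().map_or(false, |c| c.is_ascii_alphabetic());
    // Remaining characters: ASCII letters, digits, '-', '_', '.' (assumed rule).
    let rest_ok = chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.'));
    let len_ok = (3..=255).contains(&index_id.len());
    if starts_with_letter && rest_ok && len_ok {
        Ok(())
    } else {
        Err(BadRequest {
            status: 400,
            message: format!("invalid index ID `{index_id}`"),
        })
    }
}
```

Validating at the API boundary means the client gets a 400 immediately, rather than a retryable error produced after the control plane and metastore have already been involved.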