I had searched in the issues and found no similar feature requirement.
Description
When a workflow uses DolphinScheduler's "Continue" failure strategy, where the remaining tasks keep running after a task fails, we encountered a limitation of the "Recovery Failed" feature. If a task in the workflow fails while other tasks are still running, the "Recovery Failed" option is unavailable until those tasks finish. We can only recover the workflow after the entire workflow has failed, which delays the completion of the failed task and its downstream tasks.
For example, in the attached scenario (see image):
Task B1 has failed, while other tasks, such as A1 (on which Workflow2 depends), continue running. If we wait for Workflow1 to fail before recovering the failed task (B1), B1's completion is delayed. If instead we stop Workflow1 immediately and then recover it, the dependent workflow (Workflow2) fails unnecessarily because A1 is killed, and we then have to recover Workflow2 as well.
Proposed Feature:
We suggest adding a feature that allows failed tasks to be recovered while the workflow is still running. This would let us proactively recover tasks like B1 before the entire workflow fails, giving a workflow that would otherwise fail the chance to complete successfully.
This enhancement could save time and prevent cascading failures in dependent workflows. It would be particularly useful when we can already tell that a failed task will eventually cause the whole workflow to fail.
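To make the proposed behavior concrete, here is a minimal Python sketch of which tasks a "recover failed tasks in a running workflow" action might resubmit. It assumes a simplified DAG (a dict of downstream edges) and simplified task states; `tasks_to_recover`, `State`, and the tasks A2 and B2 are hypothetical and are not part of DolphinScheduler's API or the attached scenario. Failed tasks and the not-yet-started tasks downstream of them are resubmitted, while running tasks are left alone.

```python
from enum import Enum


class State(Enum):
    # Simplified task states (hypothetical, not DolphinScheduler's own status enum).
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"
    PENDING = "pending"  # not started yet, e.g. blocked behind a failed upstream task


def tasks_to_recover(dag, states):
    """Return the tasks a hypothetical 'recover failed tasks' action would resubmit
    while the workflow is still running.

    dag:    task name -> list of direct downstream task names
    states: task name -> State
    Running and successful tasks are never touched; failed tasks and the
    not-yet-started tasks downstream of them are resubmitted.
    """
    to_recover = set()
    # Start from every failed task and walk downstream.
    stack = [t for t, s in states.items() if s is State.FAILURE]
    while stack:
        task = stack.pop()
        if task in to_recover:
            continue
        if states[task] in (State.FAILURE, State.PENDING):
            to_recover.add(task)
            stack.extend(dag.get(task, []))
    return to_recover


# Workflow1 from the issue: B1 has failed while A1 is still running.
# B2 (downstream of B1) and A2 (downstream of A1) are hypothetical.
dag = {"A1": ["A2"], "A2": [], "B1": ["B2"], "B2": []}
states = {
    "A1": State.RUNNING,
    "A2": State.PENDING,
    "B1": State.FAILURE,
    "B2": State.PENDING,
}

print(sorted(tasks_to_recover(dag, states)))  # → ['B1', 'B2']
```

Leaving A1 untouched is what distinguishes this from stopping and recovering Workflow1, which would kill A1 and make Workflow2 fail.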
This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.