-
Notifications
You must be signed in to change notification settings - Fork 846
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update deployment and behavior in ExecutionInfo #6798
base: versioning-3
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to look at the test files still, submitting to save progress / share feedback while I help with a canary issue in the release channel
proto/internal/temporal/server/api/persistence/v1/executions.proto
Outdated
Show resolved
Hide resolved
temporal.api.common.v1.WorkerDeployment deployment = 2; | ||
// Manual override for execution's versioning behavior. Takes precedence over `behavior`. | ||
temporal.api.enums.v1.VersioningBehavior behavior_override = 3; | ||
// Used to manually pin the execution to a deployment. Must be set when if and only if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Used to manually pin the execution to a deployment. Must be set when if and only if | |
// Used to manually pin the execution to a deployment. Must be set if and only if |
proto/internal/temporal/server/api/persistence/v1/executions.proto
Outdated
Show resolved
Hide resolved
// If this activity was attempted to start during an ongoing redirect of the workflow, we set | ||
// this flag so we remember after completion of the activity to reschedule the dropped task. | ||
// History rejects the start of activities when a workflow is redirecting to a different | ||
// deployment. Those rejected starts will cause the task to be dropped by Matching. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you're talking about the "dropped task" could you specify that it's a dropped Activity Task? I feel like it doesn't hurt to be extra clear.
Also, what do you think about calling the field has_dropped_task
since it's a boolean and not the task itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dropped_task
is fine too, I'm just wondering if has_dropped_task
would be more clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up deleting this. We have to reschedule all the non-started activities when the redirect successfully completes anyways. There is not much use to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
The only major comment I had is about how to handle conflict resolution when there is an ongoing redirect due to a manual override, and another redirect comes in because the poller's deployment/behavior is different than the override deployment/behavior. In that case, we need to reject the poller-initiated redirect.
Now that I think about it, I think I can also handle this as part of the move PR though, since I think it will hopefully just involve adding an if-statement. I think it will make more sense to handle this case in the same PR that adds the override.
// creates a mutable state with first WFT completed on deployment "my_app:build_1" and behavior set | ||
// to the passed value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// creates a mutable state with first WFT completed on deployment "my_app:build_1" and behavior set | |
// to the passed value. | |
// creates a mutable state with first WFT completed on the given deployment and behavior set | |
// to the given behavior, testing expected output after Add, Start, and Complete Workflow Task. |
func StampForBuildId(buildId string) *commonpb.WorkerVersionStamp { | ||
return &commonpb.WorkerVersionStamp{UseVersioning: true, BuildId: buildId} | ||
} | ||
func StampForDeployment(deployment *commonpb.WorkerDeployment) *commonpb.WorkerVersionStamp { | ||
return &commonpb.WorkerVersionStamp{UseVersioning: true, BuildId: deployment.BuildId, DeploymentName: deployment.DeploymentName} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should these be StampFromBuildId
and StampFromDeployment
?
// If the redirect was initiated by this activity we must create a workflow task | ||
// to ensure the workflow won't be stuck. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Is this to handle the case where the activity poller / activity TQ initiates the deployment change, and we don't know yet whether the workflow poller / WFTQ is also changing deployments?
- If so, is this a correct understanding of why we create a WFT here: We create a workflow task to send to the current workflow TQ, so that poller can tell us it's updated deployment on wf task completion?
behaviorOverride == enumspb.VERSIONING_BEHAVIOR_UNSPECIFIED { | ||
// WF is pinned and the redirect is not from a manual override, so we reject it. | ||
// It's possible that a backlogged task in matching from an earlier time that this wf was | ||
// unpinned is being dispatched now and wants to redirect the wf. Such task should be dropped. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super basic thing I should know: If Matching drops a backlogged task, will History eventually re-send it to Matching, because History never got a "WFT Started Event" for that task? Or, does Matching send some type of ack
to History when it writes a task to the backlog, meaning it's Matching's responsibility to not drop anything in the backlog?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's the first way but want to confirm
// When present, indicates the workflow is being redirected to a different deployment. | ||
// A redirect can only exist during the lifetime of a pending workflow task. | ||
// If the pending workflow task completes (at the next WorkflowTaskCompleted event), the | ||
// redirect is considered complete and the workflow's deployment is updated. If the pending | ||
// workflow task fails or times out, then the redirect is canceled and workflow remains on | ||
// the previous deployment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we want to cancel the redirect if the pending WFT fails or times out? what if it timed out due to reasons totally unrelated to versioning?
is the idea that it's ok to cancel the redirect because if the WF really does need to change deployments or change behavior, and the timeout was due to a transient issue, the next WFT will re-initiate the cancelled redirect, it will succeed, and the redirect will complete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approved with some clarifying questions
What changed?
Update the versioning info of workflows based on information received in WF and Activity task events.
Why?
How did you test it?
Potential risks
Documentation
Is hotfix candidate?