Deployment Workflow Definition + Pollers starting workflow #6807

Shivs11 · 2024-11-13T17:38:23Z

What changed?

Added logic for the deployment workflow definition to handle signalWithStart and update local state.
Added logic for pollers in matching to start these workflows.
Added a de-dup mechanism, since we don't want repeated workflow executions for already-seen <TaskQueue, DeploymentGroup, BuildID> pollers.

Why?

versioning-3

How did you test it?

Added unit tests for each addition of code I have made.
Functional tests to follow.

Potential risks

None; this is going into a feature branch.

Documentation

Is hotfix candidate?

NA

Shivs11 · 2024-11-13T17:38:57Z

service/matching/matching_engine.go

@@ -163,6 +182,8 @@ type (
 		namespaceUpdateLockMapLock sync.Mutex
 		// Stores results of reachability queries to visibility
 		reachabilityCache reachabilityCache
+		// De-duping poll requests when starting/signaling deployment workflows


This is added to adhere to the constraint:

If the same task queue and build ID appear in a poll request but with a different deployment name, the poll will be rejected.

Sorry, this requirement changed with the recent semantic changes. We allow the poller to hang on. No poller rejection happens anymore.

Ah sorry, I had meant to write an additional statement with this requirement by saying "We will not reject the poll but won't make a duplicate workflow execution request".

Shivs11 · 2024-11-13T17:45:48Z

service/worker/deployment/deployment_activities.go

)
+

Just the skeleton as this shall aid quicker iteration of the next PR :)

ShahabT · 2024-11-13T19:51:33Z

service/matching/matching_engine.go

@@ -163,6 +182,8 @@ type (
 		namespaceUpdateLockMapLock sync.Mutex
 		// Stores results of reachability queries to visibility
 		reachabilityCache reachabilityCache
+		// De-duping poll requests when starting/signaling deployment workflows


Sorry, this requirement changed with the recent semantic changes. We allow the poller to hang on. No poller rejection happens anymore.

ShahabT · 2024-11-13T20:04:45Z

service/matching/matching_engine.go

@@ -545,6 +569,42 @@ pollLoop:
 			workerVersionCapabilities: request.WorkerVersionCapabilities,
 			forwardedFrom:             req.GetForwardedSource(),
 		}
+		if e.config.EnableDeployments(req.PollRequest.Namespace) && req.PollRequest.WorkerVersionCapabilities.UseVersioning {


The deployment workflow is the concern of the physical queue, so the best place for this is physicalTaskQueueManager. Basically, the physical queue wants to register its deployment when it receives the first poll, before it starts processing that poll or any other polls that arrive before the registration is done.

One advantage of that is you don't need to repeat things in different types of polls.

I had thought about this and had put it inside of matching before we register a poll for the same reason. I am not opposed to putting this inside of physicalTaskQueueManager and in hindsight, should have seen this as a better way to reduce code duplication (to a certain extent).

thanks
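
For illustration, a minimal sketch of the "register on first poll, before processing any poll" idea. The names `physicalQueue`, `ensureRegistered`, and `register` are hypothetical and not taken from this PR; the real physicalTaskQueueManager will look different.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// physicalQueue is a hypothetical stand-in for physicalTaskQueueManager.
type physicalQueue struct {
	mu         sync.Mutex
	registered bool
	// register is an assumed hook that starts/updates the deployment workflow.
	register func() error
}

// ensureRegistered runs register until it has succeeded once. Concurrent
// polls block on the mutex, so none proceeds before registration is done.
func (q *physicalQueue) ensureRegistered() error {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.registered {
		return nil
	}
	if err := q.register(); err != nil {
		return err // not marked as registered, so the next poll retries
	}
	q.registered = true
	return nil
}

// poll is the single entry point shared by every poll type, so the
// registration logic is not repeated per poll type.
func (q *physicalQueue) poll() error {
	if err := q.ensureRegistered(); err != nil {
		return err
	}
	// ... match the poller with a task ...
	return nil
}

func main() {
	calls := 0
	q := &physicalQueue{register: func() error {
		calls++
		if calls == 1 {
			return errors.New("transient failure")
		}
		return nil
	}}
	for i := 0; i < 4; i++ {
		_ = q.poll()
	}
	fmt.Println("register calls:", calls)
}
```

Holding the lock across the registration call is the simplest way to make later polls wait; whether that latency is acceptable on the poll path is a separate design question.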

ShahabT · 2024-11-13T20:06:30Z

service/matching/matching_engine.go

@@ -1987,6 +2119,114 @@ func (e *matchingEngineImpl) pollTask(
 	return pm.PollTask(ctx, pollMetadata)
 }

+// isValidName checks if each character is a letter/number in the input string


It seems cleaner to move all of this to a separate file called deployment_workflow_util.go or something similar.

ShahabT · 2024-11-13T20:09:00Z

service/matching/matching_engine.go

+// isValidName checks if each character is a letter/number in the input string
+func isValidName(input string) bool {
+	for _, char := range input {
+		if !unicode.IsLetter(char) && !unicode.IsDigit(char) {


Let's not be too restrictive here; we want to disallow only the characters that we use as delimiters. That also means that we should not use common characters as delimiters: something like ":" or "/" seems common. Maybe we should use "|" as the delimiter and just disallow that?

Why not though? I find it rather odd for a user to come in and give their deployment group a non-alphanumeric name (something like "hi!@#"). I know we give our customers liberty when naming task queues, but I thought we should keep things simpler here when naming deployment groups and build IDs.

This might be a more product related concern though I fear

Yeah this is more of a product decision. But since we're not putting same restrictions anywhere else, I think we should avoid that here to be consistent: only the things that can break things should be prevented. In this example, it's very likely that build ID has chars such as "." or "-".

I am going to go ahead and use "?" as the delimiter for our buildID. To the best of my product knowledge, users won't name their deployment group/BuildID with a question mark (but I see a world where they name things with :, -, !) so I'm taking that gamble here and going with it as a potential delimiter.
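
As a rough sketch of the "only disallow the delimiter" approach: the function name `validateNoDelimiter` and the constant name are hypothetical, and "?" is assumed as the delimiter based on the last reply above.

```go
package main

import (
	"fmt"
	"strings"
)

// workflowIDDelimiter is assumed to be "?" per the discussion; the constant
// name itself is made up for this example.
const workflowIDDelimiter = "?"

// validateNoDelimiter rejects empty values and values containing the
// reserved delimiter. Every other character is allowed.
func validateNoDelimiter(field, value string) error {
	if value == "" {
		return fmt.Errorf("%s cannot be empty", field)
	}
	if strings.Contains(value, workflowIDDelimiter) {
		return fmt.Errorf("%s cannot contain %q", field, workflowIDDelimiter)
	}
	return nil
}

func main() {
	for _, v := range []string{"v1.2.3-rc1", "hi!@#", "what?"} {
		fmt.Printf("%q -> %v\n", v, validateNoDelimiter("BuildID", v))
	}
}
```

With this check, build IDs containing "." or "-" pass, and only values that would make the combined workflow ID ambiguous are rejected.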

ShahabT · 2024-11-13T20:15:46Z

service/matching/matching_engine.go

+	return nil
+}
+
+func (e *matchingEngineImpl) validateWorkflowID(workflowID string) error {


This wf ID is something internal that we create, so it does not make sense to return an error for something invalid that we made ourselves. We should instead validate the user-provided pieces (deployment name and build ID) and make sure that the workflow ID they result in is not going to be invalid.
That means we should check the following for deployment name and build ID:

They don't contain the invalid chars

They are valid UTF-8 strings

Their length does not exceed the limit of the part of the wf ID that we allocate to them. Because the workflow ID will have both of these values plus the prefix and delimiter, the allowed length for each will be (MaxIDLengthLimit - prefix and delimiter length) / 2

Sure, I am not opposed to this. The deployment and buildID checks will be individually done on the matching side (I have added the piece of code currently doing this) and can append the new cases to the existing logic.
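
The three checks above could be sketched roughly as follows. All constants are illustrative assumptions (a 1000-byte ID limit, a made-up "deployment" prefix, "?" as delimiter), and the assumed ID layout is prefix + delimiter + name + delimiter + build ID, so the per-field budget works out to (1000 - 10 - 2) / 2 = 494 bytes.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// Illustrative assumptions only; none of these values come from the PR.
const (
	maxIDLengthLimit = 1000         // assumed workflow ID length limit
	idPrefix         = "deployment" // hypothetical reserved prefix
	idDelimiter      = "?"          // delimiter assumed from the thread
)

// maxFieldLength splits what is left of the ID budget evenly between the
// deployment name and the build ID.
func maxFieldLength() int {
	overhead := len(idPrefix) + 2*len(idDelimiter)
	return (maxIDLengthLimit - overhead) / 2
}

// validateField applies the checks to one user-provided piece.
func validateField(field, value string) error {
	switch {
	case value == "":
		return fmt.Errorf("%s cannot be empty", field)
	case !utf8.ValidString(value):
		return fmt.Errorf("%s is not valid UTF-8", field)
	case strings.Contains(value, idDelimiter):
		return fmt.Errorf("%s cannot contain %q", field, idDelimiter)
	case len(value) > maxFieldLength():
		return fmt.Errorf("%s exceeds %d bytes", field, maxFieldLength())
	}
	return nil
}

func main() {
	fmt.Println("max bytes per field:", maxFieldLength())
	fmt.Println(validateField("DeploymentName", "prod.us-east-1"))
	fmt.Println(validateField("BuildID", "v1?beta"))
}
```

If both fields pass, the ID built from them cannot exceed the limit, so no error path is needed for the internally constructed workflow ID.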

service/matching/matching_engine_test.go

ShahabT · 2024-11-13T20:27:40Z

service/matching/matching_engine.go

+					NamespaceId:       req.NamespaceId,
+					TaskQueueFamilies: nil,
+				}
+				_, err := e.startAndSignalDeploymentWorkflow(ctx, startDeploymentWorkflowArgs, updateDeploymentSignalInput, deploymentName, buildID, req.PollRequest.Identity)


Down the road, we'll need many different signals/updates doing different things to the wf. Each should be named specifically for what it is meant to do and have its own input definition.

In this case, the signal is meant to register the TQ in the deployment. Hence it should be called something like RegisterTaskQueue.

Also, the deployment wf might accept the registration or not (say, if this registration exceeds some limit), so it should be an Update that returns some result. We can handle the result and failure cases later, but for now it makes sense to start with an Update rather than a Signal.

Two points here:

I am okay to change the naming of the signal although I do think we should have "deployment" in the signal naming for better readability (it's nit, can be ignored I believe)

I had considered using an Update but went ahead with signals because I thought we wanted to lazily initiate our workflow executions. Moreover, using SignalWithStart also provides less latency since it's async as opposed to sync write of an Update (yes, there is guarantee of the registration happening here but I gave latency the upper hand here). Overall, I thought both options have their pros and cons each but I thought having lazy initializations of workflows was something that we desired.

On the first item, I think it's better to name the Update RegisterWorker, where "Worker" highlights that we are registering the set of TQ workers in a deployment, as opposed to a future Update such as RegisterBacklog for when we see the first (pinned) task for a versioned TQ.

(RegisterBacklog is a future thing that I anticipate we'd need to be able to build APIs for backlog stats for deployments, etc., no need to think about it right now)

we should have "deployment" in the signal naming for better readability
If this helps with code readability, sure. As far as the wf is concerned all the signal names are scoped within a WF definition.

dnr · 2024-11-13T21:13:53Z

service/matching/matching_engine.go

@@ -545,6 +569,42 @@ pollLoop:
 			workerVersionCapabilities: request.WorkerVersionCapabilities,
 			forwardedFrom:             req.GetForwardedSource(),
 		}
+		if e.config.EnableDeployments(req.PollRequest.Namespace) && req.PollRequest.WorkerVersionCapabilities.UseVersioning {


can we factor out more of this code?

dnr · 2024-11-13T21:14:17Z

service/matching/matching_engine.go

+				}
+
+				// adds to map to prevent multiple duplicate requests from starting workflow execution
+				e.dedupDeployments[dedupDeploymentKey] = req.PollRequest.WorkerVersionCapabilities.DeploymentName


don't we need some synchronization?
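
To make the concern concrete: concurrent writes to a plain Go map from multiple poll goroutines are a data race. A minimal mutex-guarded sketch follows; the type and field names are hypothetical, and the key is simplified to the full <TaskQueue, DeploymentName, BuildID> tuple rather than the map shape used in the diff.

```go
package main

import (
	"fmt"
	"sync"
)

// dedupKey mirrors the <TaskQueue, DeploymentName, BuildID> tuple.
// Field names are hypothetical.
type dedupKey struct {
	taskQueue      string
	deploymentName string
	buildID        string
}

// deploymentDedup guards the seen-set with a mutex so that concurrent
// pollers cannot race on the map.
type deploymentDedup struct {
	mu   sync.Mutex
	seen map[dedupKey]struct{}
}

func newDeploymentDedup() *deploymentDedup {
	return &deploymentDedup{seen: make(map[dedupKey]struct{})}
}

// markSeen records the key and reports whether this caller was the first.
// Check and insert happen under one lock, so exactly one caller wins.
func (d *deploymentDedup) markSeen(k dedupKey) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if _, ok := d.seen[k]; ok {
		return false
	}
	d.seen[k] = struct{}{}
	return true
}

func main() {
	d := newDeploymentDedup()
	k := dedupKey{taskQueue: "orders", deploymentName: "prod", buildID: "v1"}

	var wg sync.WaitGroup
	results := make(chan bool, 10)
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			results <- d.markSeen(k)
		}()
	}
	wg.Wait()
	close(results)

	firsts := 0
	for first := range results {
		if first {
			firsts++
		}
	}
	fmt.Println("workflow start requests:", firsts)
}
```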

dnr · 2024-11-13T21:14:55Z

service/matching/matching_engine.go

@@ -163,6 +182,8 @@ type (
 		namespaceUpdateLockMapLock sync.Mutex
 		// Stores results of reachability queries to visibility
 		reachabilityCache reachabilityCache
+		// De-duping poll requests when starting/signaling deployment workflows
+		dedupDeployments map[dedupDeploymentsKey]string


does the map ever shrink? maybe we want a cache with a ttl?
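
One possible shape for a TTL-bounded alternative is sketched below. This is a hand-rolled example with hypothetical names (`ttlSet`, `manualClock`), not an existing cache from the codebase, and it sweeps expired entries on every call, which a production cache would likely bound or do in the background.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// ttlSet remembers keys for a limited time so the map cannot grow forever.
type ttlSet struct {
	mu      sync.Mutex
	ttl     time.Duration
	now     func() time.Time // injectable clock, handy for tests
	expires map[string]time.Time
}

func newTTLSet(ttl time.Duration, now func() time.Time) *ttlSet {
	return &ttlSet{ttl: ttl, now: now, expires: make(map[string]time.Time)}
}

// add reports whether key was absent (or expired) and records it with a
// fresh expiry. Expired entries are swept first so the map shrinks.
func (s *ttlSet) add(key string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	t := s.now()
	for k, exp := range s.expires {
		if !t.Before(exp) {
			delete(s.expires, k)
		}
	}
	if _, ok := s.expires[key]; ok {
		return false
	}
	s.expires[key] = t.Add(s.ttl)
	return true
}

// size returns the number of entries currently held.
func (s *ttlSet) size() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.expires)
}

// manualClock only moves when advance is called.
type manualClock struct{ t time.Time }

func (c *manualClock) now() time.Time          { return c.t }
func (c *manualClock) advance(d time.Duration) { c.t = c.t.Add(d) }

func main() {
	clk := &manualClock{}
	s := newTTLSet(time.Minute, clk.now)
	fmt.Println(s.add("tq/build-1")) // first sighting
	fmt.Println(s.add("tq/build-1")) // duplicate within the TTL
	clk.advance(2 * time.Minute)
	fmt.Println(s.add("tq/build-1")) // expired, so treated as new
}
```

The trade-off is that an expired key triggers another start/signal request, which is only safe if that request is idempotent on the workflow side.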

dnr · 2024-11-13T21:16:07Z

service/matching/matching_engine.go

@@ -1987,6 +2119,114 @@ func (e *matchingEngineImpl) pollTask(
 	return pm.PollTask(ctx, pollMetadata)
 }

+// isValidName checks if each character is a letter/number in the input string
+func isValidName(input string) bool {


Valid name for what? Deployments? Then it should be isValidDeploymentName; this is at package scope.

I am going to alter this function definition by making it a method for taskQueuePartitionManagerImpl and change the name to isValidDeploymentName.

dnr · 2024-11-13T21:16:30Z

service/matching/matching_engine.go

+// isValidName checks if each character is a letter/number in the input string
+func isValidName(input string) bool {
+	for _, char := range input {
+		if !unicode.IsLetter(char) && !unicode.IsDigit(char) {


this seems pretty constraining.. why are we limiting things like this?

dnr · 2024-11-13T21:17:50Z

service/matching/matching_engine.go

+		return serviceerror.NewInvalidArgument("DeploymentName/BuildID cannot be empty")
+	}
+
+	// Prefix check


what's wrong if a deployment name or build id has one of these prefixes? it may be confusing but does anything break?

Nothing breaks (as of now), since we separate the prefixes from the name with a delimiter (:). However, I do wonder if we want users to be allowed to use a constant that we use internally as their deployment name.

now that you say it though, I don't see much problem in it and shall remove it - thanks for making me think again on this

dnr · 2024-11-13T21:21:58Z

service/worker/deployment/deployment_workflow.go

+	info := workflow.GetInfo(ctx)
+	workflowID := info.WorkflowExecution.ID
+
+	deploymentName, buildID, err := parseDeploymentWorkflowID(workflowID)


After seeing this code I feel more strongly: We should not be parsing things out of strings. Just pass the deployment name and build id in the args and don't worry about the name here at all

This puts forward the argument of space again since passing those two as args would mean they repeatedly get passed around when doing CAN

I don't have an exact metric right now to tell you whether adding two constants would make a significant difference in size, but I thought it was best to keep things separate and avoid repetition (hence the parsing).

@ShahabT , I recall you were opposed to the idea of having them passed as args for the same reason right? wdyt?

I don't have an objection to passing these values in the args, but then we should validate that the passed values match the wf ID.

Also, regardless, we need to keep the deployment -> wf ID mapping deterministic for the DescribeDeployment and ListDeployment APIs. So we cannot get rid of the verification and wf ID construction part. (We can get rid of the parsing, though.)

I don't have an objection to passing these values in the args, but then we should validate that the passed values match the wf ID.

We build out the workflowID using the two supplied arguments in matching - In the event that the supplied arguments are valid (individual checks), the workflowID build would also be valid since it's a combination of the supplied args + our reserved delimiters. I am not sure why, in the workflow, should we be further validating if the passed values match the wf ID
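
A sketch of how the two positions could be combined: build the workflow ID deterministically from the two values, pass the values as args, and verify them by rebuilding the ID rather than parsing it. The prefix, delimiter, and all identifiers here are hypothetical.

```go
package main

import "fmt"

// Hypothetical reserved pieces of the workflow ID.
const (
	deploymentWorkflowIDPrefix = "deployment"
	deploymentWorkflowIDDelim  = "?"
)

// deploymentArgs stands in for the workflow's input args.
type deploymentArgs struct {
	DeploymentName string
	BuildID        string
}

// deploymentWorkflowID is the deterministic deployment -> wf ID mapping.
// It is unambiguous as long as neither value contains the delimiter.
func deploymentWorkflowID(name, buildID string) string {
	return deploymentWorkflowIDPrefix + deploymentWorkflowIDDelim +
		name + deploymentWorkflowIDDelim + buildID
}

// checkArgsMatchID verifies the args against the running workflow's ID by
// rebuilding the ID and comparing. No string parsing is involved.
func checkArgsMatchID(workflowID string, args deploymentArgs) error {
	want := deploymentWorkflowID(args.DeploymentName, args.BuildID)
	if want != workflowID {
		return fmt.Errorf("workflow ID %q does not match args (expected %q)", workflowID, want)
	}
	return nil
}

func main() {
	id := deploymentWorkflowID("prod", "v1")
	fmt.Println(id)
	fmt.Println(checkArgsMatchID(id, deploymentArgs{DeploymentName: "prod", BuildID: "v1"}))
	fmt.Println(checkArgsMatchID(id, deploymentArgs{DeploymentName: "prod", BuildID: "v2"}))
}
```

The same construction function can serve the lookup side (for example when resolving a deployment to its workflow ID), which keeps the mapping in one place.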

dnr · 2024-11-13T21:23:22Z

service/worker/deployment/deployment_workflow.go

+		var signalInput *deployspb.UpdateDeploymentSignalInput
+		updateDeploymentSignalChannel.Receive(d.ctx, &signalInput)
+
+		if d.DeploymentLocalState.TaskQueueFamilies == nil {


I think you need separate checks for d.DeploymentLocalState.TaskQueueFamilies == nil and d.DeploymentLocalState.TaskQueueFamilies[signalInput.Name] == nil, right?

Won't d.DeploymentLocalState.TaskQueueFamilies[signalInput.Name] == nil always be true whenever d.DeploymentLocalState.TaskQueueFamilies == nil?

This line of code will anyways be hit only when we are going to add a key in our map, so I thought that check was repeated (unless I've missed something)
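
One way to read the reviewer's point, sketched with a hypothetical nested map (the real TaskQueueFamilies field type may differ): reading from a nil map is safe in Go, but writing to one panics, and the inner entry can be missing even when the outer map already exists. That second case is why the two checks are independent.

```go
package main

import "fmt"

// localState is a hypothetical stand-in for DeploymentLocalState.
type localState struct {
	// family name -> task queue name -> registered
	families map[string]map[string]bool
}

// register initializes whichever level is missing before writing.
func (s *localState) register(family, taskQueue string) {
	if s.families == nil {
		// First signal ever: the outer map does not exist yet.
		s.families = make(map[string]map[string]bool)
	}
	if s.families[family] == nil {
		// Outer map exists, but this family has not been seen before.
		s.families[family] = make(map[string]bool)
	}
	s.families[family][taskQueue] = true
}

func main() {
	s := &localState{}
	fmt.Println(s.families["orders"] == nil) // reading a nil map is safe

	s.register("orders", "tq-1")  // outer nil, inner nil
	s.register("orders", "tq-2")  // both present
	s.register("billing", "tq-1") // outer present, inner nil
	fmt.Println(len(s.families), len(s.families["orders"]))
}
```

If only the outer nil check existed, the third call would write into a nil inner map and panic.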

Shivs11

@dnr - I left some comments as unanswered because they will not be applicable given that I am currently in the process of moving things inside of physicalTaskQueueManager

Shivs11 · 2024-11-13T21:44:52Z

service/worker/deployment/deployment_workflow.go

+		var signalInput *deployspb.UpdateDeploymentSignalInput
+		updateDeploymentSignalChannel.Receive(d.ctx, &signalInput)
+
+		if d.DeploymentLocalState.TaskQueueFamilies == nil {


Won't d.DeploymentLocalState.TaskQueueFamilies[signalInput.Name] == nil always be true whenever d.DeploymentLocalState.TaskQueueFamilies == nil?

This line of code will anyways be hit only when we are going to add a key in our map, so I thought that check was repeated (unless I've missed something)

Shivs11 · 2024-11-13T21:47:02Z

service/worker/deployment/deployment_workflow.go

+	info := workflow.GetInfo(ctx)
+	workflowID := info.WorkflowExecution.ID
+
+	deploymentName, buildID, err := parseDeploymentWorkflowID(workflowID)


This puts forward the argument of space again since passing those two as args would mean they repeatedly get passed around when doing CAN

I don't have an exact metric right now to tell you whether adding two constants would make a significant difference in size, but I thought it was best to keep things separate and avoid repetition (hence the parsing).

@ShahabT , I recall you were opposed to the idea of having them passed as args for the same reason right? wdyt?

Shivs11 · 2024-11-13T21:53:56Z

service/matching/matching_engine.go

+		return serviceerror.NewInvalidArgument("DeploymentName/BuildID cannot be empty")
+	}
+
+	// Prefix check


Nothing breaks (as of now), since we separate the prefixes from the name with a delimiter (:). However, I do wonder if we want users to be allowed to use a constant that we use internally as their deployment name.

now that you say it though, I don't see much problem in it and shall remove it - thanks for making me think again on this

Shivs11 · 2024-11-13T21:55:06Z

service/matching/matching_engine.go

@@ -1987,6 +2119,114 @@ func (e *matchingEngineImpl) pollTask(
 	return pm.PollTask(ctx, pollMetadata)
 }

+// isValidName checks if each character is a letter/number in the input string
+func isValidName(input string) bool {


I am going to alter this function definition by making it a method for taskQueuePartitionManagerImpl and change the name to isValidDeploymentName.

…go-routine for cleaner workflow look

Shivs11 · 2024-11-14T17:20:52Z

service/matching/physical_task_queue_manager.go

@@ -99,6 +109,9 @@ type (
 		taskValidator              taskValidator
 		tasksAddedInIntervals      *taskTracker
 		tasksDispatchedInIntervals *taskTracker
+		// isDeploymentWorkflowStarted keeps track if we have started a deployment workflow for this
+		// physicalTaskQueue
+		isDeploymentWorkflowStarted atomic.Bool


This atomic is present to handle the following condition:
"If a poller arrives with a previously seen task-queue + buildID combination but with a different DeploymentName (meaning a Deployment wf has already been started)"

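For illustration, a minimal sketch of how an atomic flag can let exactly one poller claim the workflow start without a lock. The struct and method names are hypothetical; only the field name is borrowed from the diff above, and `atomic.Bool` requires Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// physicalQueueState is a hypothetical holder for the flag.
type physicalQueueState struct {
	isDeploymentWorkflowStarted atomic.Bool
}

// tryClaimStart returns true for exactly one caller until released.
// CompareAndSwap makes the check-and-set a single atomic step.
func (q *physicalQueueState) tryClaimStart() bool {
	return q.isDeploymentWorkflowStarted.CompareAndSwap(false, true)
}

// releaseStart resets the flag, e.g. when starting the workflow failed
// and a later poller should try again.
func (q *physicalQueueState) releaseStart() {
	q.isDeploymentWorkflowStarted.Store(false)
}

func main() {
	q := &physicalQueueState{}
	var wg sync.WaitGroup
	var winners atomic.Int32
	for i := 0; i < 16; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if q.tryClaimStart() {
				winners.Add(1)
			}
		}()
	}
	wg.Wait()
	fmt.Println("pollers that start the workflow:", winners.Load())
}
```

Unlike a mutex, this does not make other pollers wait for the start to finish; they simply skip the duplicate request and carry on.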
Shivs11 · 2024-11-14T17:32:13Z

service/worker/deployment/deployment_workflow.go

@@ -51,31 +46,28 @@ type (
 	}
 )

-var (


Removing this for now since I haven't drafted activities yet. Makes sense to include this as and when I do

Shivs11 added 16 commits November 10, 2024 21:13

work on creating a deployment entity workflow

ad24286

go mod tidy

5b01e08

work on pollers starting deployment workflows (pending inspection)

412acf3

Added unit test for deployment workflow

be2f341

changed workflow args to protos + addressed comments

1806f06

Addressed comments + made better protos

ba350e2

Merge branch 'shivam/deployment-entity-workflow-fix-base-branch' into…

acf2c76

… shivam/deployment-entity-workflow-fix-base-branch-latest

trying to get things compiling

5f512bd

cleaner + better embedded structures

cfc4761

Merge branch 'shivam/deployment-entity-workflow-fix-base-branch' into…

c6f56c2

… shivam/deployment-entity-workflow-fix-base-branch-latest

updated tests to reflect latest proto changes;

47a4c1e

unit tests in matching when starting deployment workflows

457e262

Remove activity implementation to make PR review simpler

34ff8aa

Merge branch 'versioning-3' into shivam/deployment-workflow-definition

d639992

restored go.mod to versioning-3 + used sdkClient to alter workflowID …

3378164

…in unit tests

updated protos to fix build issues

0193fc6

Shivs11 commented Nov 13, 2024


Better comments and lint fixes

d563a01

Shivs11 changed the title ~~Deployment Workflow Definition + Pollers starting them~~ Deployment Workflow Definition + Pollers starting workflow Nov 13, 2024

ShahabT reviewed Nov 13, 2024


dnr reviewed Nov 13, 2024


Shivs11 commented Nov 13, 2024


Shivs11 added 3 commits November 14, 2024 12:13

experimenting with multiOperation and placing signals in a different …

1503611

…go-routine for cleaner workflow look

better comments

88fde9d

removed non-required code

8aa5db7

Shivs11 commented Nov 14, 2024


Shivs11 added 3 commits November 14, 2024 12:23

passing buildID and deployment in wf args

7b8ca69

goimports

aee533d

changed signals to have - instead of _

37dfda8

Shivs11 commented Nov 14, 2024


Shivs11 added 2 commits November 14, 2024 14:21

lint complaints

7d8a7fa

lint fixes

54fcfb2
