
Add node affinity feature enhancements REP #22

Merged (3 commits) on Apr 17, 2023

Conversation

larrylian
Contributor

@larrylian larrylian commented Feb 6, 2023

The value of introducing the Labels mechanism for scheduling was discussed earlier in REP #13 ([Add labels mechanism]). NodeAffinity is now discussed separately in this REP.

In fact, everyone already knows about NodeAffinity. I think we can reach agreement on the following points in this REP:

  • Agree on the API format
  • Agree on the main implementation plan
  • Raise specific single-point questions, then discuss them and reach a consensus
  • Discuss the remaining details during the coding process

@ericl
Contributor

ericl commented Feb 6, 2023

This REP seems like it would be blocked on the autoscaler / GCS refactoring proposal, right @scv119 ? Do we need to have that out first before reviewing this one?

3. Scheduling optimization through Labels
Every raylet now holds the static label information of all nodes.
If NodeAffinity scheduling traversed the labels of every node, the algorithmic complexity would be very high and performance would suffer.
**Therefore, it is necessary to generate a full-cluster node-labels index table to improve scheduling performance.**
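As a minimal sketch (all names here are illustrative, not the REP's actual implementation), such an index table can be a map from (label key, label value) pairs to the set of node IDs carrying them, so a `label_in` lookup touches only the matching values instead of scanning every node:

```python
from collections import defaultdict


class LabelIndex:
    """Cluster-wide index: (label_key, label_value) -> set of node IDs."""

    def __init__(self):
        self._index = defaultdict(set)

    def add_node(self, node_id, labels):
        # Register every key/value pair of the node's static labels.
        for key, value in labels.items():
            self._index[(key, value)].add(node_id)

    def remove_node(self, node_id, labels):
        for key, value in labels.items():
            self._index[(key, value)].discard(node_id)

    def nodes_with_label_in(self, key, values):
        # Union over the requested values: O(|values|) lookups,
        # independent of the total number of nodes in the cluster.
        result = set()
        for value in values:
            result |= self._index[(key, value)]
        return result


index = LabelIndex()
index.add_node("node-1", {"instance": "4C8G", "dc": "zone-1"})
index.add_node("node-2", {"instance": "8C16G", "dc": "zone-1"})
print(index.nodes_with_label_in("instance", ["4C8G"]))  # {'node-1'}
```

Anti-affinity (`label_not_in`) could then be answered by subtracting the indexed set from the set of all node IDs.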
Contributor


What's the index strategy for the more complex expressions such as negation and containment? I wonder if we need to cache the list of valid nodes per task/expression to avoid O(n^2) scheduling slowdowns with many nodes and tasks.

Contributor Author


The label indexing strategy is actually very simple; I have added it to the REP. The label index only needs to look up the nodes that carry the given keys and values.

@larrylian
Contributor Author

larrylian commented Feb 7, 2023

> This REP seems like it would be blocked on the autoscaler / GCS refactoring proposal, right @scv119 ? Do we need to have that out first before reviewing this one?
@ericl
Thanks for checking it so quickly; I just finished it now.
The GCS changes in this REP are small, so I didn't describe them in a separate chapter. I also briefly mentioned the AutoScaler; if you think there are open questions, I can add more detail.

Contributor

@stephanie-wang stephanie-wang left a comment


This looks great, thanks for putting it together!

The main comment we should resolve before approval is about the autoscaling implementation. Ideally we would not block this REP on the autoscaler refactor, but it depends on the complexity of this design. Could you address the comment I left about this, and then we can better determine order of operations?

The other question I had is about semantics for "soft". Can you confirm that soft only considers cluster feasibility of a request's label constraints? I.e., it would still schedule a task to a feasible node even if the node has high load?

3. If a custom_resource happens to be identical to the spliced label string, it will affect the correctness of scheduling.
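The concern quoted above can be illustrated with a toy encoding (the prefix and separator below are assumptions for illustration, not the REP's actual scheme):

```python
# Hypothetical scheme that splices a node label into a custom resource name.
def label_as_resource(key, value):
    return f"node_label:{key}:{value}"


# The node label {"instance": "4C8G"} becomes this resource name...
encoded = label_as_resource("instance", "4C8G")

# ...which collides with a user who happens to define a custom resource
# with exactly the same spliced string:
user_custom_resource = "node_label:instance:4C8G"
print(encoded == user_custom_resource)  # True: the scheduler cannot tell them apart
```

Under such an encoding the scheduler has no way to distinguish a genuine label from a same-named user resource, which is the correctness problem the REP calls out.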

### AutoScaler adaptation
I understand that adapting this part of the AutoScaler should not be difficult. NodeAffinity is just one more scheduling strategy, so we can follow the existing AutoScaler implementation and adapt it to the new strategy.
Contributor


This proposal is more advanced than the autoscaling for NodeAffinity.

NodeAffinity is only used for nodes that are already in the cluster, so I believe the only change there was to make sure that the autoscaler didn't mistakenly start additional nodes for a queued task that wouldn't actually be able to be scheduled there.

Since this proposal is for more flexible labels, we will need the autoscaler to be aware of which nodes it should add or remove from the cluster based on the label constraints of queued tasks. So this actually is something the proposal should cover. Specifically:

  • What API changes do we need at the cluster config level and for the interface between the autoscaler and the GCS?
  • What will the autoscaler policy look like?
  • How do we handle scalability issues? (If an application uses too many different label policies, we'll have to report all of them in the task load.)

Contributor


Yeah I don't think this is quite so simple. The autoscaler runs emulated bin-packing to determine what types of nodes it needs to add to the cluster. For example, suppose a user requests {"dc": "zone-1"}. Then, the autoscaler will need to run its bin packing routine with this request and node labels in mind, in order to determine the nodes to launch: https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/_private/resource_demand_scheduler.py

As you can see, the logic is algorithmically complex.

My assessment is that we probably need to refactor the autoscaler to unify this bin packing routine with the C++ implementation, to avoid having to write label affinity evaluation in both Python and C++.
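To make the concern concrete, here is a deliberately simplified, hypothetical first-fit sketch of label-aware node selection; the real resource_demand_scheduler linked above is far more involved, and every name and dict shape here is an assumption for illustration:

```python
def pick_node_types_to_launch(pending, node_types):
    """First-fit sketch: for each pending request (resources + required
    labels), find a launchable node type whose labels and per-node
    capacity satisfy it."""
    to_launch = []
    for demand in pending:
        for nt in node_types:
            labels_ok = all(nt["labels"].get(k) == v
                            for k, v in demand["labels"].items())
            resources_ok = all(nt["resources"].get(r, 0) >= amt
                               for r, amt in demand["resources"].items())
            if labels_ok and resources_ok:
                to_launch.append(nt["name"])
                break  # first fit; real bin packing also packs multiple demands
    return to_launch


node_types = [
    {"name": "4C8G",  "resources": {"CPU": 4}, "labels": {"instance": "4C8G"}},
    {"name": "8C16G", "resources": {"CPU": 8}, "labels": {"instance": "8C16G"}},
]
pending = [{"resources": {"CPU": 1}, "labels": {"instance": "4C8G"}}]
print(pick_node_types_to_launch(pending, node_types))  # ['4C8G']
```

Even this toy version shows why the label predicate has to be evaluated inside the bin-packing loop, which is what motivates unifying it with the C++ scheduler implementation rather than maintaining it in both languages.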

Contributor Author

@larrylian larrylian Feb 13, 2023


@ericl @stephanie-wang
After thinking about it carefully, this part is indeed very complicated. I learned that @scv119 (Chen Shen) is currently working on the AutoScaler refactoring REP. Since NodeAffinity/ActorAffinity will take a while to develop, I think we can focus on adapting to the new version of the AutoScaler. I will also actively discuss the AutoScaler redesign with @scv119 (Chen Shen).

I think the AutoScaler adaptation divides into two parts:

  1. The interaction API between the AutoScaler and the GCS.
    I will adapt it following the original approach, mainly adding new fields.

  2. The AutoScaler policy with node affinity (which decides what nodes to add to the cluster).
    This part is indeed more complicated, and can be handled as follows:

  1. The node types the AutoScaler can add to the cluster are generally predefined or configured. For example:
    Node type 1: 4C8G, labels={"instance": "4C8G"}
    Node type 2: 8C16G, labels={"instance": "8C16G"}
    Node type 3: 16C32G, labels={"instance": "16C32G"}
  2. The NodeAffinity label matches one of these standby node types.
    For example, in the following scenario:
    Actor.options(num_cpus=1, scheduling_strategy=NodeAffinity(label_in("instance", "4C8G"))).remote()
    If the actor is pending, the autoscaler traverses the prepared node types to see which one meets the requirements [resources: 1 CPU, node label: {"instance": "4C8G"}]. If one does, add it to the cluster.
  3. The NodeAffinity label is a unique special label.
    In this scenario the actor/task is tied to one particular node, and there is no need to scale up a node for it. E.g.:
    Actor.options(num_cpus=1, scheduling_strategy=NodeAffinity(label_in("node_ip", "xxx.xx.xx.xx"))).remote()
  4. Anti-affinity to a node with a special label.
    We can pre-check whether the prefabricated node types satisfy the anti-affinity requirement, and if so, add one to the cluster.
  5. Soft strategy.
    We can pre-check whether the labels and resources of the prefabricated node types can be satisfied; if so, use such a node. If no labels can be satisfied, just add a node with enough resources.
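The cases above could be sketched roughly as follows; this is purely illustrative, and SPECIAL_KEYS, the dict shapes, and the fallback order are assumptions rather than the final design:

```python
# Assumed identity labels that should never trigger a scale-up
# (case: affinity to one unique, special node).
SPECIAL_KEYS = {"node_ip", "node_id"}


def should_launch_for(demand, node_types):
    """Decide which prepared node type (if any) to launch for one pending
    task/actor demand. Returns a node type name, or None for no scale-up."""
    # Affinity to a unique special label: never expand the cluster for it.
    if any(k in SPECIAL_KEYS for k in demand["labels"]):
        return None
    # Hard affinity: launch a prepared node type matching labels + resources.
    for nt in node_types:
        if (all(nt["labels"].get(k) == v for k, v in demand["labels"].items())
                and all(nt["resources"].get(r, 0) >= a
                        for r, a in demand["resources"].items())):
            return nt["name"]
    # Soft affinity: no labeled match, fall back to resources alone.
    if demand.get("soft"):
        for nt in node_types:
            if all(nt["resources"].get(r, 0) >= a
                   for r, a in demand["resources"].items()):
                return nt["name"]
    return None


node_types = [
    {"name": "4C8G",  "resources": {"CPU": 4}, "labels": {"instance": "4C8G"}},
    {"name": "8C16G", "resources": {"CPU": 8}, "labels": {"instance": "8C16G"}},
]
demand = {"resources": {"CPU": 1}, "labels": {"instance": "4C8G"}}
print(should_launch_for(demand, node_types))  # 4C8G
```

Anti-affinity would slot in as one more pre-check over the prepared node types before the hard-affinity loop.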

@larrylian
Contributor Author

larrylian commented Feb 15, 2023

@scv119 @ericl @stephanie-wang

  1. We understand that you also have demands for Runtime Env acceleration, so I have added a proposal to use Node Labels to achieve Runtime Env acceleration; please take a look.

  2. Do you have any comments on the two REPs, Node Affinity and Actor Affinity? Can you help advance them so that I can start development as soon as possible?

cc @wumuzi520 @SongGuyang

Use Node Labels to achieve Runtime Env acceleration:


@ericl @stephanie-wang @wumuzi520 SenlinZhu @Chong Li @scv119 (Chen Shen) @Sang Cho @jjyao (Jiajun Yao) @Yi Cheng
### Shepherd of the Proposal (should be a senior committer)

Collaborator


This is a required piece of information. I suggest @scv119

Contributor Author


Agreed. I will add it.

@scv119
Contributor

scv119 commented Mar 9, 2023

@larrylian let me know once you are done with autoscaler part.

@rkooo567
Contributor

rkooo567 commented Mar 24, 2023

The implementation & API makes sense to me. I have a couple questions regarding usability for common cases.

  1. When deploying Ray, many people use the autoscaler/KubeRay, which only allows homogeneous commands for all worker nodes (worker_startup_commands). I.e., for normal users it will be impossible or difficult to set different node labels for different worker nodes. We should allow passing labels via an env var, and allow users to specify labels the way they specify resources (from the autoscaler & KubeRay config). For example, in this yaml file we should have a labels field next to the resources fields: https://docs.ray.io/en/master/cluster/vms/user-guides/launching-clusters/aws.html#create-a-minimal-cluster-config-yaml-named-cloudwatch-basic-yaml-with-the-following-contents.
  2. NodeAffinitySchedulingStrategy vs node_affinity seems confusing, and I think we should combine the two.
  3. We may need to introduce default labels, e.g., instance type, zone, and whether the node has a GPU could be added by default. This probably doesn't need to be addressed within this REP.
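The suggested labels field next to resources might look like this once the cluster-config YAML is parsed into a dict; the field name, node type, and values below are all hypothetical:

```python
# Parsed form of a hypothetical cluster-config node type.
# The "labels" key is the proposed addition, mirroring the
# existing "resources" key in autoscaler/KubeRay configs.
node_type_config = {
    "ray.worker.gpu": {
        "node_config": {"InstanceType": "p3.2xlarge"},  # illustrative EC2 type
        "resources": {"CPU": 8, "GPU": 1},
        "labels": {"instance": "p3.2xlarge", "accelerator": "gpu"},
    }
}

# Each launched worker of this type would then register its static labels
# with the GCS at startup, just as it registers its resources today.
print(sorted(node_type_config["ray.worker.gpu"]["labels"]))  # ['accelerator', 'instance']
```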

@larrylian
Contributor Author

@rkooo567 Thanks for the very helpful advice.

  1. Setting a node's static labels through environment variables is no problem. I'll add it to the docs later.
  2. Presetting default labels: I also have this plan. I will preset some common default labels such as "IP", "hostname", and "whether the node has a GPU".

> NodeAffinitySchedulingStrategy vs node_affinity seems confusing, and I think we should combine these two.

I don't quite understand what you mean; could you explain?

@rkooo567
Contributor

Oh, I think it is actually the same as #22 (comment), so I believe it is addressed.


7 participants