[Core][REP] GPU Memory awareness scheduling #47
Conversation
Signed-off-by: Jonathan Nitisastro <[email protected]>
will continue
Signed-off-by: Jiajun Yao <[email protected]>
```python
# Request a fractional GPU with specified gpu_memory in bytes.
# Mutually exclusive with num_gpus.
@ray.remote(gpu_memory=1024 * 1024 * 1024)  # 1 GiB request
```
Can we support string-based syntactic sugar? Feels more pythonic that way (i.e., gpu_memory="3gb")
For now we just follow how `memory` is defined. I think the pythonic support can be done separately, in a change that covers both `gpu_memory` and `memory`.
```python
pg = placement_group([{"gpu_memory": 1024 * 1024, "CPU": 1}, {"GPU": 1}])
```
I think we need an observability section here, as this complicates the observability semantics.
- How is it displayed in `ray status`? It should potentially display something like gpu_memory: 4 GPUs (A10) * 3gb.
- In `ray status`, if a task is scheduled with gpu_memory, are both the GPU and GPU memory values subtracted?
- How is it displayed in resource_requirement in `ray list tasks`? Is it translated into num_gpus, does it only include gpu_memory, or both?
In `ray list nodes`, it will be GPU (resources left) * `gpu_memory_per_gpu`, which is the constant stored in the node label. `ray status`, `ray list tasks`, and `ray.available_resources` currently don't show GPU memory, but if we add it, it will be shown the same way as in `ray list nodes`.
And yes, basically both the GPU and gpu_memory values are subtracted to show the remaining capacity.
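The dual bookkeeping described in this reply can be sketched as follows (assumed semantics based on the discussion, not Ray's actual scheduler code; `gpu_memory_per_gpu` is the node-label constant mentioned above):

```python
# Sketch of the accounting described above: a gpu_memory request is
# converted into an equivalent GPU fraction using the node's
# gpu_memory_per_gpu label, and both counters are decremented together.
def remaining_after_request(gpu_free, gpu_memory_free, request_bytes,
                            gpu_memory_per_gpu):
    gpu_fraction = request_bytes / gpu_memory_per_gpu
    return gpu_free - gpu_fraction, gpu_memory_free - request_bytes

# A free 24GB A10: a 12GB request leaves 0.5 GPU and 12GB of memory.
gpu_left, mem_left = remaining_after_request(
    1.0, 24 * 1024**3, 12 * 1024**3, 24 * 1024**3)
```

This is why both values shrink in lockstep: the GPU fraction is derived from the memory request, so showing either one alone would under-report what has been consumed.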
```python
# Requesting 30GB of GPU memory from an A10 GPU with 24GB of memory.
# The task won't be able to be scheduled.
@ray.remote(gpu_memory=30 * 1024 * 1024 * 1024, accelerator_type="NVIDIA_TESLA_A10G")
```
If you have a 40GB GPU and schedule one task with 20GB, and then schedule another with num_gpus=1, would it fail to schedule?
Yes, the second one will fail, since the GPU remaining after scheduling the 20GB task will be 0.5.
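The arithmetic behind this answer, worked out explicitly (a sketch of the assumed semantics, not scheduler code):

```python
# A 20GB gpu_memory request on a 40GB GPU consumes 20/40 = 0.5 of the
# GPU, so a subsequent num_gpus=1 request cannot fit on that GPU.
GPU_TOTAL_MEMORY = 40 * 1024**3  # 40GB GPU from the question above

gpu_free = 1.0
gpu_free -= (20 * 1024**3) / GPU_TOTAL_MEMORY  # schedule the 20GB task
can_schedule_full_gpu = gpu_free >= 1.0

# gpu_free is now 0.5, so can_schedule_full_gpu is False
```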
```python
# Requesting a fractional GPU with both num_gpus and gpu_memory is not allowed.
@ray.remote(gpu_memory=1024 * 1024 * 1024, num_gpus=0.5)  # raises ValueError
```
is it possible to express 2 GPUs using gpu_memory? Or is it not allowed?
can you specify this in REP?
It's not allowed, since only one of `num_gpus` or `gpu_memory` (1 GPU per request) can be specified in a request.
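The mutual-exclusion rule described here could be enforced with a check like the following (a hypothetical sketch mirroring the ValueError behavior in the REP example, not Ray's actual validation code):

```python
# Hypothetical validation sketch: num_gpus and gpu_memory are mutually
# exclusive ways of requesting GPU resources, per the rule above.
def validate_gpu_request(num_gpus=None, gpu_memory=None):
    if num_gpus is not None and gpu_memory is not None:
        raise ValueError("Specify only one of num_gpus or gpu_memory.")
    return num_gpus, gpu_memory
```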
Could they both be allowed? If both `num_gpus` and `gpu_memory` are specified, then it would require that much memory on that many GPUs. `num_gpus` would default to 1, so not specifying it would get the behavior described above. It could be an error condition to specify a fractional value for `num_gpus` if also specifying `gpu_memory`. Thoughts?
The GPU memory scheduling prototype:
ray-project/ray#41147