[serve] max_ongoing_requests limited by max_concurrency in actor #47681

Closed
aRyBernAlTEglOTRO opened this issue Sep 16, 2024 · 1 comment · Fixed by #48274
Labels: bug (Something that is supposed to be working; but isn't), P1 (Issue that should be fixed within a few weeks), serve (Ray Serve Related Issue)

Comments

@aRyBernAlTEglOTRO

What happened + What you expected to happen

  1. The Bug: the max_ongoing_requests parameter of @serve.deployment has no effect beyond 1000; a larger value is effectively capped at 1000.
  2. Expected Behavior: max_ongoing_requests is honored even when it is larger than 1000.

Versions / Dependencies

  • Ray: 2.35.0
  • Python: 3.11.8
  • OS: Ubuntu 22.04.4 LTS

Reproduction script

Reproducible Script:

from ray import serve
from ray.serve.handle import DeploymentHandle
import asyncio

@serve.deployment(max_ongoing_requests=4096)
class Model:
    @serve.batch(max_batch_size=2048, batch_wait_timeout_s=2)
    async def __call__(self, ls: list[int]) -> list[int]:
        print(f"Length of input list: {len(ls)}")
        return ls

async def main() -> None:
    handle: DeploymentHandle = serve.run(Model.bind())
    await asyncio.gather(*[handle.remote(i) for i in range(2048)])

if __name__ == "__main__":
    asyncio.run(main())

Expected Output:

Length of input list: 2048

Actual Output:

Length of input list: 1000
Length of input list: 1000
Length of input list: 48
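
The 1000 / 1000 / 48 split is what one would expect if only 1000 requests can be in flight inside the replica at a time, so each batch fills to 1000 and is flushed by the timeout. The following is a minimal arithmetic sketch of that reading, not Ray Serve code: `batch_sizes` and its parameters are hypothetical names, and it assumes each batch completes before further requests are admitted.

```python
def batch_sizes(num_requests: int, concurrency_cap: int, max_batch_size: int) -> list[int]:
    """Sizes of successive batches under a simplified model.

    Assumption: at most `concurrency_cap` requests are in flight at once, and a
    batch is flushed when it reaches `max_batch_size` or when no more requests
    can be admitted before the wait timeout expires.
    """
    per_batch = min(concurrency_cap, max_batch_size)
    sizes = []
    remaining = num_requests
    while remaining > 0:
        take = min(per_batch, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


# A cap of 1000 reproduces the reported output.
print(batch_sizes(2048, 1000, 2048))  # [1000, 1000, 48]

# If the cap followed max_ongoing_requests=4096, one full batch would form.
print(batch_sizes(2048, 4096, 2048))  # [2048]
```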

Issue Severity

Low: It annoys or frustrates me.

@aRyBernAlTEglOTRO added the bug and triage labels Sep 16, 2024
@aRyBernAlTEglOTRO
Author

I think the issue is caused by the actor's max_concurrency limit, which defaults to 1000. A quick fix is to add "max_concurrency" to allowed_ray_actor_options in the following script:

allowed_ray_actor_options = {

and modify the RayActorOptionsSchema in the following script to add support for max_concurrency:

class RayActorOptionsSchema(BaseModel):

However, I think a better approach is to align max_ongoing_requests in DeploymentConfig with max_concurrency on the Ray actor, since the two seem to share the same intent, although that will require more code changes.
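
To illustrate the quick fix described above, here is a minimal sketch of an allow-list check, assuming actor options are validated against a fixed set of keys. `EXAMPLE_ALLOWED_OPTIONS` and `validate_actor_options` are hypothetical stand-ins, and the key names are illustrative only; they are not the contents of Ray Serve's allowed_ray_actor_options.

```python
# Hypothetical stand-in for an allow-list of actor options. The key names here
# are illustrative only and are not Ray Serve's actual list.
EXAMPLE_ALLOWED_OPTIONS = frozenset({"num_cpus", "num_gpus"})


def validate_actor_options(options: dict, allowed: frozenset) -> dict:
    """Return a copy of `options`, rejecting any key outside the allow-list."""
    unknown = sorted(set(options) - allowed)
    if unknown:
        raise ValueError(f"Unsupported actor options: {unknown}")
    return dict(options)


requested = {"num_cpus": 1, "max_concurrency": 4096}

# While the key is missing from the allow-list, the option is rejected
# before it can reach the actor.
try:
    validate_actor_options(requested, EXAMPLE_ALLOWED_OPTIONS)
except ValueError as err:
    print(err)

# The quick fix: extend the allow-list so max_concurrency passes through.
extended = EXAMPLE_ALLOWED_OPTIONS | {"max_concurrency"}
print(validate_actor_options(requested, extended))
```

With this approach the user still has to set both values by hand, which is the drawback the alignment proposal avoids.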

@anyscalesam added the serve label Sep 16, 2024
@akshay-anyscale added the P1 label and removed the triage label Oct 15, 2024
@GeneDer removed the serve label Oct 15, 2024
zcin pushed a commit that referenced this issue Oct 30, 2024
…oing_requests` (#47681) (#48274)

## Why are these changes needed?

This PR modifies the actor_options used when deploying replicas. If the
concurrency is not explicitly set, the deployment uses the
`max_ongoing_requests` attribute of its deployment config as the replica's
`max_concurrency`. This prevents the replica's `max_concurrency` from
capping `max_ongoing_requests`.

## Related issue number

Closes #47681



Signed-off-by: akyang-anyscale <[email protected]>
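
The behavior described in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from #48274: `build_actor_options` is a hypothetical helper, and the actual change may compute the value differently.

```python
from typing import Any


def build_actor_options(base_options: dict[str, Any], max_ongoing_requests: int) -> dict[str, Any]:
    """Hypothetical helper: derive replica actor options from a deployment config.

    If the caller did not set max_concurrency explicitly, fall back to
    max_ongoing_requests so the actor-level limit cannot cap it.
    """
    options = dict(base_options)  # copy so the caller's dict is not mutated
    options.setdefault("max_concurrency", max_ongoing_requests)
    return options


# No explicit concurrency: the replica inherits max_ongoing_requests.
print(build_actor_options({}, 4096))  # {'max_concurrency': 4096}

# An explicit value is left untouched.
print(build_actor_options({"max_concurrency": 10}, 4096))  # {'max_concurrency': 10}
```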
@jcotant1 added the serve label Oct 31, 2024
Jay-ju pushed a commit to Jay-ju/ray that referenced this issue Nov 5, 2024
…oing_requests` (ray-project#47681) (ray-project#48274)
