[core][dashboard] Change file_tail_iterator to async. #47721

Merged 6 commits on Sep 18, 2024

Contributor

@rynewang rynewang commented Sep 17, 2024

On `ray job submit`, the CLI tails logs from the job agent. The agent reads log tails from an iterator and yields them to a WebSocket. However, the file-reading iterator is synchronous, so it blocks the agent's event loop. This stalls the agent and breaks downstream consumers such as KubeRay. This PR changes the log-reading function `file_tail_iterator` to an AsyncIterator and replaces a 1s `time.sleep` with `asyncio.sleep` so the event loop is no longer blocked.
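To illustrate the failure mode, here is a minimal sketch, not the actual agent code. The names `blocking_tail`, `async_tail`, and `count_heartbeats` are hypothetical, and the heartbeat task merely stands in for any other work the agent's event loop must serve while logs are being tailed.

```python
import asyncio
import time
from typing import AsyncIterator, Iterator, Optional


def blocking_tail(polls: int, interval_s: float) -> Iterator[Optional[str]]:
    """Sync iterator: time.sleep holds the thread, so the event loop stalls."""
    for _ in range(polls):
        time.sleep(interval_s)  # no other task on the loop can run here
        yield None  # None stands for "no new lines yet"


async def async_tail(polls: int, interval_s: float) -> AsyncIterator[Optional[str]]:
    """Async iterator: asyncio.sleep hands control back to the event loop."""
    for _ in range(polls):
        await asyncio.sleep(interval_s)  # other tasks run during the wait
        yield None


async def count_heartbeats(
    use_async: bool, polls: int = 3, interval_s: float = 0.1
) -> int:
    """Count how often a concurrent task gets to run while the tail loop polls."""
    beats = 0

    async def heartbeat() -> None:
        nonlocal beats
        while True:
            beats += 1
            await asyncio.sleep(0.01)

    task = asyncio.create_task(heartbeat())
    if use_async:
        async for _ in async_tail(polls, interval_s):
            pass
    else:
        # Iterating a sync generator inside a coroutine never yields to the loop.
        for _ in blocking_tail(polls, interval_s):
            pass
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return beats
```

Running `asyncio.run(count_heartbeats(use_async=False))` should show the heartbeat never getting a turn, while `use_async=True` lets it run many times during the same polling period.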

The issue was introduced in #44658.

Fixes #47637.
Fixes ray-project/kuberay#2355
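
For readers unfamiliar with the pattern, the sketch below shows roughly what an async tailing iterator can look like. It is an assumption-laden illustration, not the code in this PR: `tail_file` and `poll_interval_s` are hypothetical names, batching limits are omitted, and the file read itself is still a short synchronous call.

```python
import asyncio
import os
from typing import AsyncIterator, List, Optional


async def tail_file(
    path: str, poll_interval_s: float = 1.0
) -> AsyncIterator[Optional[List[str]]]:
    """Yield batches of newly appended lines; yield None when nothing is new."""
    # Wait for the file to appear without blocking the event loop.
    while not os.path.exists(path):
        yield None
        await asyncio.sleep(poll_interval_s)

    with open(path, "r") as f:
        while True:
            # Reads whatever was appended since the last call. This is a
            # brief blocking read; only the waiting is made asynchronous.
            lines = f.readlines()
            if lines:
                yield lines
            else:
                yield None
                await asyncio.sleep(poll_interval_s)
```

A consumer would iterate with `async for lines in tail_file(path):` and forward each non-None batch to its sink, so every idle poll gives other coroutines on the loop a chance to run.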

@rynewang rynewang changed the title [core][dashboard] Change file_tail_iterator to async [core][dashboard] Change file_tail_iterator to async. Sep 17, 2024
Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: Ruiyang Wang <[email protected]>
Member

@kevin85421 kevin85421 left a comment

The change makes sense to me, but I’m curious why Ray 2.9 works. I thought we’ve been using Iterator instead of AsyncIterator for years.

Signed-off-by: Ruiyang Wang <[email protected]>
@rynewang rynewang added the go label (add ONLY when ready to merge, run all tests) Sep 17, 2024
@rynewang
Contributor Author

I didn't take the time to git blame the problematic PR...

@rynewang rynewang enabled auto-merge (squash) September 17, 2024 23:29
Signed-off-by: Ruiyang Wang <[email protected]>
@rynewang
Contributor Author

@kevin85421 So I think it's #44658, from 5 months ago.

@rynewang rynewang merged commit bc2b26e into ray-project:master Sep 18, 2024
4 of 5 checks passed
@rynewang rynewang deleted the async-file-iterator branch September 18, 2024 18:55
shaowei-su commented Sep 19, 2024

Hi @rynewang @kevin85421, will this fix be included in the Ray nightly release? Thanks!
https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp310-cp310-manylinux2014_x86_64.whl

ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Oct 15, 2024
Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>