[CORE] [REP] Ray OBS support #50

Open
wants to merge 1 commit into base: main

@imhy imhy commented Apr 23, 2024

Add ability to read working_dir from OBS

@imhy imhy force-pushed the 2024-04-24-ray-obs-support branch 2 times, most recently from fcdb934 to 26f6962 Compare April 23, 2024 10:09
Contributor

ericl commented Apr 23, 2024

Proposal makes sense to me... it amounts to removing the URI whitelist of allowed protocols to pass through to smart_open, which I think is not necessary to restrict anyways (cc @edoakes ).
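
To illustrate the allow-list point, here is a minimal sketch of the difference between enforcing a fixed set of protocols and passing any scheme through. `ALLOWED_SCHEMES` and `scheme_of` are hypothetical names used for illustration only; they are not Ray's actual code.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; the real set of protocols lives in Ray's packaging code.
ALLOWED_SCHEMES = {"s3", "gs", "https"}


def scheme_of(uri, allowed=None):
    """Return the URI scheme, optionally enforcing an allow-list.

    With allowed=None, any scheme is passed through to the caller.
    """
    scheme = urlparse(uri).scheme
    if not scheme:
        raise ValueError(f"URI has no scheme: {uri!r}")
    if allowed is not None and scheme not in allowed:
        raise ValueError(f"Unsupported protocol: {scheme!r}")
    return scheme
```

With the allow-list enforced, an `obs://` URI is rejected before it reaches the file-opening layer; without it, the scheme is simply handed on.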


At the same time, they can specify the location of the project's working directory in a remote file system, for example s3.

This document suggests expanding the list of supported file systems and adding support for OBS (Object Storage Service).

Contributor

I assume this is https://www.huaweicloud.com/intl/en-us/product/obs.html? It would be good to add a reference to clarify.

Author

Done, added link to docs.

@imhy imhy force-pushed the 2024-04-24-ray-obs-support branch from 26f6962 to 2b78dcc Compare April 24, 2024 04:00
Author

imhy commented Apr 24, 2024

> Proposal makes sense to me... it amounts to removing the URI whitelist of allowed protocols to pass through to smart_open, which I think is not necessary to restrict anyways (cc @edoakes ).

I'm ready to implement this functionality.

Contributor

edoakes left a comment

LGTM. I don't think we need a full vote on this REP because it is quite small in scope.

@imhy please go ahead and open a PR to implement this support and tag me as a reviewer. Let me know if you have any questions on how to get started.

Contributor

jjyao left a comment

LG

After the implementation of this improvement, the user will be able to run submit tasks specifying the location of the working directory in OBS.


The UserCase of submiting ray job via obs is shown in Figure 1:

Contributor

Suggested change:
Original: The UserCase of submiting ray job via obs is shown in Figure 1:
Proposed: The use case of submiting ray job via obs is shown in Figure 1:

Author

fixed


- Third, after submitting the ray job, the ray cluster automatically downloads, uncompresses, and executes the codes from the OBS

![Figure 1](2024-04-24-ray-obs-support/fig1.png)

Contributor

You need to use `<img style="background-color:white" src="2024-04-24-ray-obs-support/fig1.png">` to show the image.

Author

fixed


The UserCase of submiting ray job via obs is shown in Figure 1:

- First, the user startups the remote ray cluster, and then upload the codes to the obs;

Contributor

Suggested change:
Original: - First, the user startups the remote ray cluster, and then upload the codes to the obs;
Proposed: - First, the user starts the remote ray cluster, and then uploads the codes to the obs;

Author

fixed


- First, the user startups the remote ray cluster, and then upload the codes to the obs;

- Second, the user submits ray job to OBS using the submit_job APIs, as well as setting the obs path in the submit_job API

Contributor

Suggested change:
Original: - Second, the user submits ray job to OBS using the submit_job APIs, as well as setting the obs path in the submit_job API
Proposed: - Second, the user submits ray job to the ray cluster using the submit_job APIs, as well as setting the obs path in the submit_job API

Author

fixed


1. Compress the code into a zip or jar package and put it in OBS

2. You can use environment variables or configuration files to access the AK, SK, and Endpoint of OBS

Contributor

What are AK and SK?

Author

Access Key and Secret Key, the credentials used to access OBS, similar to the access keys in S3.
See: AK SK docs
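
As a rough illustration of reading the AK, SK, and Endpoint from environment variables, the sketch below loads them into a small record. The variable names `OBS_ACCESS_KEY`, `OBS_SECRET_KEY`, and `OBS_ENDPOINT`, and the helper `load_obs_credentials`, are placeholders chosen for this example; the REP does not define them.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ObsCredentials:
    access_key: str  # AK
    secret_key: str  # SK
    endpoint: str


def load_obs_credentials(env=None):
    """Read AK, SK, and endpoint from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Placeholder variable names; not defined by the REP or by any OBS SDK.
    names = ("OBS_ACCESS_KEY", "OBS_SECRET_KEY", "OBS_ENDPOINT")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"Missing OBS settings: {', '.join(missing)}")
    return ObsCredentials(*(env[name] for name in names))
```

Failing early with the list of missing settings keeps a misconfigured cluster from reporting an opaque download error later.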

```
)
```

4. The ray cluster automatically downloads the example_file.zip from the specified OBS bucket, decompresses it, and then runs the entry file script.py in the working path

Contributor

Suggested change:
Original: 4. The ray cluster automatically downloads the example_file.zip from the specified OBS bucket, decompresses it, and then runs the entry file script.py in the working path
Proposed: 4. The ray cluster automatically downloads the example_file.zip from the specified OBS bucket, decompresses it, and then runs the entry file script.py in the working dir path

Author

fixed
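
The user-side part of the numbered steps quoted above (package the code, then point the job at the uploaded archive) could look roughly like this. `zip_working_dir` and `build_runtime_env` are hypothetical helpers, the upload to OBS itself is not shown, and the single top-level directory inside the archive is assumed here to match what Ray expects for remote `working_dir` archives.

```python
import zipfile
from pathlib import Path


def zip_working_dir(src_dir, zip_path):
    """Zip every file under src_dir beneath one top-level directory.

    zip_path should live outside src_dir so the archive does not include itself.
    Returns the archive member names in sorted order.
    """
    src = Path(src_dir)
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Keep the source directory name as the single top-level entry.
                arcname = (Path(src.name) / path.relative_to(src)).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names


def build_runtime_env(bucket, key):
    """Build a runtime_env dict whose working_dir points at an obs:// URI."""
    return {"working_dir": f"obs://{bucket}/{key}"}
```

The dict returned by `build_runtime_env("example_bucket", "example_file.zip")` is what would be passed as `runtime_env` in the `submit_job` call once the archive has been uploaded.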

Comment on lines 72 to 78
We can extend the ray project to support obs protocol, enabling the ray cluster to parse the obs URI and download the codes from obs:

1. parse obs URI, such as "obs://example_bucket/example_file.zip";

2. config the obs AK, SK, and Endpoint via environment variables and config files;

3. the ray cluster automatically downloads and execute the codes from obs.

Contributor

Is this a duplicate of the above?

Author

fixed
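
For reference, splitting an OBS URI such as "obs://example_bucket/example_file.zip" into a bucket and an object key can be sketched with the standard library alone. `ObsLocation` and `parse_obs_uri` are illustrative names; this is not Ray's `parse_uri`, which has its own return type.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class ObsLocation(NamedTuple):
    bucket: str
    key: str


def parse_obs_uri(uri):
    """Split an obs://bucket/key URI into its bucket and object key."""
    parsed = urlparse(uri)
    if parsed.scheme != "obs":
        raise ValueError(f"Not an OBS URI: {uri!r}")
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    if not bucket or not key:
        raise ValueError(f"OBS URI needs both a bucket and a key: {uri!r}")
    return ObsLocation(bucket, key)
```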

## Implementation Analysis

To extend the ray project for obs, we first need to figure out the workflow of parsing and accessing remote URIs, which is shown in Figure 2.
![Figure 2](2024-04-24-ray-obs-support/fig2.png)

Contributor

same here

Author

fixed


**Step 1**: Extend the **parse_uri** function to parse the OBS URIs, which is implemented in the file [ray/_private/runtime_env/packaging](https://github.com/ray-project/ray/blob/master/python/ray/_private/runtime_env/packaging.py).

**Step 2**: Extend the third-party library [smart_open](https://github.com/piskvorky/smart_open) to read OBS objects, which is suggested to implement 3 interfaces, i.e., parse uri, open_uri, and open, as shown in Extending smart_open.

Contributor

So you will create a PR in smart_open to support OBS first?

Author

Yes, I will first implement the required logic in smart_open.

Contributor

Are you planning to upstream this functionality to smart_open or extend it within the Ray codebase?

Author

First, I will add the functionality to smart_open; then I will add the logic to Ray's packaging.py.
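
A rough shape for the three interfaces mentioned in Step 2 is sketched below. It is only an outline under stated assumptions: an in-memory dict (`_FAKE_STORE`) stands in for the OBS client so the code runs without network access, the dictionary keys returned by `parse_uri` are illustrative, and the function that a smart_open transport module would call `open` is named `open_object` here to avoid shadowing Python's built-in in a standalone script.

```python
import io
from urllib.parse import urlsplit

SCHEME = "obs"

# Stand-in for a real OBS client so the sketch runs offline (assumption).
_FAKE_STORE = {("example_bucket", "example_file.zip"): b"zip-bytes"}


def parse_uri(uri_as_string):
    """Break an obs:// URI into the pieces needed to open the object."""
    parts = urlsplit(uri_as_string)
    if parts.scheme != SCHEME:
        raise ValueError(f"Expected an {SCHEME}:// URI, got {uri_as_string!r}")
    return {
        "scheme": SCHEME,
        "bucket_id": parts.netloc,
        "key_id": parts.path.lstrip("/"),
    }


def open_object(bucket_id, key_id, mode="rb", client=None):
    """Return a readable binary stream for one object (read-only sketch)."""
    if mode != "rb":
        raise NotImplementedError("this sketch only supports mode='rb'")
    store = _FAKE_STORE if client is None else client
    return io.BytesIO(store[(bucket_id, key_id)])


def open_uri(uri_as_string, mode, transport_params):
    """Parse the URI, then delegate to open_object with any supplied client."""
    parsed = parse_uri(uri_as_string)
    return open_object(
        parsed["bucket_id"],
        parsed["key_id"],
        mode,
        client=transport_params.get("client"),
    )
```

Keeping URI parsing separate from opening lets the Ray side reuse `parse_uri` for validation without touching the object store.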

@imhy imhy force-pushed the 2024-04-24-ray-obs-support branch 5 times, most recently from ef6aa52 to cbcc617 Compare April 27, 2024 07:53
Signed-off-by: Sergei <[email protected]>
@imhy imhy force-pushed the 2024-04-24-ray-obs-support branch from cbcc617 to 920603e Compare April 27, 2024 08:12
Author

imhy commented Apr 27, 2024

> LGTM. I don't think we need a full vote on this REP because it is quite small in scope.
>
> @imhy please go ahead and open a PR to implement this support and tag me as a reviewer. Let me know if you have any questions on how to get started.

OK, thanks.
