
New vsphere provider supporting Supervisor (k8s) cluster. #49881

Open
wants to merge 14 commits into base: master
Conversation


@roshankathawate commented Jan 16, 2025

Why are these changes needed?

Before these changes, the vSphere provider used the vSphere SDK to create a Ray cluster on vSphere. However, with the availability of a K8s control plane within the vSphere hypervisor (the Supervisor), it is possible to deploy services through a k8s operator. These changes make Ray available as a Supervisor service on vSphere: the Ray cluster is exposed as a k8s custom resource (CR), and instead of using the old vSphere SDK to create and manage the cluster, the provider now creates, updates, and deletes Ray cluster CRs through the exposed k8s API.
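For context, here is a minimal, hypothetical sketch of what CR-based cluster management can look like through the official Kubernetes Python client. The CRD group/version/plural, the spec fields, and the helper names are illustrative assumptions, not the actual schema or API of cluster_operator_client.py:

```python
# Hypothetical sketch only: the CRD group/version/plural and the CR
# schema below are assumptions for illustration, not taken from this PR.
from kubernetes import client, config

GROUP, VERSION, PLURAL = "vmray.example.com", "v1alpha1", "rayclusters"

config.load_kube_config()  # kubeconfig pointed at the Supervisor API server
api = client.CustomObjectsApi()

def create_ray_cluster_cr(namespace: str, name: str, vm_image: str):
    """Create a Ray cluster CR; the operator reconciles it into node VMs."""
    body = {
        "apiVersion": f"{GROUP}/{VERSION}",
        "kind": "RayCluster",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"headNode": {"vmImage": vm_image}},
    }
    return api.create_namespaced_custom_object(GROUP, VERSION, namespace, PLURAL, body)

def delete_ray_cluster_cr(namespace: str, name: str):
    """Delete the Ray cluster CR; the operator tears the cluster down."""
    return api.delete_namespaced_custom_object(GROUP, VERSION, namespace, PLURAL, name)
```

The individual changes: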

  1. Replaced the old provider (vSphere provider) with the new vmray provider (updated node_provider.py).
  2. Added cluster_operator_client.py, a wrapper around the K8s APIs used to manage the Ray cluster. (The autoscaler, through the node provider, calls functions in this file to manage a Ray cluster.)
  3. Ray nodes are created through the VM CR rather than through pyvmomi and the vSphere SDK (deleted pyvmomi_sdk_provider.py, vsphere_sdk_provider.py, and their associated test files).
  4. Ray nodes are created through the VM CR instead of from a frozen VM template (changed config.py to support the new configuration options, deleted the now-redundant gpu_utils.py and utils.py).
  5. Updated default.yaml and other YAML files to support (see the hypothetical sketch after this list):
    • the Supervisor (K8s) namespace where the Ray cluster should be deployed
    • the Supervisor control plane VM IP (the k8s API server)
    • the VM image to be used to create Ray nodes
    • the VM storage class to be used for Ray node creation
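
For illustration, a hypothetical provider section showing where such settings could live in default.yaml; every key name below is an assumption made for readability, not the actual configuration introduced in this PR:

```yaml
# Hypothetical example: key names are illustrative, not the PR's actual schema.
provider:
    type: vmray
    namespace: ray                # Supervisor (K8s) namespace for the Ray cluster
    api_server: 10.0.0.2          # Supervisor control plane VM IP (k8s API server)
    vm_image: ubuntu-2004-ray     # VM image used to create Ray nodes
    storage_class: vsan-default   # Storage class used for Ray node creation
```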

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@roshankathawate marked this pull request as ready for review January 18, 2025 04:08
@roshankathawate requested review from hongchaodeng and a team as code owners January 18, 2025 04:08

@kevin85421 (Member) left a comment


Hi @roshankathawate,

  • Would you mind adding a PR description? I have no context about vSphere, so a PR description would help me better understand this PR.

  • Does it make sense to split this into smaller PRs? I haven't reviewed it yet, but I'm wondering if we can make this large PR easier to review.

  • Do you have any colleagues who could review this PR before I take a look?

Thanks

@kevin85421 self-assigned this Jan 31, 2025
@roshankathawate (Author) commented

Hi @kevin85421,

Thanks for reviewing the PR. I have added details to the PR description so you can understand the changes; sorry for not including them earlier. I could split this into smaller PRs, but as you can see, I have only added one new file, and most of the other files are either redundant (and deleted) or config files. If you still find it difficult to review, let me know and I'll break it into smaller PRs. Once again, thanks for reviewing it.
