Create GitHub Repository for Kubeflow Trainer #2402
I would also support Option 1, since it can preserve the history of the project.
Good point! I added this to the Option 1 steps.
I like option 1.
For Option 1, can we migrate the releases to the new repository?
I'd also favor option 1 as it brings the best outcomes. There might be an additional impact / con for projects that maintain a downstream fork of the repository. Just for completeness of the solution space, there could be an option 4 where the repository would be renamed.
For option 1, it seems it might be possible to copy/transfer the v1 releases over to the new repository.
Great find @astefanutti! We can use this tool to migrate GitHub releases as well.
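The thread doesn't name the migration tool, but the idea can be sketched against GitHub's public REST API. This is a hypothetical illustration, not the tool discussed above; the repo names and token placeholder are assumptions:

```python
"""Hypothetical sketch: copy GitHub releases from one repo to another
via the REST API. Not the tool referenced in this thread."""
import json
import urllib.request

def release_payload(release: dict) -> dict:
    """Map a release object fetched from the source repo to the body of a
    'create release' request for the destination repo."""
    return {
        "tag_name": release["tag_name"],
        "name": release.get("name") or release["tag_name"],
        "body": release.get("body", ""),
        "prerelease": release.get("prerelease", False),
        "draft": release.get("draft", False),
    }

def copy_releases(src: str, dst: str, token: str) -> None:
    # List releases in the source repo, then recreate each in the destination.
    req = urllib.request.Request(
        f"https://api.github.com/repos/{src}/releases",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        releases = json.load(resp)
    for release in releases:
        create = urllib.request.Request(
            f"https://api.github.com/repos/{dst}/releases",
            data=json.dumps(release_payload(release)).encode(),
            headers={"Authorization": f"Bearer {token}"},
            method="POST",
        )
        urllib.request.urlopen(create)

if __name__ == "__main__":
    # Placeholder token; release assets would need a separate download/upload pass.
    copy_releases("kubeflow/training-operator", "kubeflow/trainer", token="<GITHUB_TOKEN>")
```

Note that this only copies release metadata; attached assets and the underlying git tags would have to be migrated separately.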
Option 1.
What's good about renaming a repo is that you can still reference it with the old name (in pulls and links). But I want to say tags must be immutable, and similarly releases; e.g. we can never delete them. To keep things clean, we can prefix the new version tags/branches.
Why don't we start from a new v2 API version?
We discussed this at the call. We want to keep the v1 version for the CRDs.
@thesuperzapper Can you please share what we lose if we delete tags from the repository?
This is really a case of "you can't have your cake and eat it too": you are either making a new project (which needs a new repo), or you are making a new version of an existing project (which can use the same repo, possibly renamed). Put another way, how are users meant to see this change: is "Kubeflow Trainer" the V2 version of "Kubeflow Training Operator", or is it a fully new project? I think the best outcome is to have a clear path of continuity from "Kubeflow Training Operator".
PS: About deleting tags, it's simply never acceptable to change a tag once it's created. It's so foundational to the idea of a tag that people don't even discuss it. Violating this principle is similar to burning history books.
It is a fully new project, but it has similar goals to the Kubeflow Training Operator.
The challenging part is that we want to keep the CRD version as `v1alpha1`.
Based on @thesuperzapper's feedback, I propose Option 4.
@andreyvelich just to clarify, your current proposal is:
I want to clarify a few things:
No, the v1.10.0 version will not be compatible with the Training Operator. We only keep this version to make sure that the CRD APIs and the major version of Kubeflow Trainer are consistent.
It can give the false impression that the new control plane components of Kubeflow Trainer (Controller Manager, dataset and model initializers, and LLM trainer) are in their second version, which is incorrect.
Please check this recording for the context: https://youtu.be/zOsRKCEcMeo?t=1275

Over the past 3 years, we've been discussing a lot with @kubeflow/wg-training-leads, @franciscojavierarceo, @astefanutti, and other contributors that we need to provide a simple Python interface for ML engineers and Data Scientists to interact with Kubeflow APIs. Beyond that, this SDK is designed to integrate seamlessly with other tools like Model Registry, Feast, and Spark, delivering a unified and intuitive user experience. With this approach, users will be able to effortlessly develop AI models using Kubeflow, by simply doing something like this in their Kubeflow Notebooks/Workspace:

```shell
$ pip install kubeflow
```

```python
from kubeflow.spark import SparkClient
from kubeflow.trainer import TrainerClient
from kubeflow.optimizer import OptimizerClient

SparkClient().process_data()
TrainerClient().train()
OptimizerClient().tune()
```

KFP integrates seamlessly with the Kubeflow SDK to orchestrate end-to-end ML pipelines, if users want to perform E2E MLOps/LLMOps.

cc @kubeflow/wg-data-leads @ChenYi015 @bigsur0 @shravan-achar @chasecadet
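One way (an assumption on my part, not a decision from this thread) to let `kubeflow.trainer`, `kubeflow.spark`, etc. ship from separate repositories while sharing the single `kubeflow` import namespace is a PEP 420 namespace package. A minimal, self-contained demonstration with stand-in module contents:

```python
# Sketch: two separately-installed distributions sharing the `kubeflow`
# namespace via PEP 420 (no __init__.py in the top-level `kubeflow` dir).
# The distribution and attribute names below are illustrative only.
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
for dist, sub in [("dist_trainer", "trainer"), ("dist_spark", "spark")]:
    pkg = os.path.join(root, dist, "kubeflow", sub)
    os.makedirs(pkg)
    # Each subpackage has its own __init__.py, but `kubeflow` itself has
    # none, which is what makes it a namespace package.
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write(f"NAME = '{sub}'\n")
    sys.path.insert(0, os.path.join(root, dist))

importlib.invalidate_caches()
# Both subpackages resolve even though they live under different roots.
from kubeflow import trainer, spark

print(trainer.NAME, spark.NAME)  # -> trainer spark
```

This pattern would let each client be versioned and released independently while users keep a single `import kubeflow` entry point.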
I like option 1.
Personally, I prefer option 1.
CRD versions are independent from their control plane components. It happens quite often that a controller / operator introduces a new CRD starting at `v1alpha1`. I second @thesuperzapper's opinion.
On the other hand, having the new SDK and operator start at v2 may send the signal that these are built on v1, improve on the lessons learnt, and do not start over from scratch.
Yes, these are good points @astefanutti. We propose the following (I updated Option 4 in the issue).
@andreyvelich if you are using semantic versioning for the SDK, technically any `0.x` release may include breaking changes.
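The SemVer pre-1.0 rule can be encoded directly. A small sketch (the function name is illustrative, not part of any Kubeflow tooling):

```python
def may_break(old: str, new: str) -> bool:
    """Return True if, under semantic versioning, upgrading from `old` to
    `new` is allowed to contain breaking changes. Pre-1.0 ("0.x"), every
    minor bump may break; from 1.0 on, only a major version bump may."""
    o = tuple(int(p) for p in old.lstrip("v").split("."))
    n = tuple(int(p) for p in new.lstrip("v").split("."))
    if o[0] == 0:              # 0.x: the public API is not yet stable
        return n[:2] != o[:2]  # any minor bump may break
    return n[0] != o[0]        # 1.x+: only major bumps may break

print(may_break("0.1.0", "0.2.0"))    # True: 0.x minor bumps may break
print(may_break("1.10.0", "1.11.0"))  # False: post-1.0 minor bumps must not
print(may_break("1.10.0", "2.0.0"))   # True: major version bump
```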
Good point @thesuperzapper, updated the issue.
I think Option 4 increases the scope of this issue quite a bit by also proposing the new SDK repository. FWIW I am in favor of Option 4, but I just want to be explicit about the increase in scope.
@franciscojavierarceo Yes, but we will work towards establishing this repo and publishing the first release to PyPI. The concern that @johnugeorge has is how we can make it easier to control the dependency between the control plane and the SDK clients.
I love that you are marching towards the creation of an official and maintained Kubeflow SDK. cc @ederign
Yeah, that is a good point, similar to Feast, as we discussed with @franciscojavierarceo.
💯 Once we discuss kubeflow/community#804, I would be happy to incorporate it into the SDK as well. I think a unified SDK to make Kubeflow easier to work with across the products would be quite wonderful, indeed.
I worry that if we version all of our SDKs at the same time, it could be very difficult to create a sensible versioning strategy. For example, what if we want to make a breaking change in the model registry SDK but not in any other part? Would we need to make a new major version for the overall SDK?
Yeah, and I think that's a reasonable trade-off for a quality user experience.
We should discuss this in the proposal, and we should identify what breaking changes we are talking about. For example, if a breaking change is introduced into a Kubernetes CRD, we should create a new version of the API. The control plane and clients should be independent from each other. In the end, it is the cluster admins' responsibility to make sure that the correct version of the control plane is installed in their k8s clusters, and the correct version of the SDK is used. Users just do `pip install kubeflow`.
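The independence argument above can be made concrete: instead of pinning an SDK release to a control-plane release, a client can discover which CRD API versions the cluster serves and pick the newest one it understands. A hypothetical sketch (the function, variable names, and version strings are assumptions, not the real SDK):

```python
# Hypothetical sketch of decoupling: the client supports a set of CRD API
# versions and negotiates against whatever the installed control plane serves.
SUPPORTED = ["v2alpha1", "v1alpha1"]  # newest first; illustrative versions

def pick_api_version(served_by_cluster: list[str]) -> str:
    """Choose the newest TrainJob API version both sides understand."""
    for version in SUPPORTED:
        if version in served_by_cluster:
            return version
    raise RuntimeError(
        f"no common TrainJob API version: client={SUPPORTED}, "
        f"cluster={served_by_cluster}"
    )

# An older control plane only serves v1alpha1; a newer SDK still works.
print(pick_api_version(["v1alpha1"]))              # v1alpha1
# A newer control plane serves both; the SDK uses the newest it knows.
print(pick_api_version(["v1alpha1", "v2alpha1"]))  # v2alpha1
```

With this kind of negotiation, upgrading the control plane and upgrading the SDK become independent operations, which is exactly the separation Option 4 argues for.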
At the latest AutoML and Training WG call, we discussed how we can create a new GitHub repository and release the Kubeflow Trainer. We want to keep the `v1alpha1` API version for TrainJob and TrainingRuntime, and introduce a new `kubeflow` Python SDK starting from the `0.1.0` version.

(updated 2025-01-27) After discussions, we decided to move forward with Option 4.

We explored four options for the Kubeflow Trainer project:

Option 1. Migrate `kubeflow/training-operator` to a new repository.

Steps:

- Fork `kubeflow/training-operator` to `kubeflow/training-operator-lts` to keep the `kubeflow/training-operator` Go modules.
- Rename `kubeflow/training-operator` to `kubeflow/trainer`.
- Continue development on the `master` branch of `kubeflow/trainer`.
- Release `kubeflow/trainer` on the `release-0.1` branch with the `v0.1.0` tag (we should delete the existing `v0.1-branch` and `v0.1.0` tag).
- Keep the project history in the `kubeflow/trainer` repository.

Pros:

- Kubeflow Trainer starts from the `v0.1.0` version.
- We can still release Training Operator V1 from the `kubeflow/training-operator-lts` repository. For example, a `v1.10.0` release.

Cons:

- We break the `kubeflow/training-operator` Go modules.

Option 2. Create a new repository `kubeflow/trainer`.

Steps:

- Release `kubeflow/trainer` on the `release-0.1` branch with the `v0.1.0` tag.

Pros:

- We don't break the `kubeflow/training-operator` Go modules.

Cons:

- We lose the history of the `kubeflow/training-operator` project.

Option 3. Start `kubeflow/trainer` with the v1.10.0 release.

Steps:

- Rename `kubeflow/training-operator` to `kubeflow/trainer`.
- Continue development on the `master` branch of `kubeflow/trainer`.
- Release `kubeflow/trainer` using the `release-1.10` branch with the `v1.10.0` tag.

Pros:

- We preserve the project history and the `kubeflow/training-operator` Go modules.

Cons:

- The `kubeflow` SDK starts with version `v1.10`.
- Users may expect the `v1.10.0` release to be compatible with Training Operator v1, which it is not.

Option 4 (updated 2025-01-27). Separate the Kubeflow Trainer control plane from the client SDK.

Steps:

- Rename `kubeflow/training-operator` to `kubeflow/trainer`.
- Continue development on the `master` branch of `kubeflow/trainer`.
- Release the control plane as `v2.0.0` on the `release-2.0` branch; the SDK uses `v0.1.0` as an initial release.
- Publish the `kubeflow` SDK from the `kubeflow/trainer` GitHub repository temporarily.
- Write a design document for `kubeflow/sdk`. This repository will host the Kubeflow Python client and potentially expand to include clients for other languages in the future (e.g., Rust, Swift, Java). The design document should outline how Kubernetes versions and other project dependencies will be managed effectively.

Pros:

- We preserve the history of `kubeflow/training-operator`.
- We can still release Training Operator V1 (from `release-1.10`).

Cons:

- We have to keep `kubeflow/trainer` and `kubeflow/sdk` in sync.

However, I think we can deal with that if we keep examples in the Kubeflow Trainer GitHub repository and use these examples to run E2E tests for both `kubeflow/trainer` and `kubeflow/sdk`. Also, we can explore other options to improve our test coverage.

Personally, I prefer Option 4 or Option 1.
Please let us know if you have other ideas: @kubeflow/wg-training-leads @kubeflow/release-team @astefanutti @kannon92 @ahg-g @kubeflow/kubeflow-steering-committee @thesuperzapper @kubeflow/wg-manifests-leads @franciscojavierarceo @Electronic-Waste @seanlaii @deepanker13 @saileshd1402 @vsoch @shravan-achar @akshaychitneni @helenxie-bit @kubeflow/release-managers @zijianjoy @james-jwu