
Clone populator: Add clone source watches #3639

Open

alromeros wants to merge 1 commit into main from add-clone-populator-watches

Conversation

alromeros
Collaborator

What this PR does / why we need it:

We lack proper clone source watches in clone populator, which can cause unwanted behaviors when we depend on source updates to trigger a reconcile.

This PR adds two watches for each clone source type.
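As a rough illustration (not the PR's actual code), wiring such a source watch with controller-runtime could look like the sketch below; mapCloneSourceToTargets is a hypothetical mapper, and the Watches signature shown is the controller-runtime >= v0.15 form.

// Illustrative sketch only: watch clone source PVCs and map events back to the
// clone target(s) that reference them, so source updates trigger a reconcile.
if err := ctrl.NewControllerManagedBy(mgr).
	For(&corev1.PersistentVolumeClaim{}).
	Watches(&corev1.PersistentVolumeClaim{},
		handler.EnqueueRequestsFromMapFunc(
			func(ctx context.Context, obj client.Object) []reconcile.Request {
				// mapCloneSourceToTargets is a hypothetical helper that resolves
				// which clone targets reference this source PVC
				return mapCloneSourceToTargets(ctx, obj)
			})).
	Complete(reconciler); err != nil {
	return err
}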

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes: https://issues.redhat.com/browse/CNV-56518

Release note:

Add clone source watches in clone populator

@kubevirt-bot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@kubevirt-bot added the release-note (denotes a PR that will be considered when it comes time to generate release notes), do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress), and dco-signoff: yes (indicates the PR's author has DCO signed all their commits) labels on Feb 17, 2025
@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign awels for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@alromeros
Collaborator Author

/cc @akalenyu @arnongilboa
Sadly, the test also passes without the fix, so it's useless as it is. Any ideas on how to ensure this fixes the issue with GCP?

@alromeros
Collaborator Author

/test pull-containerized-data-importer-e2e-ceph

@mhenriks
Member

@alromeros I'm not saying we shouldn't do this but we could do a cheap fix like this when the snapshot source does not exist:

return &reconcile.Result{RequeueAfter: 2 * time.Second}, nil

If you go with the watches, make sure to change the line above

@akalenyu
Collaborator

You probably have a second watch in that dynamic snap one

if args.Strategy == cdiv1.CloneStrategySnapshot {
	if err := p.watchSnapshots(ctx, args.Log); err != nil {
		return nil, err
	}
}
Should be able to remove this

@akalenyu
Collaborator

/cc @akalenyu @arnongilboa Sadly, the test also passes without the fix, so it's useless as it is. Any ideas on how to ensure this fixes the issue with GCP?

Yeah, testing watches is not something we do; I think you would use https://book.kubebuilder.io/reference/envtest.html, but clearly it's not integrated at all in CDI today.
You could unit test the watch func or something like that, and make sure it doesn't return nil when the snapshot source isn't CDI-owned.
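For illustration, a minimal sketch of what such a unit test could look like in Ginkgo style; the mapper name mapSnapshotToCloneTargets and the empty-cluster assumption are invented here, not taken from the PR.

// Illustration only: mapSnapshotToCloneTargets is a hypothetical mapper function.
// With no VolumeCloneSource referencing the snapshot, nothing should be enqueued;
// a companion test would seed a VolumeCloneSource and target PVC and expect a
// request even though the source snapshot is not CDI-owned.
It("should not enqueue anything for a snapshot no clone target references", func() {
	snap := &snapshotv1.VolumeSnapshot{
		ObjectMeta: metav1.ObjectMeta{Name: "stray-snap", Namespace: "default"},
	}
	Expect(mapSnapshotToCloneTargets(context.Background(), snap)).To(BeEmpty())
})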

@alromeros
Collaborator Author

You probably have a second watch in that dynamic snap one

if args.Strategy == cdiv1.CloneStrategySnapshot {
	if err := p.watchSnapshots(ctx, args.Log); err != nil {
		return nil, err
	}
}

Should be able to remove this

My watch only watches clone source snapshots (when they are referenced in the VolumeCloneSource spec). That watch is for smart clone and seems to be watching snapshots created by the controller; I don't know if we should remove that one.

@mhenriks' proposal is also interesting; we can fix this with a single line by requeuing here:

return &reconcile.Result{}, nil

@akalenyu
Collaborator

@alromeros I'm not saying we shouldn't do this but we could do a cheap fix like this when the snapshot source does not exist:

return &reconcile.Result{RequeueAfter: 2 * time.Second}, nil

If you go with the watches, make sure to change the line above

Interesting, yeah, you could do that in

if !cc.IsSnapshotReady(snapshot) {
	return &reconcile.Result{}, nil
}
but it looks like the original intention was to use watches anyway with that TODO.
Should we be concerned about the hit we take with that new watch?

@alromeros force-pushed the add-clone-populator-watches branch from 249b004 to 9627d81 on February 19, 2025 10:17
@alromeros
Collaborator Author

Removed both the func test and the requeue when source PVC is not ready.

@akalenyu
Collaborator

That watch is for smart clone and seems to be watching snapshots created by the controller; I don't know if we should remove that one.

You're right, my bad.

I think this looks good. Maybe you have to handle VolumeSnapshots not existing in the cluster, but I am not sure. Feel free to undraft when you're ready and we'll see.

@alromeros marked this pull request as ready for review on February 19, 2025 10:37
@kubevirt-bot removed the do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress) label on Feb 19, 2025
@alromeros
Collaborator Author

/retest-required

@alromeros force-pushed the add-clone-populator-watches branch 3 times, most recently from 813832d to 2f08de5 on February 19, 2025 17:26
@arnongilboa
Collaborator

arnongilboa commented Feb 20, 2025

@alromeros since the only thing we are interested in is the VolumeSnapshot update to Ready a few secs after it's created, why not go with @mhenriks RequeueAfter?

@akalenyu
Collaborator

@alromeros since the only thing we are interested in is the VolumeSnapshot update to Ready a few secs after it's created, why not go with @mhenriks RequeueAfter?

You'd be spamming requeues for no reason, especially for provisioners where snap taking is long #2531

@arnongilboa
Collaborator

@alromeros since the only thing we are interested in is the VolumeSnapshot update to Ready a few secs after it's created, why not go with @mhenriks RequeueAfter?

You'd be spamming requeues for no reason, especially for provisioners where snap taking is long #2531

Isn't watching for any update also potentially expensive? Why not add a ReadyToUse == true predicate, so we won't reconcile when not needed?

@akalenyu
Collaborator

@alromeros since the only thing we are interested in is the VolumeSnapshot update to Ready a few secs after it's created, why not go with @mhenriks RequeueAfter?

You'd be spamming requeues for no reason, especially for provisioners where snap taking is long #2531

Isn't watching for any update also potentially expensive?

It's not free but the alternative is occupying an entire thread on potentially useless requeues

why not add a ReadyToUse == true predicate, so we won't reconcile when not needed?

I think it's a good idea, if that is the only thing we care about, we should be doing that

@alromeros
Collaborator Author

I prefer to keep requeue logic within a watch and handle all potential issues there.

@alromeros since the only thing we are interested in is the VolumeSnapshot update to Ready a few secs after it's created, why not go with @mhenriks RequeueAfter?

You'd be spamming requeues for no reason, especially for provisioners where snap taking is long #2531

Isn't watching for any update also potentially expensive?

It's not free but the alternative is occupying an entire thread on potentially useless requeues

why not add a ReadyToUse == true predicate, so we won't reconcile when not needed?

I think it's a good idea, if that is the only thing we care about, we should be doing that

By that logic, couldn't we just as well drop most of our watches and rely on requeues when necessary? I prefer to keep requeue logic within a watch and handle all potential issues there, but I'm not an expert on the potential performance implications. It's just that having watches for sources is what we do in the DV controller, so it makes sense to keep it here.

@arnongilboa
Collaborator

It's not free but the alternative is occupying an entire thread on potentially useless requeues

why not add a ReadyToUse == true predicate, so we won't reconcile when not needed?

I think it's a good idea, if that is the only thing we care about, we should be doing that

By that logic, couldn't we just as well drop most of our watches and rely on requeues when necessary? I prefer to keep requeue logic within a watch and handle all potential issues there, but I'm not an expert on the potential performance implications. It's just that having watches for sources is what we do in the DV controller, so it makes sense to keep it here.

I meant adding the mentioned predicate in the watch like we do with Pending DVs.
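For reference, the predicate being discussed could look roughly like the sketch below; this is illustrative only and assumes the typed predicate API of newer controller-runtime releases plus the snapshot.storage.k8s.io/v1 ReadyToUse field.

// Illustrative sketch: only pass the update through when readyToUse flips to true.
readyPredicate := predicate.TypedFuncs[*snapshotv1.VolumeSnapshot]{
	UpdateFunc: func(e event.TypedUpdateEvent[*snapshotv1.VolumeSnapshot]) bool {
		wasReady := e.ObjectOld.Status != nil && e.ObjectOld.Status.ReadyToUse != nil && *e.ObjectOld.Status.ReadyToUse
		isReady := e.ObjectNew.Status != nil && e.ObjectNew.Status.ReadyToUse != nil && *e.ObjectNew.Status.ReadyToUse
		return !wasReady && isReady
	},
}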

@alromeros force-pushed the add-clone-populator-watches branch 2 times, most recently from ab7504c to e80ef52 on February 20, 2025 18:34
@akalenyu
Collaborator

/test pull-cdi-goveralls
/test pull-cdi-unit-test

@alromeros force-pushed the add-clone-populator-watches branch from e80ef52 to 94a07b4 on March 3, 2025 11:55
@alromeros force-pushed the add-clone-populator-watches branch from 94a07b4 to ce1e83c on March 3, 2025 12:47
@kubevirt-bot added the size/M label and removed the size/L label on Mar 3, 2025
@alromeros
Collaborator Author

@mhenriks @akalenyu @arnongilboa adding back the requeue for source PVCs since we need to trigger a requeue if the PVC is in use by other pods. We also do this with the non-populator flow. Hope we can still keep the watches and decrease the number of requeues if the PVC is not being used.
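For illustration, the requeue being added back could look something like the sketch below; isSourcePVCInUse is a hypothetical helper name, and the 2-second interval simply mirrors the value mentioned earlier in this thread.

// Illustrative sketch, not the PR's code: while the clone source PVC is still
// mounted by other pods, keep requeuing instead of waiting for a watch event.
inUse, err := isSourcePVCInUse(ctx, sourcePVC)
if err != nil {
	return nil, err
}
if inUse {
	return &reconcile.Result{RequeueAfter: 2 * time.Second}, nil
}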

@alromeros
Collaborator Author

/test pull-cdi-unit-test

@alromeros
Collaborator Author

/retest-required

@alromeros
Collaborator Author

/test pull-containerized-data-importer-e2e-ceph-wffc
/test pull-containerized-data-importer-e2e-upg

@alromeros
Collaborator Author

/test pull-containerized-data-importer-fossa

return kind + "/" + namespace + "/" + name
}

if err := mgr.GetClient().List(context.TODO(), &snapshotv1.VolumeSnapshotList{}); err != nil {
Collaborator

So I don't know if this is a real-world issue, but if there is no snapshot support in the cluster, you'd back out and not set up a PVC watch.

Collaborator Author

I considered it, but I don't know how much I should worry about this case... I don't mind handling it if necessary, but the code might become more complex.

Collaborator

Not so complex:

if err := mgr.GetClient().List(context.TODO(), &snapshotv1.VolumeSnapshotList{}); err != nil {
	if meta.IsNoMatchError(err) {
		// Back out if there's no point to attempt watch
		return nil
	}
	if !cc.IsErrCacheNotStarted(err) {
		return err
	}
}

Collaborator Author

By complex I mean that it requires duplicating the watches and mappers instead of having a list of supported sources, but yeah I'll change it if we prefer to handle this error.

predicate.Funcs{
	CreateFunc: func(e event.CreateEvent) bool { return true },
	DeleteFunc: func(e event.DeleteEvent) bool { return false },
	UpdateFunc: func(e event.UpdateEvent) bool { return isSourceReady(obj) },
Collaborator

I think this will requeue for any readyToUse snapshot. Instead, what you're after is the single case where readyToUse switches from false to true, so something like:

UpdateFunc: func(e event.TypedUpdateEvent[*snapshotv1.VolumeSnapshot]) bool {
	return !reflect.DeepEqual(e.ObjectOld.Status, e.ObjectNew.Status) ||
		!reflect.DeepEqual(e.ObjectOld.Labels, e.ObjectNew.Labels)
},

Collaborator Author

I assume that we still want to requeue on any volume snapshot update even after it becomes ready? This would only allow for a single update requeue.

Collaborator

The controller requeues the request if there's an error or some other reason. You only have to catch the transition.

@akalenyu
Collaborator

I think watchOwned could also use a predicate; I believe what happens today is that any snap goes through its EnqueueRequestsFromMapFunc.
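For illustration, such a predicate on watchOwned could look roughly like this; the label key is invented here, since the actual marker CDI uses for populator-created snapshots (label, annotation, or owner reference) is not shown in this thread.

// Illustrative only: filter events so that only snapshots created by the populator
// reach watchOwned's EnqueueRequestsFromMapFunc.
ownedPredicate := predicate.NewPredicateFuncs(func(obj client.Object) bool {
	_, owned := obj.GetLabels()["cdi.kubevirt.io/hypothetical-owned-by-populator"]
	return owned
})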
