Allow configuration of import pod restart policy #3619

Open
wants to merge 2 commits into
base: main

paullanum

What this PR does / why we need it:
We had some DataVolumes fail because the storage request size was accidentally too small for our ISO sources. The DVs retried continuously until we noticed the issue, stuck in a download -> fail loop that caused a spike in our cloud costs due to the increased network usage. Being able to set the RestartPolicy of the importer pods to Never would prevent this scenario in the future.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Release note:

Allow configuring Restart Policy for importer pods

@kubevirt-bot added the release-note (Denotes a PR that will be considered when it comes time to generate release notes.) and dco-signoff: no (Indicates the PR's author has not DCO signed all their commits.) labels on Jan 28, 2025
@kubevirt-bot
Contributor

Hi @paullanum. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign aglitke for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Collaborator

@akalenyu akalenyu left a comment

Thanks for the PR and sharing a clear motivation!

My worry is introducing another API knob, one that we would have to work really hard to advertise and contain the complexity of
(from my experience, folks would just overlook it and hit those issues)

What if we could instead introduce a framework for detecting a terminal state like the one described here, and respond by deleting the pod via the controller? WDYT?
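A rough sketch of what such terminal-state detection might look like, purely for illustration. Everything here is an assumption: `ContainerTermination` is a trimmed-down stand-in for the termination details Kubernetes reports, the marker strings are made up rather than taken from real importer output, and `nextAction` is a hypothetical name for the decision a controller would make.

```go
package main

import (
	"fmt"
	"strings"
)

// ContainerTermination is a hypothetical stand-in for the exit code and
// termination message of a finished importer container.
type ContainerTermination struct {
	ExitCode int
	Message  string
}

// terminalMarkers is an illustrative list only. The real set of
// unrecoverable importer errors would have to be defined by the project.
var terminalMarkers = []string{
	"no space left on device",
	"larger than the available size",
}

// isTerminalFailure reports whether a failed run looks unrecoverable,
// meaning a retry would download the same data and fail the same way.
func isTerminalFailure(t ContainerTermination) bool {
	if t.ExitCode == 0 {
		return false
	}
	msg := strings.ToLower(t.Message)
	for _, marker := range terminalMarkers {
		if strings.Contains(msg, marker) {
			return true
		}
	}
	return false
}

// nextAction returns what a hypothetical controller would do with the pod:
// "none" on success, "delete-pod" on a terminal failure, "retry" otherwise.
func nextAction(t ContainerTermination) string {
	switch {
	case t.ExitCode == 0:
		return "none"
	case isTerminalFailure(t):
		return "delete-pod"
	default:
		return "retry"
	}
}

func main() {
	fmt.Println(nextAction(ContainerTermination{ExitCode: 1, Message: "No space left on device"}))
	fmt.Println(nextAction(ContainerTermination{ExitCode: 1, Message: "connection reset by peer"}))
}
```

The point of the split is that transient errors keep the existing retry behavior, while only failures matched as terminal stop the loop.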

@kubevirt-bot
Contributor

Thanks for your pull request. Before we can look at it, you'll need to add a 'DCO signoff' to your commits.

📝 Please follow instructions in the contributing guide to update your commits with the DCO

Full details of the Developer Certificate of Origin can be found at developercertificate.org.

The list of commits missing DCO signoff:

  • 49b6662 Allow configuration of import pod restart policy
  • 4fc4842 Simplify restartpolicy configuration

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@paullanum
Author

I definitely understand the concern about adding another knob for users, though in this case I don't believe it would need to be heavily advertised since this is optional configuration and the currently hardcoded value (v1.RestartPolicyOnFailure) is still the default. The configuration is only for those who want or need it and existing users who overlook the change should not be affected.

Regarding complexity, I have to respectfully disagree that adding logic to detect failure states would be less complex than leveraging the out-of-the-box RestartPolicy behavior in Kubernetes.
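To make the "optional, with the current value as default" argument concrete, here is a minimal sketch of how such a setting could be resolved. It is not the code from this PR: `RestartPolicy` is a local stand-in that mirrors the Kubernetes string values, `resolveImporterRestartPolicy` is a hypothetical helper, and rejecting `Always` is an assumption made for the example.

```go
package main

import "fmt"

// RestartPolicy is a local stand-in mirroring the Kubernetes restart
// policy values, so the sketch needs no external modules.
type RestartPolicy string

const (
	RestartPolicyAlways    RestartPolicy = "Always"
	RestartPolicyOnFailure RestartPolicy = "OnFailure"
	RestartPolicyNever     RestartPolicy = "Never"
)

// resolveImporterRestartPolicy (hypothetical) returns the policy to put on
// the importer pod. A nil value means the option was not set, in which case
// the existing behavior (OnFailure) is kept unchanged.
func resolveImporterRestartPolicy(configured *RestartPolicy) (RestartPolicy, error) {
	if configured == nil {
		return RestartPolicyOnFailure, nil
	}
	switch *configured {
	case RestartPolicyOnFailure, RestartPolicyNever:
		return *configured, nil
	default:
		// Assumption for this sketch: Always makes no sense for a
		// run-to-completion importer, so it is rejected.
		return "", fmt.Errorf("unsupported importer restart policy %q", *configured)
	}
}

func main() {
	unset, _ := resolveImporterRestartPolicy(nil)
	never := RestartPolicyNever
	explicit, _ := resolveImporterRestartPolicy(&never)
	fmt.Println(unset, explicit)
}
```

Users who never set the option take the `nil` branch, which is why existing deployments would see no change in behavior.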

@alromeros
Collaborator

I might be wrong, but I remember testing the Never restart policy with importer pods and it was pretty flaky, I think especially for flows that need scratch space. Transient HTTP errors are also common. Generally, using Never is discouraged, since CDI has been tested with OnFailure and we make assumptions based on that behavior. I see why this feature could be useful, but I understand @akalenyu's concerns. I'm not fully against it, but it would be great if we could consider alternatives.

@seanmorton

@alromeros those assumptions are good to know about. How much of a lift would it be to convert the importer pod to a Job? If that direction were taken, we could use .spec.backoffLimit to limit the number of retries.
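As a back-of-the-envelope illustration of why a retry cap matters for the cost problem described in this PR, the sketch below bounds the data a capped import could download. It assumes each failed attempt counts once against `.spec.backoffLimit` (as with a pod template using `restartPolicy: Never`) and that every attempt re-downloads the whole source. The function names and the 4 GiB size are hypothetical.

```go
package main

import "fmt"

// maxAttempts returns the most attempts a Job would make before being
// marked failed: the first run plus backoffLimit retries. Kubernetes
// defaults backoffLimit to 6 when it is not set.
func maxAttempts(backoffLimit int32) int64 {
	if backoffLimit < 0 {
		backoffLimit = 0
	}
	return int64(backoffLimit) + 1
}

// worstCaseDownloadBytes bounds the network traffic of an import that
// fails every time, assuming each attempt downloads the full source.
func worstCaseDownloadBytes(sourceBytes int64, backoffLimit int32) int64 {
	return sourceBytes * maxAttempts(backoffLimit)
}

func main() {
	const isoBytes = 4 << 30 // hypothetical 4 GiB ISO
	for _, limit := range []int32{0, 2, 6} {
		fmt.Printf("backoffLimit=%d: at most %d bytes downloaded\n",
			limit, worstCaseDownloadBytes(isoBytes, limit))
	}
}
```

Compared with a pod that restarts on failure with no cap, the worst case becomes a fixed multiple of the source size instead of growing until someone notices.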

Labels
dco-signoff: no Indicates the PR's author has not DCO signed all their commits. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L
5 participants