Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add feature to pull-policy for pulling images according to user specified pulling interval (e.g., hourly, daily, weekly, interval=5d4h30m ) #2075

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

Parthiba-Hazra
Copy link
Contributor

Summary

This is the minimal PR for issue - #1368 .We need to discuss about the correctness and quality of changes. I didn't add any test cases yet as some functionalities might change after some reviews, will add those tests gradually as this PR keeps progressing.

Output

Screenshot from 2024-02-21 00-14-15
Screenshot from 2024-02-21 00-17-07
Screenshot from 2024-02-21 00-02-43
Screenshot from 2024-02-21 00-03-36

Documentation

Right now it's not documented yet.

  • Should this change be documented?
    • Yes

Related

#1368

Resolves #1368

@Parthiba-Hazra Parthiba-Hazra requested review from a team as code owners February 20, 2024 19:04
@github-actions github-actions bot added this to the 0.34.0 milestone Feb 20, 2024
@github-actions github-actions bot added the type/enhancement Issue that requests a new feature or improvement. label Feb 20, 2024
pkg/image/pull_policy.go Fixed Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
pkg/image/fetcher.go Outdated Show resolved Hide resolved
@@ -106,6 +107,22 @@ func (f *Fetcher) Fetch(ctx context.Context, name string, options FetchOptions)
if err == nil || !errors.Is(err, ErrNotFound) {
return img, err
}
case PullWithInterval, PullDaily, PullHourly, PullWeekly:
img, err := f.fetchDaemonImage(name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we check the interval first before deciding to fetch from the daemon?

if pull, _ := f.CheckImagePullInterval(name); !pull {
   img, err := f.fetchDaemonImage(name)
   if err == nil {
      // removes it from the image.json file, maybe it was deleted
   }
   return img, err
} 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's related to this message
I will update it soon.

@Parthiba-Hazra Parthiba-Hazra marked this pull request as draft March 3, 2024 18:18
@github-actions github-actions bot added type/chore Issue that requests non-user facing changes. and removed type/chore Issue that requests non-user facing changes. labels Mar 3, 2024
pkg/image/fetcher.go Outdated Show resolved Hide resolved
@jjbustamante
Copy link
Member

jjbustamante commented Mar 4, 2024

@Parthiba-Hazra this is looking really good! thanks for your amazing work!!

I think your tests are failing because you need to run make format to fix some formatting issues

Screenshot 2024-03-04 at 10 37 10 AM

You can run make verify in your local environment before pushing, usually we forget about it

@Parthiba-Hazra
Copy link
Contributor Author

@Parthiba-Hazra this is looking really good! thanks for your amazing work!!

I think your tests are failing because you need to run make format to fix some formatting issues

Screenshot 2024-03-04 at 10 37 10 AM

You can run make verify in your local environment before pushing, usually we forget about it

@jjbustamante currently I'm still working on test cases .. actually getting some errors while running test cases of fetcher_test.go, it's saying it can't connect to docker daemon cause it's not running although it is running. Let me resolve this and add the all necessary test cases ... I will try to finish it withing this week.

internal/commands/config_prune_interval.go Outdated Show resolved Hide resolved
return WriteFile(updatedJSON)
}

func ReadImageJSON(l logging.Logger) (*ImageJSON, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about creating a handler for these methods? I think this ReadImageJSON has a lot of responsibilities:

  • It is trying to discover the user's home directory
  • it creates the whole folder structure to save the image.json on the disk
  • it reads and writes!
type ImagePullPolicyManager struct {
	Logger logging.Logger
}

func (i *ImagePullPolicyManager) Read(path string) (*ImageJSON, error) {
  	imagePath := filepath.Join(path, "image.json")
	// Check if the file exists, if not, create it with minimum JSON
	if _, err := os.Stat(imagePath); os.IsNotExist(err) {
                // We don't need to write on disk, we are reading, right? 
                // We return our default 
		return &ImagePullPolicy{
			Interval: &Interval{
				PullingInterval: "7d",
				LastPrune:       "",
			},
			Image: &ImageData{},
		}, nil
	}
        // do what you are doing today
}

func (i *ImagePullPolicyManager) Write(imagePullPolicy ImagePullPolicy, path string) error {
	// I will move all the marshaling logic into these methods, I don't want the method caller to worry about them
        updatedJSON, err := json.MarshalIndent(imagePullPolicy, "", "    ")
	if err != nil {
		return errors.Wrap(err, "failed to marshal updated records")
	}
	return WriteFile(updatedJSON, filepath.Join(path, "image.json"))
}

func WriteFile(data []byte, path string) error {
        // Try this one, it will create the file if it doesn't exits.
	file, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0644)
	if err != nil {
		return errors.New("failed to open file: " + err.Error())
	}
	defer file.Close()
	_, err = file.Write(data)
	if err != nil {
		return errors.New("failed to write data to file: " + err.Error())
	}
	return nil
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this looks way better than what I was currently doing.


// ParsePullPolicy from string with support for interval formats
func ParsePullPolicy(policy string, logger logging.Logger) (PullPolicy, error) {
pullPolicyManager := NewPullPolicyManager(logger)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjbustamante I'm not sure about spawning the manager here. should we create the manager in the caller function of ParsePullPolicy and then pass it to the function?

Copy link
Member

@jjbustamante jjbustamante Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let add some suggestions about this.

First of all, a simplify class diagram of your current implementation looks like this:

classDiagram
    Client o-- ImageFetcher
    Fetcher  ..|>  ImageFetcher
    Fetcher o-- ImagePullChecker 
    ImagePullPolicyManager ..|> ImagePullChecker

    Client : +bool experimental
    Client : +map[string]string registryMirrors
    Client : +string version
    Client : +authn.Keychain

    Client --> ImagePullPolicyManager: Remove it!
    ImagePullChecker ..> Fetcher : Remove it!
  
    class  ImageFetcher {
          <<interface>>
    }

    class Fetcher {
    }

    class ImagePullChecker {
        <<interface>>
        CheckImagePullInterval()
	Read()
	PruneOldImages(f *Fetcher) error
	UpdateImagePullRecord()
        Write()
    }

    class ImagePullPolicyManager {
    }
Loading

The first thing I noticed that doesn't look very health is the circular dependency between ImagePullChecker and Fetcher in the method PruneOldImages(f *Fetcher) error, we need to remove that, I noticed what you are trying to do is to load a local image from the daemon to check if the image exists, maybe you can use the imgutil.Local image to do that, but we need to remove that circular dependency.

The other thing I will suggest is:

  • You are creating an instance of NewPullPolicyManager(logger) in 4 places right now, I think we can:
    • hold a reference into the Client to an image.ImagePullChecker and add an option to set it up
type Client struct {
   imagePullChecker    image.ImagePullChecker
}

func WithImagePullChecker(i image.ImagePullChecker) Option {
       return func(c *Client) {
               c.imagePullChecker = i
       }
}
  • After that, you can move the creation of the PullPolicyManager to the cmd/cmd.go just right before creating the client. Update the method to create the client to use the Option we declared before
imagePullChecker := image.NewPullPolicyManager(logger)
packClient, err := initClient(logger, cfg, imagePullChecker)

func initClient(logger logging.Logger, cfg config.Config, imagePullChecker image.ImagePullChecker) (*client.Client, error) {
 return client.NewClient(client.WithLogger(logger), client.WithExperimental(cfg.Experimental), client.WithImagePullChecker(imagePullChecker))
}
  • I will move the ParsePullPolicy method to be part of the ImagePullChecker interface
func (i *ImagePullPolicyManager) ParsePullPolicy(policy string) (PullPolicy, error) {
  // Do what we have today
}
  • Because of the previous change, all the commands are broken now, I will update the commands that need to call ParsePullPolicy and passthrough the ImagePullChecker, for example in cmd/cmd.go
 // remember we have the imagePullChecker at this point, because cmd is creating it.
 rootCmd.AddCommand(commands.Build(logger, cfg, packClient, imagePullChecker))
  • I will repeat the same thing for all the other commands that are broken

pkg/image/fetcher.go Outdated Show resolved Hide resolved
@@ -35,6 +36,22 @@ type LayoutOption struct {
Sparse bool
}

type ImagePullChecker interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not totally convinced about this name, I do not have any suggestion, but not sure. I will try to think about it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjbustamante what about ImagePullPolicyService or ImagePullPolicyHandler

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like ** ImagePullPolicyHandler**. Also, can you move the interface definition to the client file? I think it must be along with the others

Copy link
Contributor Author

@Parthiba-Hazra Parthiba-Hazra Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjbustamante please one more question- should we need to create a mock of NewPullPolicyManager method for test cases (I think we should) -

func NewPullPolicyManager(logger logging.Logger) ImagePullChecker {
	return &ImagePullPolicyManager{Logger: logger}
}

or can use it directly during tests.
edit - asking for the commands test cases which are currently broken due to requested changes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like ** ImagePullPolicyHandler**. Also, can you move the interface definition to the client file? I think it must be along with the others

if we move the ImagePullChecker interface to the client file then there will be a import cycle between client and image package cause Fetcher struct has a field imagePullChecker type ImagePullChecker interface.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! .. ok, let's keep it there for now, we will come back to that later

@jjbustamante jjbustamante modified the milestones: 0.34.0, 0.35.0 Mar 13, 2024
@jjbustamante jjbustamante modified the milestones: 0.34.0, 0.35.0 Apr 3, 2024
Copy link
Contributor

@edmorley edmorley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! Thank you so much for working on this :-D (I'm the person that asked for this in #1368)

I notice that the implementation here currently supports both:

  • named durations (eg hourly, daily, weekly)
  • and custom durations (interval=<_d_h_m>)

I wonder if we could simplify things and support only one or the other?

My instinct would be to say the named values would cover 95% of needs and so perhaps we don't need the interval=<_d_h_m> form? (Bearing in mind, the main use-case here is to prevent excessive pulls during CI runs, or to reduce hassle/download requirements when building locally and the upstream builder rapidly changes; both of which are handled by the named hourly, daily, weekly durations. Anyone who needs anything more advanced can use a pull policy of never and handle their own pulling or similar.) Removing interval=<_d_h_m> would also mean the parsing implementation can be simplified, and from an end-user perspective means the config options are more consistent (ie: all words, not some words, some key-value pairs etc.)

Also, I see that a new prune-interval config option was added to control when to remove image entries from the .pack/image.json file. If we were to set an upper bound for the pull interval duration (eg say the interval is not allowed to be longer than N duration), we could potentially simplify things further, and remove prune-interval, and instead set pruning to be that max value?

Combining all of the above, if we were to remove support for interval=<_d_h_m>, then we know the max possible pull-policy duration is weekly, and as such the auto-pruning can simply prune any entry older than a week, meaning we can then remove the prune-interval config option.

Thoughts? :-)

@Parthiba-Hazra
Copy link
Contributor Author

Hi! Thank you so much for working on this :-D (I'm the person that asked for this in #1368)

I notice that the implementation here currently supports both:

  • named durations (eg hourly, daily, weekly)
  • and custom durations (interval=<_d_h_m>)

I wonder if we could simplify things and support only one or the other?

My instinct would be to say the named values would cover 95% of needs and so perhaps we don't need the interval=<_d_h_m> form? (Bearing in mind, the main use-case here is to prevent excessive pulls during CI runs, or to reduce hassle/download requirements when building locally and the upstream builder rapidly changes; both of which are handled by the named hourly, daily, weekly durations. Anyone who needs anything more advanced can use a pull policy of never and handle their own pulling or similar.) Removing interval=<_d_h_m> would also mean the parsing implementation can be simplified, and from an end-user perspective means the config options are more consistent (ie: all words, not some words, some key-value pairs etc.)

Also, I see that a new prune-interval config option was added to control when to remove image entries from the .pack/image.json file. If we were to set an upper bound for the pull interval duration (eg say the interval is not allowed to be longer than N duration), we could potentially simplify things further, and remove prune-interval, and instead set pruning to be that max value?

Combining all of the above, if we were to remove support for interval=<_d_h_m>, then we know the max possible pull-policy duration is weekly, and as such the auto-pruning can simply prune any entry older than a week, meaning we can then remove the prune-interval config option.

Thoughts? :-)

@edmorley thank you for your review, sure we can do your cahanges easily, actually I implement this after going through all the maintainer's comments on that issue and I found out this would be the most flexible solution. @jjbustamante @jkutner any thought on this ?

@jjbustamante
Copy link
Member

@edmorley thank you for your review, sure we can do your cahanges easily, actually I implement this after going through all the maintainer's comments on that issue and I found out this would be the most flexible solution. @jjbustamante @jkutner any thought on this ?

I am happy with @edmorley suggestions if we can support 95% of the use cases and keep the code simpler, let's go for it!

@Parthiba-Hazra
Copy link
Contributor Author

@edmorley thank you for your review, sure we can do your cahanges easily, actually I implement this after going through all the maintainer's comments on that issue and I found out this would be the most flexible solution. @jjbustamante @jkutner any thought on this ?

I am happy with @edmorley suggestions if we can support 95% of the use cases and keep the code simpler, let's go for it!

Sure, I'm gonna work on this. Thanks.

Parthiba-Hazra and others added 10 commits April 23, 2024 19:09
…olicy interval=2d5h20m)

Signed-off-by: Parthiba-Hazra <[email protected]>
…reshold and fix some previous functionalities

Signed-off-by: Parthiba-Hazra <[email protected]>
…icy's functionalities

Signed-off-by: Parthiba-Hazra <[email protected]>

some minor changes

Signed-off-by: Parthiba-Hazra <[email protected]>

minor changes

Signed-off-by: Parthiba-Hazra <[email protected]>
… deciding wheather to fetch the image from daemon or not

Signed-off-by: Parthiba-Hazra <[email protected]>
- add test cases for new pull-policies(PullWithInterval, PullHourly
  PullDaily, PullWeekly)

Signed-off-by: Parthiba-Hazra <[email protected]>
Bumps the go-dependencies group with 2 updates: [github.com/gdamore/tcell/v2](https://github.com/gdamore/tcell) and [golang.org/x/crypto](https://github.com/golang/crypto).

Updates `github.com/gdamore/tcell/v2` from 2.7.0 to 2.7.1
- [Release notes](https://github.com/gdamore/tcell/releases)
- [Changelog](https://github.com/gdamore/tcell/blob/main/CHANGESv2.md)
- [Commits](gdamore/tcell@v2.7.0...v2.7.1)

Updates `golang.org/x/crypto` from 0.19.0 to 0.20.0
- [Commits](golang/crypto@v0.19.0...v0.20.0)

---
updated-dependencies:
- dependency-name: github.com/gdamore/tcell/v2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: go-dependencies
- dependency-name: golang.org/x/crypto
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: go-dependencies
...

Signed-off-by: dependabot[bot] <[email protected]>
- This commit introduces an ImagePullPolicyManager struct responsible
  for handling image JSON operations, including reading and writing the
  image.json file. The ReadImageJSON and WriteFile functions have been
  refactored to methods of this manager.

Signed-off-by: Parthiba-Hazra <[email protected]>
- remove circular dependency between `ImagePullChecker` and
  `Fetcher` in the method PruneOldImages. Moved the creation
  of the `ImagePullPolicyManager` to the cmd package that
  reduces the number of instances created.

  Additionally, renamed the `ImagePullChecker` interface to
  `ImagePullPolicyHandler`

Signed-off-by: Parthiba-Hazra <[email protected]>
@Parthiba-Hazra Parthiba-Hazra marked this pull request as draft April 23, 2024 15:21
- removing interval pull policy as we can
  cover 95% of use cases using the hourly, daily,
  weekly pull policies and also removing the pruning
  interval command as it will set to be constant 7 days.

Signed-off-by: Parthiba-Hazra <[email protected]>
}
imageJSON.Image.ImageIDtoTIME[imageID] = timestamp

err = i.Write(imageJSON, path)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjbustamante I think it's not a good idea to call Read and Write under every function. Should we handle it in the Fetch method only? What do you think?

@natalieparellano
Copy link
Member

Hi there - just checking in on this one. Do we have a sense of the effort left to complete it?

@Parthiba-Hazra
Copy link
Contributor Author

Hi there - just checking in on this one. Do we have a sense of the effort left to complete it?

@natalieparellano actually I implemented a complex version of this feature. and as @edmorley suggestion we should keep it simpler as that will enough for the actual use cases, and as far I can remember I request some suggestion from @jjbustamante and I think he must be busy on something else. anyway what should we do here, I can still work on it to get it done.

@jjbustamante
Copy link
Member

I moved this one to milestone 0.35 because 0.34 was already big enough, but I think we can start merging the latest changes into it

@natalieparellano natalieparellano modified the milestones: 0.35.0, 0.36.0 Jul 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/chore Issue that requests non-user facing changes. type/enhancement Issue that requests a new feature or improvement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for a pull policy that only pulls new images periodically
4 participants