Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tools: introduce git-change-exec tool #4202

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

christoph-zededa
Copy link
Contributor

This new tool detects if in your git tree changed files (compared to master branch and local-only changed files) and uses this information to run only the specified actions.

Here it is used to run pillar's go-tests only if something changed there, same for this tool itself and the get-deps tool.

@christoph-zededa
Copy link
Contributor Author

No, this is not possible with what gh checks out here, so closing ...

@christoph-zededa
Copy link
Contributor Author

No, this is not possible with what gh checks out here, so closing ...

hmm, seems it is possible, but I don't know what I am doing with those gh actions ...

@OhmSpectator
Copy link
Member

Do you know how you're going to use it? Will developers use it for local runs, or will you make it part of CI?

@christoph-zededa
Copy link
Contributor Author

Do you know how you're going to use it? Will developers use it for local runs, or will you make it part of CI?

It is used by make test which is also used by the CI.
So the answer is: both

@deitch
Copy link
Contributor

deitch commented Sep 4, 2024

I am a little bit confused. Is this tool's purpose to check which directories have changed files, and thus determine where we should run go test? For example, if pkg/a had changes and pkg/b did not, then run go test ./pkg/a as opposed to go test ./...?

Does it actually execute the go test, or just give the relevant paths, sort of like:

$ gce ./...
pkg/a
pkg/c
pkg/q

thus telling us which dirs changed? And we always can choose to ignore it and run all tests? And with it, I would run go test $(gce ./...)? Something like that?

A README would help, along with some sample of what it would look like when run.

@OhmSpectator
Copy link
Member

Aha! I've got the idea. I like it. Good luck with the implementation =) I find it easier to write such tools in interpreted languages, but okay, let it be =)

@christoph-zededa
Copy link
Contributor Author

christoph-zededa commented Sep 4, 2024

I am a little bit confused. Is this tool's purpose to check which directories have changed files, and thus determine where we should run go test? For example, if pkg/a had changes and pkg/b did not, then run go test ./pkg/a as opposed to go test ./...?

Does it actually execute the go test, or just give the relevant paths, sort of like:

$ gce ./...
pkg/a
pkg/c
pkg/q

thus telling us which dirs changed? And we always can choose to ignore it and run all tests? And with it, I would run go test $(gce ./...)? Something like that?

A README would help, along with some sample of what it would look like when run.

It does not do magic, it just looks in actions.go what to do exactly, f.e.: https://github.com/lf-edge/eve/pull/4202/files#diff-5bf19742ee6060707f406c1625c23226124e161d2049d12a80a8a61ef076ace8R47

Let me just be lazy with the example: https://github.com/lf-edge/eve/actions/runs/10699251983/job/29660387351?pr=4202#step:3:120 :-)

@christoph-zededa christoph-zededa force-pushed the git_change_detect branch 5 times, most recently from 6a6c964 to ea1bf18 Compare September 4, 2024 10:48
@christoph-zededa christoph-zededa marked this pull request as ready for review September 4, 2024 11:35
tools/git-change-exec/Makefile Show resolved Hide resolved
return dir
}

dir = filepath.Join(dir, "..")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember the case when Eden removed my home dir because it accessed something in .., and I'm afraid of such patterns now. Could you consider using git rev-parse --show-toplevel instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't git rev-parse --show-toplevel just do the same?

45077 newfstatat(AT_FDCWD, "/home/christoph/projects/eve/pkg/pillar/agentlog", {st_mode=S_IFDIR|0755, st_size=226, ...}, 0) = 0
45077 newfstatat(AT_FDCWD, "/home/christoph/projects/eve/pkg/pillar/agentlog/.git", 0x7ffe67413ec0, 0) = -1 ENOENT (No such file or directory)
45077 newfstatat(AT_FDCWD, "/home/christoph/projects/eve/pkg/pillar/agentlog/.git/HEAD", 0x7ffe67413d70, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
45077 newfstatat(AT_FDCWD, "/home/christoph/projects/eve/pkg/pillar/agentlog/HEAD", 0x7ffe67413d70, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
45077 newfstatat(AT_FDCWD, "/home/christoph/projects/eve/pkg/pillar", {st_mode=S_IFDIR|0755, st_size=1204, ...}, 0) = 0
45077 newfstatat(AT_FDCWD, "/home/christoph/projects/eve/pkg/pillar/.git", 0x7ffe67413ec0, 0) = -1 ENOENT (No such file or directory)
45077 newfstatat(AT_FDCWD, "/home/christoph/projects/eve/pkg/pillar/.git/HEAD", 0x7ffe67413d70, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
45077 newfstatat(AT_FDCWD, "/home/christoph/projects/eve/pkg/pillar/HEAD", 0x7ffe67413d70, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
45077 newfstatat(AT_FDCWD, "/home/christoph/projects/eve/pkg", {st_mode=S_IFDIR|0755, st_size=804, ...}, 0) = 0
45077 newfstatat(AT_FDCWD, "/home/christoph/projects/eve/pkg/.git", 0x7ffe67413ec0, 0) = -1 ENOENT (No such file or directory)
45077 newfstatat(AT_FDCWD, "/home/christoph/projects/eve/pkg/.git/HEAD", 0x7ffe67413d70, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
45077 newfstatat(AT_FDCWD, "/home/christoph/projects/eve/pkg/HEAD", 0x7ffe67413d70, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
45077 newfstatat(AT_FDCWD, "/home/christoph/projects/eve", {st_mode=S_IFDIR|0755, st_size=824, ...}, 0) = 0
45077 newfstatat(AT_FDCWD, "/home/christoph/projects/eve/.git", {st_mode=S_IFDIR|0755, st_size=240, ...}, 0) = 0
45077 newfstatat(AT_FDCWD, "/home/christoph/projects/eve/.git/HEAD", {st_mode=S_IFREG|0644, st_size=49, ...}, AT_SYMLINK_NOFOLLOW) = 0

https://github.com/go-git/go-git ?

I am not sure what you mean.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid of manually adding logic to reverse directories. If there is a tool that does it for us, I would prefer it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay, thanks! Will take a look at the rest of the script later...

s.pathsToFix = make([]string, 0)
}

func (s *lintSpdx) hasSpdx(path string) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, I would prefer just to run the already implemented script.
Otherwise, we must maintain the same logic in different places: the script in the tools and this one. Currently, the logic mentioned here differs from what is in the script.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it is using the script.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think, I should directly call your script or keep this:

func (s *lintSpdx) hasSpdx(path string) bool {
        scriptPath := filepath.Join(filepath.Dir(s.ownPath), "..", "spdx-check.sh")

        cmd := exec.Command(scriptPath, path)
        err := cmd.Run()

        return err == nil
}

func (s *lintSpdx) match(path string) bool {
        if s.extsMap == nil {
                s.init()
        }

        if strings.Contains(path, "/vendor/") {
                return false
        }
        _, found := s.pathMatch(path)
        if !found {
                return false
        }
        if !s.hasSpdx(path) {
                s.pathsToFix = append(s.pathsToFix, path)
                return true
        }

        return false
}

}
}

if strings.Contains(line, "Copyright (c) ") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be less strict, as we have already figured out on one of your PRs:
"Copyright *(c) "


func (s *lintSpdx) gofix(path string) {
license := []string{
"// Copyright (c) 2024 Zededa, Inc.\n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The year should not be hardcoded.
And we cannot put Zededa as it's the lf-edge-wide code... (we still can have such a tool in the zededa/eve-tools repo)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on both points. Check for current year and no check on the holder of the copyright.

Copy link
Contributor Author

@christoph-zededa christoph-zededa Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The year is now the current year and the organization is read from the .gitconfig.

As the check is now using the spdx shell script, it does not check on the holder for the copyright.

@christoph-zededa christoph-zededa force-pushed the git_change_detect branch 3 times, most recently from 4385415 to 486b571 Compare September 9, 2024 09:45
@christoph-zededa
Copy link
Contributor Author

I don't know what yetus complains about

  • results-full.txt from https://github.com/lf-edge/eve/actions/runs/10770711970/job/29864646035#step:5:20 is empty
  • make mini-yetus shows:
    christoph@ponyhof ~/p/eve (git_change_detect)> cat /tmp/yetus.hBuc5lILQs/yetus-output/results-full.txt 
    home/christoph/projects/eve/pkg/nvidia/Dockerfile:30:25:hadolint: invalid flag: --keep-git-dir
    home/christoph/projects/eve/pkg/pillar/containerd/oci_test.go:19:2:revive: should not use dot imports https://revive.run/r#dot-imports
    home/christoph/projects/eve/pkg/pillar/dpcreconciler/linux_test.go:16:2:revive: should not use dot imports https://revive.run/r#dot-imports
    home/christoph/projects/eve/tools/check-commit-messages/check_commit_messages.py:15:0:pylint:[E0401(import-error), ] Unable to import 'git'
    home/christoph/projects/eve/tools/parse-pkgs.sh:15:8:shellcheck: note: Double quote to prevent globbing and word splitting. [SC2086]
    home/christoph/projects/eve/tools/parse-pkgs.sh:64:8:shellcheck: note: Useless echo? Instead of 'echo $(cmd)', just use 'cmd'. [SC2005]
    home/christoph/projects/eve/tools/parse-pkgs.sh:64:8:shellcheck: warning: Quote this to prevent word splitting. [SC2046]
    home/christoph/projects/eve/tools/parse-pkgs.sh:74:14:shellcheck: note: Use $(...) notation instead of legacy backticks `...`. [SC2006]
    home/christoph/projects/eve/tools/parse-pkgs.sh:74:9:shellcheck: warning: Declare and assign separately to avoid masking return values. [SC2155]
    home/christoph/projects/eve/tools/parse-pkgs.sh:84:10:shellcheck: note: Double quote to prevent globbing and word splitting. [SC2086]
    home/christoph/projects/eve/tools/parse-pkgs.sh:92:16:shellcheck: note: Double quote to prevent globbing and word splitting. [SC2086]
    home/christoph/projects/eve/tools/parse-pkgs.sh:92:8:shellcheck: note: Double quote to prevent globbing and word splitting. [SC2086]
    

@yash-zededa
Copy link
Collaborator

@christoph-zededa report.html shows

-1	patch	0m 2s		GH:4202 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/in-progress/precommit-patchnames for help.

this should not be the case

@christoph-zededa
Copy link
Contributor Author

@christoph-zededa report.html shows

-1	patch	0m 2s		GH:4202 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/in-progress/precommit-patchnames for help.

this should not be the case

thx, I rebased; I guess now it works.

// Copyright (c) 2024 Zededa, Inc.
// SPDX-License-Identifier: Apache-2.0

// Copyright (c) 2024 Zededa, Inc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for two SPDX headers =)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ups, that happened when I tested it on itself ...

actionCategories = append(actionCategories, c)
}

rootCmd := cobra.Command{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I right that we use this complicated pattern just to parse the "-d" argument? I mean: import cobra, create rootCmd with a lambda function as a value of one of the arguments, then - run two extra methods (BoolVarP(), Exec()).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me it is not complicated, using go's flags is complicated ;-)

I switched to cobra because I wanted to have flags and arguments.

Run: func(cmd *cobra.Command, args []string) {
gce := newGitChangeExec()

for _, cat := range args {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does cat mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to category

commonBase = append(commonBase, commit)
addBase, err := branchHead.MergeBase(commit)
if err != nil {
log.Printf("finding merge base failed: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this will fail in most cases as you will try to find a merge base for all the potential target branches here. If the branch is based on master and you try to find a merge base for -stable - the result is kind of undefined and irrelevant... Maybe we can still call it here, but the error printing is unnecessary, I guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the error printing, now it only does debugLog

}

for _, st := range commitStats {
gce.addActionByPath(st.Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to add actions for files that don't exist in the current branch's HEAD.
Also, as far as I understand it, it will run for all the commits between the HEAD and the common baseS, even the oldest, the most irrelevant -stable... It will include tens/hundreds of irrelevant commits, hence - files. Or did I get the algorithm wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will stop if it finds a hash that is in at least one of the common baseS

gce.fetchOrigin()

gce.collectActionsGitTree()
gce.collectDirtyGitTree()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to read the function code to understand the purpose of these calls or the difference between them. I guess they define the actions to be run, but the names could be more clear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the second function, I would make it optional, depending on the flag. If I prepare a PR, I still can have some modified files that will not go into the commits: ./conf/*, for example - and I don't need to check them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to make it opt-out with --ignore-dirty or perhaps:

  • git-change-exec run test to run as right now
  • git-change-exec run-branch test to only consider collectActionsGitTree()

But I think the behavior could then be a bit unexpected, f.e. if you change two files in pillar, then no action will be triggered, now you git add one of those files, then the tests for pillar will be triggered, but the tests include the second file that has not been added yet.

continue
}

fp := filepath.Join(gce.gitPath, file)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we ignore the removed files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry. I missed the lines

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I, too 🤣
when I tried to find out ...

@christoph-zededa christoph-zededa force-pushed the git_change_detect branch 2 times, most recently from 90d00f8 to 3e7e1d2 Compare September 9, 2024 16:54
@christoph-zededa
Copy link
Contributor Author

I added https://github.com/lf-edge/eve/pull/4202/files#diff-159efde8a5a441c430d0d0c512d4dd1553136852df9f0ac2443606d2faf1db5cR38-R46

which was the original motivation to have this as the tests take several minutes

getDepsTestAction{},
bpftraceCompilerExecTest{},
},
"lint": []action{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any invocation of the tool with "lint" as the argument. Is that missing or hiding somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far I would keep it to be run by the user, but I can also add a Makefile target, perhaps lint.

Comment on lines 36 to 45
var actions = map[string][]action{
"test": []action{
pillarTestAction{},
gitChangeExecTest{},
getDepsTestAction{},
bpftraceCompilerExecTest{},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like all of these are of the form
if match on directory prefix, then run this command

Do we need to put that as golang code or could we e.g., have that in some separate file? E.g., an actions_test file will be used for test, actions_lint file for lint, and actions_foo for a future foo target?
Or have the action filename be an argument to the tool instead of passing e.g., "test" as the argument?

Copy link
Contributor Author

@christoph-zededa christoph-zededa Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it is in different files actions-test.go and actions-lint.go, but unfortunately it is not possible with reflection to read which structs are there that implement the action interface.

As an experiment I tried to use a golang interpreter: 66803f0#diff-5bf19742ee6060707f406c1625c23226124e161d2049d12a80a8a61ef076ace8R72
But then I thought, what if there are dependencies between actions (like f.e. for linting, one would like first to run the spdx-fixer and then go fmt), so I would have to implement a way to express dependencies.
I still strive for the tool to be simple 😆

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I don't see why we need the generality of golang to run the actions, since the actions do simple matches on paths and then exec code. And it would be a lot more modular and easier to understand if we don't have to put various make invocations in the golang code but keep that in separate files IMHO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why we need the generality of golang to run the actions

F.e. for the lint action I need it, because it has to edit files and add the copyright statement.
Of course I could do it that git-change-exec invokes another program to do so, but then I have to add:

  • finding the path to the other program
  • serialization
  • deserialization including error(-string) parsing
  • eventually implementing a way to save state between match and do and handle a proper cleanup

(the first point makes it harder to develop this tool, because I test it on another clone of the eve repository)

more modular and easier to understand

I changed the code so that it runs make test for the tests, so the Makefile is the abstraction, but I fear that in the future the actions become more complex, so only type action interface will be an abstraction.

since the actions do simple matches on paths

That was just, because I did not add yet to ignore READMEs and markdown files; also pkg/pillar/agentlog/cmd does not need to trigger the go tests. Now it is there.
The lint action has to save the matches in an array to be able to fix them in the do method.

Generally speaking I think match can become a lot more complex in the future once f.e. we only run the tests based on code coverage.

but keep that in separate files

What would these files look like?


Suggestions from my side:

  1. put every action in it's own go-file

  2. enhance dry-run to show the match result of every file

  3. add a doDryRun() to the action interface where every action has to describe what it plans to do

This is needed for git-change-exec.

Signed-off-by: Christoph Ostarek <[email protected]>
Sometimes hard tabs come from a copy-paste example and
the example should not be modified.

Also tabs are better than spaces ;-)

Signed-off-by: Christoph Ostarek <[email protected]>
also allow to use this tool directly with a path
as a parameter

Signed-off-by: Christoph Ostarek <[email protected]>
The option is called `ignore-words` according to
https://github.com/codespell-project/codespell?tab=readme-ov-file#using-a-config-file

Also add `OptionAll` as is not a misspelled `optionally`

Signed-off-by: Christoph Ostarek <[email protected]>
add this to have a somehow similar interface as other
parts in the repo to run the tests

Signed-off-by: Christoph Ostarek <[email protected]>
This new tool detects if in your git tree changed files
(compared to master branch and local-only changed files)
and uses this information to run only the specified actions.

Here it is used to run pillar's go-tests only if something
changed there, same for this tool itself and the get-deps tool.

Signed-off-by: Christoph Ostarek <[email protected]>
in order to hope that yetus does not complain anymore

Signed-off-by: Christoph Ostarek <[email protected]>
Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My fundamental issue is that it is impossible to tell what make does without reading every line of the tool.

Can we make this more modular by splitting it into a git-change tool and a change-exec tool, with a list of files being the output from the first and the input to the second?

And I'm curious whether the git imvocations capture the base commit which the brach is based on, or if they compare against the current state of master. The latter would be quite suboptimal since it means I type make in my workspace once and it does little work, and then without any changes in my workspace or branch I type make 5 minutes later and then it does lots of work because something changed in master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants