Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add source linting for recipe v2 #2028

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

wolfv
Copy link
Member

@wolfv wolfv commented Aug 8, 2024

cc @beckermr - I added a lint to make sure people don't use sha256 as template.

Also I think we don't support sha1 at this point. Need to double check :)

Comment on lines 189 to 192
if recipe_version == 1:
lint_sources_should_have_hash(sources_section, lints)
elif recipe_version == 2:
conda_recipe_v2_linter.lint_sources(sources_section, lints)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to have two functions. Please use one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to lint v1 recipes to make sure that the SHA hash is not templated as well @isuruf?
I think that would be a good change, but some churn involved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can pass recipe_version to lint_sources_should_have_hash and make the distinction there. Btw, how do you write a recipe like https://github.com/conda-forge/cuda-nvcc-impl-feedstock/blob/main/recipe/meta.yaml#L24-L28?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can write it like this (valid yaml!):

context:
  version: "12.6.20"

source:
  - if: target_platform == "linux-64:
    then:
      url: https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvcc/linux-x86_64/cuda_nvcc-linux-x86_64-{{ version }}-archive.tar.xz
      sha256: 4336a230269db14336ca1f727a0502631390a52d37c17d3bde05ccda5d534406
      patches:
        - nvcc.profile.patch
  - if: target_platform == "linux-aarch64"
    then:
      url: https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvcc/linux-sbsa/cuda_nvcc-linux-sbsa-{{ version }}-archive.tar.xz
      sha256: 06112bf1cc0cfb6b881a2e0420cb4611b6d701decbb195fbfb4f0e9966a42006
      patches:
        - nvcc.profile.patch
  - if: target_platform == "win-64"
    then:
      url: https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvcc/x64/cuda_nvcc-x64-{{ version }}-archive.zip
      sha256: 03282a36fb98c2f227877a62efd04c36b1ab01fb1b676d9edbd0b10179e0a40a
      patches:
        - nvcc.profile.patch.win

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

biggest downside is the patch duplication for the two sources although I think that's ... OK :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you agree that is an abomination, we should figure out a non-abomination. That's the whole point of v2 IMO. Make recipes not be abominations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for context, the lint suggestion arose here

regro/cf-scripts#2920 (comment)

where I commented that you'd need to update the hash in the context section too. The suggestion was that by eliminating the use of variables to specify the hash, we could simplify the code in the bot. That entire block of code is 24 lines in the bot (https://github.com/regro/cf-scripts/blob/master/conda_forge_tick/update_recipe/version.py#L173-L197).

Then we had another discussion here

https://github.com/regro/cf-scripts/pull/2920/files/b7f61366a355ab41e0728e7cc325ed36596c3d67#discussion_r1709740255

where I pointed out that we have to evaluate the context section of the recipe under the assumptions of not just linux, but also other platforms in order to update things correctly. You suggested that this lint would help and we both agreed there would be other edge cases this would miss.

So we can have a separate discussion on this change on its own, but the context for it came from simplifying bot code by eliminating flexibility in the v2 recipe spec.

This is the motivation for my comments. This PR is an incomplete solution to solve complexity in the v2 recipe spec. We either need to reconsider the spec to eliminate complexity or we need to deal with it. Excluding the a small part of the complexity only in conda-forge is an incomplete solution. Given that many conda-forge devs helped make the spec, it's rather odd that we wouldn't use all of it.

As for the bot and the implementation, I don't see the point in avoiding the work to support the new spec in the same way. Otherwise, we do something that covers only part of the cases, people find bugs + submit bug reports, and then someone (you, me, someone else?) in the future is left either saying this won't ever happen or fixing it.

I suggested a better path forward for this work in multiple places related to eliminating technical debt around recipe parsing+editing (which is one of the original motivations for the new spec anyways) and then using that to move things ahead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think my motivation was never to make it less verbose to write complex recipes :)

My motivation was to make recipes structured, easy to parse and fast to evaluate (ie. not do the recursive craziness of conda-build).

I do think we could find solutions to this problem, but for me it's low priority as we can continue using meta.yaml, so if a user doesn't like the new format / additional verbosity there is no need to switch. From my POV it's also very few recipes that are structured this way (mostly binary repacks). And I also don't mind the more extended form, personally.

But if you have good ideas for improvements to the spec, I would love to hear them!

Thanks for adding the context, Matt. I am happy to fix bugs/issues as they come up related to the new format, of course!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're talking past one another a bit here, but this statement is a red flag:

I do think we could find solutions to this problem, but for me it's low priority as we can continue using meta.yaml, so if a user doesn't like the new format / additional verbosity there is no need to switch. From my POV it's also very few recipes that are structured this way (mostly binary repacks). And I also don't mind the more extended form, personally.

The goal here, I thought, is to move to a new spec and eliminate a previous one, all for tangible benefits (performance, interpretability, maintainability, etc.). If we are not in this process to actually deprecate and then remove the v1 spec, why on earth would we do all of this work just to support both in the end?

Thanks for adding the context, Matt. I am happy to fix bugs/issues as they come up related to the new format, of course!

Great and thank you! I have raised several issues around edge cases in the PRs we have going for v2 and my opinion is that now is a good time to address them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the outcome is that v2 replaces v1 I will be personally very happy. But I also don't want to force people on v2, especially not during the initial period. The whole transition will probably take at least a year, and the idea is that it's opt-in (if you have a recipe.yaml then new format is used, otherwise fall back to meta.yaml). I hope we're on the same page there :)

Copy link
Member

@beckermr beckermr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not against this in principle but I'd like to understand how many existing recipes we might impact should they be ported to the v2 format. Also this does not solve the issue we are discussing for the bot, so should be considered separate.

@wolfv
Copy link
Member Author

wolfv commented Aug 8, 2024

@beckermr 5 pages: https://github.com/search?q=org%3Aconda-forge+%22%7B%25+set+sha256%22&type=code&p=1 :)

But I want to reiterate that we are planning a very gradual migration. I think the cost of copy-pasting the SHA hash into the right spot is negligible.

@wolfv
Copy link
Member Author

wolfv commented Aug 8, 2024

We can also ask @schuylermartin45 if we can add this (as a flag) to the recipe converter :)

@schuylermartin45
Copy link

schuylermartin45 commented Aug 8, 2024

I think I've said this before, but I call them V0 and V1 recipes. The original recipe format doesn't have a schema_version field and the new format has schema_version set to 1. I'm sure a lot of people will get confused if V2 implies a value of 1.

</rant>

Aaaaaaaanyways I'm trying to catch up here and I'll add some notes to the conversations I've seen:

  • In the vast, vast majority of recipes I've seen in the Conda Recipe Manager test data set, sha256 is used over any other hashing algorithm
  • I know we have historically had some package builders who prefer to set a JINJA hash or sha256 variable at the top of their V0 files. I don't have an exact figure, but a quick search shows about 400 Anaconda Recipes fall into this camp.
  • I also know a handful replace the key sha256: with {{ hash_type }}: and I already correct for that in the preprocessing phase of recipe conversion.

I don't think I'm 100% clear on what this flag needs to do. Personally I'd rather just have the conversion process do what is "the most right" automatically for all recipes and try not to have special exceptions/cases.

I'm curious if this project would like to start using Conda Recipe Manager (CRM) for recipe parsing/editing. That seems like something that might be useful to integrate with a linter. CRM originated and got split out of our Anaconda Linter project so others could use it.

@wolfv
Copy link
Member Author

wolfv commented Aug 8, 2024

@schuylermartin45 the transformation that I would like to see is to just insert sha256 hashes unconditionally instead of keeping them as jinja vars.

For example:

{% set sha256 = "foo" %}

source:
  url: https://foo.com
  sha256: {{ sha256 }}

Can be - pretty much always - reduced to

source:
  url: https://foo.com
  sha256: foo

@wolfv
Copy link
Member Author

wolfv commented Aug 9, 2024

I rebased this PR on top of #2029 which should go in first.

@beckermr
Copy link
Member

beckermr commented Aug 9, 2024

I am -1 to merging this PR in general.

@wolfv
Copy link
Member Author

wolfv commented Aug 9, 2024

@beckermr this is what templated SHA hashes look like in an editor right now.

Screenshot 2024-08-09 at 16 34 13

So far, no one has presented a good use case.

We can lift this limitation at a later point in time as well.

I don't really understand the push-back 🤷

@beckermr
Copy link
Member

beckermr commented Aug 9, 2024

I disagree given the discussion above.

@wolfv
Copy link
Member Author

wolfv commented Aug 9, 2024

The outcome of the discussion is:

We should brainstorm ideas on how to prevent the duplication there.

While we don't have a solution we might as well add a lint.

I really, really don't see a good use case for templated hashes, and no one has presented one so far. But anyways, not a hill I want to die on either :)

@schuylermartin45
Copy link

schuylermartin45 commented Aug 9, 2024

I really, really don't see a good use case for templated hashes, and no one has presented one so far. But anyways, not a hill I want to die on either :)

I was going to say this issue felt like a "proving a negative" situation. Yes it is a little silly to set such a variable in most cases, BUT I just found a real-world example where there is a reason for it:

https://github.com/AnacondaRecipes/gettext-feedstock/blob/master/recipe/meta.yaml#L28

I'm not saying there aren't other ways to handle this scenario, but we have at least one recipe in the world that sets a sha256 and uses it more than one place.

If I had to guess, there might also be some project out there hosted on some platform that puts the SHA in the URL and that changes it per version artifact.

@wolfv
Copy link
Member Author

wolfv commented Aug 9, 2024

@schuylermartin45 thanks for finding a use case!

With a working version-updater bot, it wouldn't really matter to have the same hash twice (as string), since the bot is going to update it in both places nicely anyways. So that would be one "workaround".

If I had to guess, there might also be some project out there hosted on some platform that puts the SHA in the URL and that changes it per version artifact.

In that case, the version update bot is lost anyways since it would have to guess a random string. It might still potentially work if it comes from e.g. github releases or so but yeah.

@beckermr
Copy link
Member

beckermr commented Aug 9, 2024

I suggest folks submit a CEP to clarify the spec if they so desire.

@wolfv
Copy link
Member Author

wolfv commented Aug 9, 2024

What does this have to do with the spec? :D

@beckermr
Copy link
Member

beckermr commented Aug 9, 2024

The spec allows what is in some folks opinions a bad practice that conda-forge should lint on, effectively removing part of the spec.

It seems to me that we should change the spec if we feel strongly enough to propose a lint removing part of the spec from all of conda-forge.

@wolfv
Copy link
Member Author

wolfv commented Aug 9, 2024

Hmm... that doesn't make sense to me. Like I can write really ugly Python code, but a linter is supposed to lint it. It's still valid Python.

This is a linter, not a spec enforcer...

@beckermr
Copy link
Member

beckermr commented Aug 9, 2024

OK @wolfv. I was thinking in the spirit of how we remove jinja2 if and for statements from the spec, in part to remove things that were arguably not the best practices, that this would be the same.

@wolfv
Copy link
Member Author

wolfv commented Sep 18, 2024

@isuruf @beckermr here is another way to have different hashes per platform:

source:
  url: https://foo.com/sha-0.1.0-${{ target_platform }}.tar.gz
  sha256: |
    ${{ 
      "5c59b624269ca790932303bda7a068ee49aa6d29c4c2dd4301bfba9230adfd81" if linux else
      "5c59b624269ca790932303bda7a068ee49aa6d29c4c2dd4301bfba9230adfd81" if osx else
      "5c59b624269ca790932303bda7a068ee49aa6d29c4c2dd4301bfba9230adfd81" if win
    }}
  patches: 
   - patch1.patch

I think that's a reasonably OK way to write this.
However, currently it would still break the version bumper that I coded. Of course, it can be fixed. It's just a little annoying and I would like to defer that once we validated the rest of the pipeline to work for the 95% case of packages (e.g. simple noarch Python packages and all the rest).

@beckermr
Copy link
Member

Yeah the whole point of the new format is to not hide information in areas that are hard to parse. Putting things in if...else blocks in selectors is a bad idea.

@wolfv
Copy link
Member Author

wolfv commented Sep 18, 2024

@beckermr does that mean that you like this last suggestion or not? Feel free to tell me what you want these things to look like.

@beckermr
Copy link
Member

Right, I do not like the if...else in the templates eval expression. My suggestion is that we allow people to use templates in the hash section and work on the verbosity through some other mechanism.

@wolfv
Copy link
Member Author

wolfv commented Sep 18, 2024

@beckermr are you referring to my last example where I use if/else inside the Jinja template?

Do you like option A:

  sha256: |
    ${{ 
      "5c59b624269ca790932303bda7a068ee49aa6d29c4c2dd4301bfba9230adfd81" if linux else
      "5c59b624269ca790932303bda7a068ee49aa6d29c4c2dd4301bfba9230adfd81" if osx else
      "5c59b624269ca790932303bda7a068ee49aa6d29c4c2dd4301bfba9230adfd81" if win
    }}

Or option B:

  - if: target_platform == "linux-64:
    then:
      url: https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvcc/linux-x86_64/cuda_nvcc-linux-x86_64-${{ version }}-archive.tar.xz
      sha256: 4336a230269db14336ca1f727a0502631390a52d37c17d3bde05ccda5d534406
      patches:
        - nvcc.profile.patch
  - if: target_platform == "linux-aarch64"
    then:
      url: https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvcc/linux-sbsa/cuda_nvcc-linux-sbsa-${{ version }}-archive.tar.xz
      sha256: 06112bf1cc0cfb6b881a2e0420cb4611b6d701decbb195fbfb4f0e9966a42006
      patches:
        - nvcc.profile.patch

@beckermr
Copy link
Member

I do not like option A. I am ok with option B, but I agree with @isuruf that being able to reduce the verbosity would be helpful/nice.

@h-vetinari
Copy link
Member

h-vetinari commented Sep 18, 2024

I do not like option A.

I'd prefer option A actually. Aside from the last line, the if <xyz> else almost reads like selectors:

  sha256: |
    ${{ 
      "5c59b624269ca790932303bda7a068ee49aa6d29c4c2dd4301bfba9230adfd81" if linux else
      "5c59b624269ca790932303bda7a068ee49aa6d29c4c2dd4301bfba9230adfd81" if osx else
      "5c59b624269ca790932303bda7a068ee49aa6d29c4c2dd4301bfba9230adfd81" if win
    }}

Of course that'd mean we need to linting across multiple lines, but the ${{ ... }} brackets make this feasible IMO.

@isuruf
Copy link
Member

isuruf commented Sep 18, 2024

Option C:

source:
  url: https://foo.com/sha-0.1.0-${{ target_platform }}.tar.gz
  sha256:
    if: linux
    then: 123
    else:
      if: osx
      then: 456
      else: 789
  patches: 
   - patch1.patch

@beckermr
Copy link
Member

Option A makes automated updating of recipe versions with hashes really really hard. Selectors with single jinja2 set statements is actually easier. Option C is similarly difficult to update automatically.

The whole point of the new format was to make parsing, updating, and working with recipes easier. The fact that something is or is not valid yaml is necessary, but not sufficient for that to be true.

@beckermr
Copy link
Member

If you want to go there, a switch statement is probably the thing you are looking for:

sha256:
  switch:
    - "target == blah": foo
    - "target == blarg": blah

That is parseable yaml and enables fairly easy updating.

@wolfv
Copy link
Member Author

wolfv commented Sep 18, 2024

Yeah, it's difficult but not impossible. We just have to render the recipe once, get the current SHA, update the version, render again, download & grab the new SHA, then find + replace the old SHA.

Option B was what I initially proposed but got backlash on :)

I'll implement the new way of updating the hash tomorrow, then all problems should be solved.

@beckermr
Copy link
Member

@wolfv You are describing what the bot already does. The difficulty here is know which sha256 to update as a function of the other variant variables like the target platform. It is this edge case that takes most of the work in the bot version update code.

@beckermr
Copy link
Member

Ack nvm @wolfv. I see what you are saying. That approach is slower but should work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants