add source linting for recipe v2 #2028

wolfv · 2024-08-08T15:23:53Z

cc @beckermr - I added a lint to make sure people don't use sha256 as template.

Also I think we don't support sha1 at this point. Need to double check :)

isuruf · 2024-08-08T15:46:04Z

conda_smithy/lint_recipe.py

+    if recipe_version == 1:
+        lint_sources_should_have_hash(sources_section, lints)
+    elif recipe_version == 2:
+        conda_recipe_v2_linter.lint_sources(sources_section, lints)


There's no need to have two functions. Please use one.

Do you want to lint v1 recipes to make sure that the SHA hash is not templated as well @isuruf?
I think that would be a good change, but some churn involved.

You can pass recipe_version to lint_sources_should_have_hash and make the distinction there. Btw, how do you write a recipe like https://github.com/conda-forge/cuda-nvcc-impl-feedstock/blob/main/recipe/meta.yaml#L24-L28?

You can write it like this (valid yaml!):

context: version: "12.6.20" source: - if: target_platform == "linux-64: then: url: https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvcc/linux-x86_64/cuda_nvcc-linux-x86_64-{{ version }}-archive.tar.xz sha256: 4336a230269db14336ca1f727a0502631390a52d37c17d3bde05ccda5d534406 patches: - nvcc.profile.patch - if: target_platform == "linux-aarch64" then: url: https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvcc/linux-sbsa/cuda_nvcc-linux-sbsa-{{ version }}-archive.tar.xz sha256: 06112bf1cc0cfb6b881a2e0420cb4611b6d701decbb195fbfb4f0e9966a42006 patches: - nvcc.profile.patch - if: target_platform == "win-64" then: url: https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvcc/x64/cuda_nvcc-x64-{{ version }}-archive.zip sha256: 03282a36fb98c2f227877a62efd04c36b1ab01fb1b676d9edbd0b10179e0a40a patches: - nvcc.profile.patch.win

biggest downside is the patch duplication for the two sources although I think that's ... OK :)

Since you agree that is an abomination, we should figure out a non-abomination. That's the whole point of v2 IMO. Make recipes not be abominations.

So for context, the lint suggestion arose here

regro/cf-scripts#2920 (comment)

where I commented that you'd need to update the hash in the context section too. The suggestion was that by eliminating the use of variables to specify the hash, we could simplify the code in the bot. That entire block of code is 24 lines in the bot (https://github.com/regro/cf-scripts/blob/master/conda_forge_tick/update_recipe/version.py#L173-L197).

Then we had another discussion here

https://github.com/regro/cf-scripts/pull/2920/files/b7f61366a355ab41e0728e7cc325ed36596c3d67#discussion_r1709740255

where I pointed out that we have to evaluate the context section of the recipe under the assumptions of not just linux, but also other platforms in order to update things correctly. You suggested that this lint would help and we both agreed there would be other edge cases this would miss.

So we can have a separate discussion on this change on its own, but the context for it came from simplifying bot code by eliminating flexibility in the v2 recipe spec.

This is the motivation for my comments. This PR is an incomplete solution to solve complexity in the v2 recipe spec. We either need to reconsider the spec to eliminate complexity or we need to deal with it. Excluding the a small part of the complexity only in conda-forge is an incomplete solution. Given that many conda-forge devs helped make the spec, it's rather odd that we wouldn't use all of it.

As for the bot and the implementation, I don't see the point in avoiding the work to support the new spec in the same way. Otherwise, we do something that covers only part of the cases, people find bugs + submit bug reports, and then someone (you, me, someone else?) in the future is left either saying this won't ever happen or fixing it.

I suggested a better path forward for this work in multiple places related to eliminating technical debt around recipe parsing+editing (which is one of the original motivations for the new spec anyways) and then using that to move things ahead.

Hmm, I think my motivation was never to make it less verbose to write complex recipes :)

My motivation was to make recipes structured, easy to parse and fast to evaluate (ie. not do the recursive craziness of conda-build).

I do think we could find solutions to this problem, but for me it's low priority as we can continue using meta.yaml, so if a user doesn't like the new format / additional verbosity there is no need to switch. From my POV it's also very few recipes that are structured this way (mostly binary repacks). And I also don't mind the more extended form, personally.

But if you have good ideas for improvements to the spec, I would love to hear them!

Thanks for adding the context, Matt. I am happy to fix bugs/issues as they come up related to the new format, of course!

We're talking past one another a bit here, but this statement is a red flag:

I do think we could find solutions to this problem, but for me it's low priority as we can continue using meta.yaml, so if a user doesn't like the new format / additional verbosity there is no need to switch. From my POV it's also very few recipes that are structured this way (mostly binary repacks). And I also don't mind the more extended form, personally.

The goal here, I thought, is to move to a new spec and eliminate a previous one, all for tangible benefits (performance, interpretability, maintainability, etc.). If we are not in this process to actually deprecate and then remove the v1 spec, why on earth would we do all of this work just to support both in the end?

Thanks for adding the context, Matt. I am happy to fix bugs/issues as they come up related to the new format, of course!

Great and thank you! I have raised several issues around edge cases in the PRs we have going for v2 and my opinion is that now is a good time to address them.

If the outcome is that v2 replaces v1 I will be personally very happy. But I also don't want to force people on v2, especially not during the initial period. The whole transition will probably take at least a year, and the idea is that it's opt-in (if you have a recipe.yaml then new format is used, otherwise fall back to meta.yaml). I hope we're on the same page there :)

beckermr

I'm not against this in principle but I'd like to understand how many existing recipes we might impact should they be ported to the v2 format. Also this does not solve the issue we are discussing for the bot, so should be considered separate.

wolfv · 2024-08-08T16:33:31Z

@beckermr 5 pages: https://github.com/search?q=org%3Aconda-forge+%22%7B%25+set+sha256%22&type=code&p=1 :)

But I want to reiterate that we are planning a very gradual migration. I think the cost of copy-pasting the SHA hash into the right spot is negligible.

wolfv · 2024-08-08T16:34:08Z

We can also ask @schuylermartin45 if we can add this (as a flag) to the recipe converter :)

schuylermartin45 · 2024-08-08T20:51:46Z

I think I've said this before, but I call them V0 and V1 recipes. The original recipe format doesn't have a schema_version field and the new format has schema_version set to 1. I'm sure a lot of people will get confused if V2 implies a value of 1.

</rant>

Aaaaaaaanyways I'm trying to catch up here and I'll add some notes to the conversations I've seen:

In the vast, vast majority of recipes I've seen in the Conda Recipe Manager test data set, sha256 is used over any other hashing algorithm
I know we have historically had some package builders who prefer to set a JINJA hash or sha256 variable at the top of their V0 files. I don't have an exact figure, but a quick search shows about 400 Anaconda Recipes fall into this camp.
I also know a handful replace the key sha256: with {{ hash_type }}: and I already correct for that in the preprocessing phase of recipe conversion.

I don't think I'm 100% clear on what this flag needs to do. Personally I'd rather just have the conversion process do what is "the most right" automatically for all recipes and try not to have special exceptions/cases.

I'm curious if this project would like to start using Conda Recipe Manager (CRM) for recipe parsing/editing. That seems like something that might be useful to integrate with a linter. CRM originated and got split out of our Anaconda Linter project so others could use it.

wolfv · 2024-08-08T21:48:46Z

@schuylermartin45 the transformation that I would like to see is to just insert sha256 hashes unconditionally instead of keeping them as jinja vars.

For example:

{% set sha256 = "foo" %}

source:
  url: https://foo.com
  sha256: {{ sha256 }}

Can be - pretty much always - reduced to

source:
  url: https://foo.com
  sha256: foo

wolfv · 2024-08-09T14:19:55Z

I rebased this PR on top of #2029 which should go in first.

beckermr · 2024-08-09T14:30:41Z

I am -1 to merging this PR in general.

wolfv · 2024-08-09T14:35:14Z

@beckermr this is what templated SHA hashes look like in an editor right now.

So far, no one has presented a good use case.

We can lift this limitation at a later point in time as well.

I don't really understand the push-back 🤷

beckermr · 2024-08-09T14:40:04Z

I disagree given the discussion above.

wolfv · 2024-08-09T14:46:35Z

The outcome of the discussion is:

We should brainstorm ideas on how to prevent the duplication there.

While we don't have a solution we might as well add a lint.

I really, really don't see a good use case for templated hashes, and no one has presented one so far. But anyways, not a hill I want to die on either :)

schuylermartin45 · 2024-08-09T15:06:15Z

I really, really don't see a good use case for templated hashes, and no one has presented one so far. But anyways, not a hill I want to die on either :)

I was going to say this issue felt like a "proving a negative" situation. Yes it is a little silly to set such a variable in most cases, BUT I just found a real-world example where there is a reason for it:

https://github.com/AnacondaRecipes/gettext-feedstock/blob/master/recipe/meta.yaml#L28

I'm not saying there aren't other ways to handle this scenario, but we have at least one recipe in the world that sets a sha256 and uses it more than one place.

If I had to guess, there might also be some project out there hosted on some platform that puts the SHA in the URL and that changes it per version artifact.

wolfv · 2024-08-09T15:18:56Z

@schuylermartin45 thanks for finding a use case!

With a working version-updater bot, it wouldn't really matter to have the same hash twice (as string), since the bot is going to update it in both places nicely anyways. So that would be one "workaround".

If I had to guess, there might also be some project out there hosted on some platform that puts the SHA in the URL and that changes it per version artifact.

In that case, the version update bot is lost anyways since it would have to guess a random string. It might still potentially work if it comes from e.g. github releases or so but yeah.

beckermr · 2024-08-09T15:33:38Z

I suggest folks submit a CEP to clarify the spec if they so desire.

wolfv · 2024-08-09T15:37:17Z

What does this have to do with the spec? :D

beckermr · 2024-08-09T15:40:08Z

The spec allows what is in some folks opinions a bad practice that conda-forge should lint on, effectively removing part of the spec.

It seems to me that we should change the spec if we feel strongly enough to propose a lint removing part of the spec from all of conda-forge.

wolfv · 2024-08-09T15:41:54Z

Hmm... that doesn't make sense to me. Like I can write really ugly Python code, but a linter is supposed to lint it. It's still valid Python.

This is a linter, not a spec enforcer...

beckermr · 2024-08-09T15:51:25Z

OK @wolfv. I was thinking in the spirit of how we remove jinja2 if and for statements from the spec, in part to remove things that were arguably not the best practices, that this would be the same.

wolfv · 2024-09-18T17:51:35Z

@isuruf @beckermr here is another way to have different hashes per platform:

source:
  url: https://foo.com/sha-0.1.0-${{ target_platform }}.tar.gz
  sha256: |
    ${{ 
      "5c59b624269ca790932303bda7a068ee49aa6d29c4c2dd4301bfba9230adfd81" if linux else
      "5c59b624269ca790932303bda7a068ee49aa6d29c4c2dd4301bfba9230adfd81" if osx else
      "5c59b624269ca790932303bda7a068ee49aa6d29c4c2dd4301bfba9230adfd81" if win
    }}
  patches: 
   - patch1.patch

I think that's a reasonably OK way to write this.
However, currently it would still break the version bumper that I coded. Of course, it can be fixed. It's just a little annoying and I would like to defer that once we validated the rest of the pipeline to work for the 95% case of packages (e.g. simple noarch Python packages and all the rest).

beckermr · 2024-09-18T20:00:02Z

Yeah the whole point of the new format is to not hide information in areas that are hard to parse. Putting things in if...else blocks in selectors is a bad idea.

wolfv · 2024-09-18T20:04:38Z

@beckermr does that mean that you like this last suggestion or not? Feel free to tell me what you want these things to look like.

beckermr · 2024-09-18T20:15:17Z

Right, I do not like the if...else in the templates eval expression. My suggestion is that we allow people to use templates in the hash section and work on the verbosity through some other mechanism.

wolfv · 2024-09-18T20:18:46Z

@beckermr are you referring to my last example where I use if/else inside the Jinja template?

Do you like option A:

  sha256: |
    ${{ 
      "5c59b624269ca790932303bda7a068ee49aa6d29c4c2dd4301bfba9230adfd81" if linux else
      "5c59b624269ca790932303bda7a068ee49aa6d29c4c2dd4301bfba9230adfd81" if osx else
      "5c59b624269ca790932303bda7a068ee49aa6d29c4c2dd4301bfba9230adfd81" if win
    }}

Or option B:

  - if: target_platform == "linux-64:
    then:
      url: https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvcc/linux-x86_64/cuda_nvcc-linux-x86_64-${{ version }}-archive.tar.xz
      sha256: 4336a230269db14336ca1f727a0502631390a52d37c17d3bde05ccda5d534406
      patches:
        - nvcc.profile.patch
  - if: target_platform == "linux-aarch64"
    then:
      url: https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvcc/linux-sbsa/cuda_nvcc-linux-sbsa-${{ version }}-archive.tar.xz
      sha256: 06112bf1cc0cfb6b881a2e0420cb4611b6d701decbb195fbfb4f0e9966a42006
      patches:
        - nvcc.profile.patch

beckermr · 2024-09-18T20:20:56Z

I do not like option A. I am ok with option B, but I agree with @isuruf that being able to reduce the verbosity would be helpful/nice.

h-vetinari · 2024-09-18T20:25:04Z

I do not like option A.

I'd prefer option A actually. Aside from the last line, the if <xyz> else almost reads like selectors:

  sha256: |
    ${{ 
      "5c59b624269ca790932303bda7a068ee49aa6d29c4c2dd4301bfba9230adfd81" if linux else
      "5c59b624269ca790932303bda7a068ee49aa6d29c4c2dd4301bfba9230adfd81" if osx else
      "5c59b624269ca790932303bda7a068ee49aa6d29c4c2dd4301bfba9230adfd81" if win
    }}

Of course that'd mean we need to linting across multiple lines, but the ${{ ... }} brackets make this feasible IMO.

isuruf · 2024-09-18T20:25:29Z

Option C:

source:
  url: https://foo.com/sha-0.1.0-${{ target_platform }}.tar.gz
  sha256:
    if: linux
    then: 123
    else:
      if: osx
      then: 456
      else: 789
  patches: 
   - patch1.patch

beckermr · 2024-09-18T20:28:54Z

Option A makes automated updating of recipe versions with hashes really really hard. Selectors with single jinja2 set statements is actually easier. Option C is similarly difficult to update automatically.

The whole point of the new format was to make parsing, updating, and working with recipes easier. The fact that something is or is not valid yaml is necessary, but not sufficient for that to be true.

beckermr · 2024-09-18T20:38:24Z

If you want to go there, a switch statement is probably the thing you are looking for:

sha256:
  switch:
    - "target == blah": foo
    - "target == blarg": blah

That is parseable yaml and enables fairly easy updating.

wolfv · 2024-09-18T20:38:27Z

Yeah, it's difficult but not impossible. We just have to render the recipe once, get the current SHA, update the version, render again, download & grab the new SHA, then find + replace the old SHA.

Option B was what I initially proposed but got backlash on :)

I'll implement the new way of updating the hash tomorrow, then all problems should be solved.

beckermr · 2024-09-18T20:40:45Z

@wolfv You are describing what the bot already does. The difficulty here is know which sha256 to update as a function of the other variant variables like the target platform. It is this edge case that takes most of the work in the bot version update code.

beckermr · 2024-09-18T20:44:02Z

Ack nvm @wolfv. I see what you are saying. That approach is slower but should work.

add source linting for recipe v2

a88c2ed

wolfv requested a review from a team as a code owner August 8, 2024 15:23

wolfv mentioned this pull request Aug 8, 2024

add v2 recipe editing capabilities for bumping version & build number regro/cf-scripts#2920

Open

isuruf reviewed Aug 8, 2024

View reviewed changes

beckermr requested changes Aug 8, 2024

View reviewed changes

wolfv added 7 commits August 9, 2024 08:59

add end-to-end-tests

446e580

remove some print statements

8472902

add test for v2 name, version and context jinja

b7b0f10

remove debugging prints

601e1a1

remove lint & test for now

f1b2ef2

fix invocation

f7d0975

add lint for sha256 / md5 templating for the new recipe format

202e7e5

nichmor mentioned this pull request Aug 19, 2024

feat: allow uploading on main channel conda-forge/conda-forge-ci-setup-feedstock#337

Open

5 tasks

add source linting for recipe v2 #2028

Are you sure you want to change the base?

add source linting for recipe v2 #2028

Conversation

wolfv commented Aug 8, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

beckermr left a comment

Choose a reason for hiding this comment

wolfv commented Aug 8, 2024

wolfv commented Aug 8, 2024

schuylermartin45 commented Aug 8, 2024 • edited Loading

wolfv commented Aug 8, 2024

wolfv commented Aug 9, 2024

beckermr commented Aug 9, 2024

wolfv commented Aug 9, 2024

beckermr commented Aug 9, 2024

wolfv commented Aug 9, 2024

schuylermartin45 commented Aug 9, 2024 • edited Loading

wolfv commented Aug 9, 2024

beckermr commented Aug 9, 2024

wolfv commented Aug 9, 2024

beckermr commented Aug 9, 2024

wolfv commented Aug 9, 2024 • edited Loading

beckermr commented Aug 9, 2024

wolfv commented Sep 18, 2024

beckermr commented Sep 18, 2024

wolfv commented Sep 18, 2024

beckermr commented Sep 18, 2024

wolfv commented Sep 18, 2024

beckermr commented Sep 18, 2024

h-vetinari commented Sep 18, 2024 • edited Loading

isuruf commented Sep 18, 2024

beckermr commented Sep 18, 2024

beckermr commented Sep 18, 2024

wolfv commented Sep 18, 2024

beckermr commented Sep 18, 2024

beckermr commented Sep 18, 2024

schuylermartin45 commented Aug 8, 2024 •

edited

Loading

schuylermartin45 commented Aug 9, 2024 •

edited

Loading

wolfv commented Aug 9, 2024 •

edited

Loading

h-vetinari commented Sep 18, 2024 •

edited

Loading