Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port securesystemslib.hash module #2815

Merged
merged 5 commits into from
Mar 19, 2025

Conversation

lukpueh
Copy link
Member

@lukpueh lukpueh commented Mar 18, 2025

securesystemslib.hash is a small wrapper around hashlib, which serves two main purposes:

  • provide helper function to hash a file
  • translate custom hash algorithm name "blake2b-256" to "blake2b" with (digest_size=32).

In preparation for the removal of securesystemslib.hash, this patch ports above behavior to tuf and uses the builtin hashlib directly where possible.

related secure-systems-lab/securesystemslib#943

@lukpueh lukpueh requested a review from a team as a code owner March 18, 2025 13:59
@lukpueh lukpueh requested review from jku and removed request for jku March 18, 2025 14:00
@lukpueh lukpueh marked this pull request as draft March 18, 2025 14:09
@@ -45,6 +48,31 @@
T = TypeVar("T", "Root", "Timestamp", "Snapshot", "Targets")


def _hash(algo: str) -> Any: # noqa: ANN401
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python's hashlib does not make the Hash type public, hence I'm using Any and ask the linter for forgiveness.

@@ -80,6 +80,8 @@

SPEC_VER = ".".join(SPECIFICATION_VERSION)

_DEFAULT_HASH_ALGORITHM = "sha256"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's okay to make this a local default, and not re-use the one newly added to _payload.py, so that we can keep the latter internal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I'd be fine with just a magic value as well: it's not really even a default, it's just what this repository decided to use

securesystemslib.hash is a small wrapper around hashlib, which serves
two main purposes:
* provide helper function to hash a file
* translate custom hash algorithm name "blake2b-256" to "blake2b" with
  (digest_size=32).

In preparation for the removal of securesystemslib.hash, this patch ports
above behavior to tuf and uses the builtin hashlib directly where
possible.

related secure-systems-lab/securesystemslib#943

Signed-off-by: Lukas Puehringer <[email protected]>
"""Returns hashed file."""
f.seek(0)
if sys.version_info >= (3, 11):
digest = hashlib.file_digest(f, lambda: _hash(algo)) # type: ignore[arg-type]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mypy does not allow IO[bytes] for f here. This may be related to python/typing#659. Ignore seems fair.

@lukpueh lukpueh marked this pull request as ready for review March 18, 2025 15:42
Copy link
Member

@jku jku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine to me. Left some comments

Comment on lines 763 to 767
digest_object = _hash(algorithm)
digest_object.update(data)
else:
digest_object = sslib_hash.digest_fileobject(
data, algorithm
)
except (
sslib_exceptions.UnsupportedAlgorithmError,
sslib_exceptions.FormatError,
) as e:
digest_object = _file_hash(data, algorithm)
except (ValueError, TypeError) as e:
Copy link
Member

@jku jku Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the different functionality is a little annoying here (you need to call update on one digest_object but not the other)...

  • If the _hash and _file_hash functions directly returned the hexdigest this code would be simpler and the functions would be properly annotated
  • on the other hand then _hash() would need a data argument (meaning you would have to split the lambda needed by hashlib.file_digest() into another function)

I don't really know what is the simplest... up to you

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does 535a189 look better? I don't have very strong feelings about either way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No passionate argument here either. I think I like this new version better.

@@ -80,6 +80,8 @@

SPEC_VER = ".".join(SPECIFICATION_VERSION)

_DEFAULT_HASH_ALGORITHM = "sha256"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I'd be fine with just a magic value as well: it's not really even a default, it's just what this repository decided to use

@coveralls
Copy link

coveralls commented Mar 18, 2025

Coverage Status

coverage: 97.064% (-0.01%) from 97.076%
when pulling 75e83b3 on lukpueh:port-sslib-hash
into 9f873cb on theupdateframework:develop.

@jku
Copy link
Member

jku commented Mar 19, 2025

coverage: 96.981% (-0.1%) from 97.076%

coveralls.io is down again so I'm not sure... but I'm guessing this is the "blake2b-256" special case -- otherwise we're probably exercising this code.

lukpueh added 2 commits March 19, 2025 09:28
Remove the "default" prefix, because it's not a default but rather a
fixed value.

Signed-off-by: Lukas Puehringer <[email protected]>
Consolidate interface of bytes hash and file hash helpers.

Signed-off-by: Lukas Puehringer <[email protected]>
@lukpueh
Copy link
Member Author

lukpueh commented Mar 19, 2025

coverage: 96.981% (-0.1%) from 97.076%

coveralls.io is down again so I'm not sure... but I'm guessing this is the "blake2b-256" special case -- otherwise we're probably exercising this code.

Yep, blake is the culprit. Let me add a test.

Signed-off-by: Lukas Puehringer <[email protected]>
@lukpueh
Copy link
Member Author

lukpueh commented Mar 19, 2025

@jku, want to re-approve after my latest changes?

Copy link
Member

@jku jku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. The only nit I have left is that multiple docstrings still refer to "securesystemslib default hash algorithm".

Default hash sha256 is now defined locally.

Signed-off-by: Lukas Puehringer <[email protected]>
@lukpueh
Copy link
Member Author

lukpueh commented Mar 19, 2025

Looks good. The only nit I have left is that multiple docstrings still refer to "securesystemslib default hash algorithm".

Good eyes! I pushed a fix and will merge when ci returns.

@lukpueh lukpueh merged commit 500e8b9 into theupdateframework:develop Mar 19, 2025
17 checks passed
lukpueh added a commit to lukpueh/securesystemslib that referenced this pull request Mar 19, 2025
fixes secure-systems-lab#943

* Internal use does not need the additional features (custom blake
  algorithm name support and file hashing), and was replaced by direct
  calls to hashlib.

* External users were updated to no longer require
  `securesystemslib.hash` (theupdateframework/python-tuf#2815,
  in-toto/in-toto#861)

Signed-off-by: Lukas Puehringer <[email protected]>
lukpueh added a commit to lukpueh/securesystemslib that referenced this pull request Mar 19, 2025
fixes secure-systems-lab#943

* Internally, the features added by the module (i.e. custom blake name
  support, and file hashing) are not needed. Builtin hashlib is used now
  directly.

* External users port the relevant features into their code base:
  theupdateframework/python-tuf#2815
  in-toto/in-toto#861

Signed-off-by: Lukas Puehringer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants