-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix pip cache remove
pattern matching.
#13094
base: main
Are you sure you want to change the base?
Changes from all commits
c65d645
f2c5bcf
e6c2214
a2207f2
eeb47c8
90d10f0
3a3080c
c1766ac
d58a510
d2681bc
38afd1c
96a5c97
04ee6f1
6929c9c
0b7e86b
531e606
95e13e4
8a40962
f4f17f6
c6129a0
dae4c29
9ab8d8c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Fix ``pip cache remove`` pattern matching. | ||
Original file line number | Diff line number | Diff line change | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -1,4 +1,6 @@ | ||||||||||||||||
import fnmatch | ||||||||||||||||
import os | ||||||||||||||||
import re | ||||||||||||||||
import textwrap | ||||||||||||||||
from optparse import Values | ||||||||||||||||
from typing import Any, List | ||||||||||||||||
|
@@ -207,22 +209,55 @@ def _find_http_files(self, options: Values) -> List[str]: | |||||||||||||||
def _find_wheels(self, options: Values, pattern: str) -> List[str]: | ||||||||||||||||
wheel_dir = self._cache_dir(options, "wheels") | ||||||||||||||||
|
||||||||||||||||
# The wheel filename format, as specified in PEP 427, is: | ||||||||||||||||
# The wheel filename format, as originally specified in PEP 427, is: | ||||||||||||||||
# {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl | ||||||||||||||||
# | ||||||||||||||||
# Additionally, non-alphanumeric values in the distribution are | ||||||||||||||||
# normalized to underscores (_), meaning hyphens can never occur | ||||||||||||||||
# before `-{version}`. | ||||||||||||||||
Comment on lines
215
to
217
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unfortunately, due to bugs in build backends (like pypa/setuptools#3777), this isn't guaranteed. Dashes are always normalized (as the wheel filename would be invalid otherwise) but the name may contain periods. For example, Normalization of the wheel filenames is going to be necessary to make this more robust. That can be saved for later, of course. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think you realize this, but this is pre-existing commentary. As Pex maintainer I've become intimately familiar with packaging history and reality. |
||||||||||||||||
# | ||||||||||||||||
# Given that information: | ||||||||||||||||
# - If the pattern we're given contains a hyphen (-), the user is | ||||||||||||||||
# providing at least the version. Thus, we can just append `*.whl` | ||||||||||||||||
# to match the rest of it. | ||||||||||||||||
# - If the pattern we're given doesn't contain a hyphen (-), the | ||||||||||||||||
# user is only providing the name. Thus, we append `-*.whl` to | ||||||||||||||||
# match the hyphen before the version, followed by anything else. | ||||||||||||||||
# - If the pattern we're given is not a glob: | ||||||||||||||||
# + We attempt project name normalization such that pyyaml matches PyYAML, | ||||||||||||||||
# etc. as outlined here: | ||||||||||||||||
# https://packaging.python.org/specifications/name-normalization. | ||||||||||||||||
# The one difference is we normalize non-alphanumeric to `_` instead of `-` | ||||||||||||||||
# since that is how the wheel filename components are normalized | ||||||||||||||||
# (facepalm). | ||||||||||||||||
Comment on lines
+221
to
+226
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IIUC this isn't quite accurate? The match is always case insensitive as we're using |
||||||||||||||||
# + If the pattern we're given contains a hyphen (-), the user is providing | ||||||||||||||||
# at least the version. Thus, we can just append `*.whl` to match the rest | ||||||||||||||||
# of it if it doesn't already glob to the end of the wheel file name. | ||||||||||||||||
Comment on lines
+227
to
+229
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
I realize that this comment is pre-existing, but it's also too dense to be readable.
Comment on lines
+227
to
+229
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unfortunately, this is not always true. Footnotes
|
||||||||||||||||
# + If the pattern we're given is not a glob and doesn't contain a | ||||||||||||||||
# hyphen (-), the user is only providing the name. Thus, we append `-*.whl` | ||||||||||||||||
# to match the hyphen before the version, followed by anything else. | ||||||||||||||||
Comment on lines
+230
to
+232
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
This has already been implied by the preceding comments. |
||||||||||||||||
# - If the pattern is a glob, we ensure `*.whl` is appended if needed but | ||||||||||||||||
# cowardly do not attempt glob parsing / project name normalization. The user | ||||||||||||||||
# is on their own at that point. | ||||||||||||||||
# | ||||||||||||||||
# PEP 427: https://www.python.org/dev/peps/pep-0427/ | ||||||||||||||||
Comment on lines
236
to
237
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
If we're linking to the canonical specification, the reference to the original PEP isn't needed anymore. |
||||||||||||||||
pattern = pattern + ("*.whl" if "-" in pattern else "-*.whl") | ||||||||||||||||
|
||||||||||||||||
return filesystem.find_files(wheel_dir, pattern) | ||||||||||||||||
# And: https://packaging.python.org/specifications/binary-distribution-format/ | ||||||||||||||||
Comment on lines
237
to
+238
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For Pip / PyPA knowledgeable reviewers - what the heck is up with this by the way? I've always found it off-putting since the PEP banners have started appearing on some of the PyPA PEPs re-directing to these new standards pages. The fact that some PEPs get a new page and some don't and what process the ones with the new page go through to get updated is all mysterious feeling. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We’ve been pushing people toward the standard pages on packaging.python.org since those are the up-to-date standard documents, while PEP pages are no longer updated once they are done, which can be a problem when you want to track done how to do a thing that’s been revised a couple of times. I don’t have knowledge how the auto redirection is actually implemented though. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The procedure for modern packaging PEPs is that once accepted, the specification is transferred to the PUG (Packaging User Guide, which is the name for
I'd imagine that this doesn't happen for all packaging PEPs though... (due to forgetting or not realizing this is a requirement). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, yeah, I was hoping for a better reason. I personally don't trust the non-PEP pages and never read them. I read the PEPs and their backlinks from follow-on PEPs to know the actually agreed on thing and the history. I need to be sure of both as a packing ecosystem maintainer that maintains a tool that never breaks backward compatibility (knowingly), including never dropping old Python version support. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's entirely possible according to the process for the canonical specification (on packaging.python.org) to be updated without needing a PEP. So using only the PEPs actually risks not following the current standards. In practice, it's quite rare that this happens (it requires an "uncontroversial" change, and we all know how likely that is 🙄) The canonical spec should include a changelog (see here for an example, which also includes a change that wasn't in a PEP) but as with everything else, it relies on people maintaining it correctly. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As a packaging ecosystem maintainer I find that horrifying. I'd much rather have a chain of PEPs than that chain + maybe a summary that may also have new content on top and may or may not have a changelog fully filled out. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You're interpreting it wrongly. The spec documents are definitive, and PEPs are proposals to change that set of documents. Just like the relationship between the Python source and PEPs that modify it. The problem with treating PEPs as definitive is that they are not allowed to change, so you end up having to read a series of documents, applying them as a series of diffs, to understand the actual spec. Criticising the process because people make mistakes seems unreasonable to me (and honestly, I don't think mistakes are frequent - most discrepancies between the defined process and reality are cases of PEPs that were written before the current process was defined, and no-one has volunteered yet to write the necessary historical documents) |
||||||||||||||||
uses_glob_patterns = any( | ||||||||||||||||
glob_pattern_char in pattern for glob_pattern_char in ("[", "*", "?") | ||||||||||||||||
) | ||||||||||||||||
Comment on lines
+239
to
+241
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||
if not uses_glob_patterns: | ||||||||||||||||
project_name, sep, rest = pattern.partition("-") | ||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. N.B. I documented punting on deciphering glob patterns with respect to name normalization. Someone braver than I might try that later. I assume forward progress here is better than no progress though. I did not, however, call the shot on version normalization. Again, I was cowardly, but that problem still remains. If a wheel in the cache is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Indeed!
Hmm, how does this limitation come into play? The pattern is matched with case sensitivity disabled so that's not a problem.
The proper way to handle this would be performing a fully normalized version match. Extract the version from the pattern, convert it to |
||||||||||||||||
# N.B.: This only normalizes non-alphanumeric characters in the name, which | ||||||||||||||||
# is 1/2 the job. Stripping case sensitivity is the other 1/2 and is | ||||||||||||||||
# handled below in the conversion from a glob to a regex executed with the | ||||||||||||||||
# IGNORECASE flag active. | ||||||||||||||||
Comment on lines
+244
to
+247
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
As someone who likes to write wordy comments, I totally get where you're coming from, but being concise is a good thing :) |
||||||||||||||||
partially_normalized_project_name = re.sub(r"[^\w.]+", "_", project_name) | ||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
It's probably safer to simply reuse the pattern from the specification. |
||||||||||||||||
if not sep: | ||||||||||||||||
sep = "-" | ||||||||||||||||
pattern = f"{partially_normalized_project_name}{sep}{rest}" | ||||||||||||||||
|
||||||||||||||||
if not pattern.endswith((".whl", "*")): | ||||||||||||||||
pattern = f"{pattern}*.whl" | ||||||||||||||||
|
||||||||||||||||
regex = fnmatch.translate(pattern) | ||||||||||||||||
|
||||||||||||||||
return [ | ||||||||||||||||
os.path.join(root, filename) | ||||||||||||||||
for root, _, files in os.walk(wheel_dir) | ||||||||||||||||
for filename in files | ||||||||||||||||
if re.match(regex, filename, flags=re.IGNORECASE) | ||||||||||||||||
] | ||||||||||||||||
Comment on lines
+256
to
+263
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd prefer that |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -68,23 +68,26 @@ def populate_http_cache(http_cache_dir: str) -> List[Tuple[str, str]]: | |
return files | ||
|
||
|
||
@pytest.fixture | ||
def populate_wheel_cache(wheel_cache_dir: str) -> List[Tuple[str, str]]: | ||
def _populate_wheel_cache( | ||
wheel_cache_dir: str, *name_and_versions: str | ||
) -> List[Tuple[str, str]]: | ||
jsirois marked this conversation as resolved.
Show resolved
Hide resolved
|
||
destination = os.path.join(wheel_cache_dir, "arbitrary", "pathname") | ||
os.makedirs(destination) | ||
|
||
files = [ | ||
("yyy-1.2.3", os.path.join(destination, "yyy-1.2.3-py3-none-any.whl")), | ||
("zzz-4.5.6", os.path.join(destination, "zzz-4.5.6-py3-none-any.whl")), | ||
("zzz-4.5.7", os.path.join(destination, "zzz-4.5.7-py3-none-any.whl")), | ||
("zzz-7.8.9", os.path.join(destination, "zzz-7.8.9-py3-none-any.whl")), | ||
] | ||
|
||
for _name, filename in files: | ||
wheel_info = [] | ||
for name_and_version in name_and_versions: | ||
filename = os.path.join(destination, f"{name_and_version}-py3-none-any.whl") | ||
with open(filename, "w"): | ||
pass | ||
wheel_info.append((name_and_version, filename)) | ||
return wheel_info | ||
|
||
return files | ||
|
||
@pytest.fixture | ||
def populate_wheel_cache(wheel_cache_dir: str) -> List[Tuple[str, str]]: | ||
return _populate_wheel_cache( | ||
wheel_cache_dir, "yyy-1.2.3", "zzz-4.5.6", "zzz-4.5.7", "zzz-7.8.9" | ||
) | ||
|
||
|
||
@pytest.fixture | ||
|
@@ -332,34 +335,97 @@ def test_cache_remove_too_many_args(script: PipTestEnvironment) -> None: | |
script.pip("cache", "remove", "aaa", "bbb", expect_error=True) | ||
|
||
|
||
@pytest.mark.parametrize( | ||
"pattern", | ||
[ | ||
"zzz", | ||
"ZZZ", | ||
"zZz-*-py3-none-any.whl", | ||
"zZz-*.whl", | ||
"zzz-*", | ||
"zzz*", | ||
"zz[a-zA-Z]?", | ||
"[!y]", | ||
], | ||
) | ||
@pytest.mark.usefixtures("populate_wheel_cache") | ||
def test_cache_remove_name_match( | ||
script: PipTestEnvironment, remove_matches_wheel: RemoveMatches | ||
script: PipTestEnvironment, pattern: str, remove_matches_wheel: RemoveMatches | ||
) -> None: | ||
"""Running `pip cache remove zzz` should remove zzz-4.5.6 and zzz-7.8.9, | ||
but nothing else.""" | ||
result = script.pip("cache", "remove", "zzz", "--verbose") | ||
"""Running `pip cache remove <zzz matching pattern>` should remove zzz-4.5.6, | ||
zzz-4.5.7 and zzz-7.8.9, but nothing else.""" | ||
result = script.pip("cache", "remove", pattern, "--verbose") | ||
|
||
assert not remove_matches_wheel("yyy-1.2.3", result) | ||
assert remove_matches_wheel("zzz-4.5.6", result) | ||
assert remove_matches_wheel("zzz-4.5.7", result) | ||
assert remove_matches_wheel("zzz-7.8.9", result) | ||
|
||
|
||
@pytest.mark.parametrize( | ||
"pattern", | ||
[ | ||
"zzz-4.5.6", | ||
"ZZZ-4.5.6", | ||
"zZz-4.5.6-py3-none-any.whl", | ||
"zZz-4.5.6-*.whl", | ||
"zZz-4.5.[0-6]", | ||
], | ||
) | ||
@pytest.mark.usefixtures("populate_wheel_cache") | ||
def test_cache_remove_name_and_version_match( | ||
script: PipTestEnvironment, remove_matches_wheel: RemoveMatches | ||
script: PipTestEnvironment, pattern: str, remove_matches_wheel: RemoveMatches | ||
) -> None: | ||
"""Running `pip cache remove zzz-4.5.6` should remove zzz-4.5.6, but | ||
nothing else.""" | ||
result = script.pip("cache", "remove", "zzz-4.5.6", "--verbose") | ||
"""Running `pip cache remove <zzz-4.5.6 matching pattern>` should remove zzz-4.5.6, | ||
but nothing else.""" | ||
result = script.pip("cache", "remove", pattern, "--verbose") | ||
|
||
assert not remove_matches_wheel("yyy-1.2.3", result) | ||
assert remove_matches_wheel("zzz-4.5.6", result) | ||
assert not remove_matches_wheel("zzz-4.5.7", result) | ||
assert not remove_matches_wheel("zzz-7.8.9", result) | ||
|
||
|
||
@pytest.mark.parametrize( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you also include some test cases with periods and runs of underscores? The latter is not something I'd expect an user to pass to pip, but it'd be good to validate this is following the specifications. |
||
"pattern", | ||
[ | ||
"pyfftw", | ||
"pyfftw-0.15", | ||
"pyfftw-0.15.0", | ||
"pyFFTW", | ||
"pyFFTW-0.15", | ||
"pyFFTW-0.15.0", | ||
], | ||
) | ||
def test_issue_13086_op_case( | ||
script: PipTestEnvironment, | ||
wheel_cache_dir: str, | ||
pattern: str, | ||
remove_matches_wheel: RemoveMatches, | ||
) -> None: | ||
_populate_wheel_cache(wheel_cache_dir, "pyFFTW-0.15.0", "foo-1.0") | ||
result = script.pip("cache", "remove", pattern, "--verbose") | ||
|
||
assert remove_matches_wheel("pyFFTW-0.15.0", result) | ||
assert not remove_matches_wheel("foo-1.0", result) | ||
|
||
|
||
def test_issue_13086_glob_case( | ||
script: PipTestEnvironment, | ||
wheel_cache_dir: str, | ||
remove_matches_wheel: RemoveMatches, | ||
) -> None: | ||
_populate_wheel_cache( | ||
wheel_cache_dir, "foo0-1.0", "foo1-1.0", "foo2-1.0", "foo1bob-1.0" | ||
) | ||
result = script.pip("cache", "remove", "foo[0-2]", "--verbose") | ||
|
||
assert remove_matches_wheel("foo0-1.0", result) | ||
assert remove_matches_wheel("foo1-1.0", result) | ||
assert remove_matches_wheel("foo2-1.0", result) | ||
assert not remove_matches_wheel("foo1bob", result) | ||
|
||
|
||
@pytest.mark.usefixtures("populate_http_cache", "populate_wheel_cache") | ||
def test_cache_purge( | ||
script: PipTestEnvironment, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As evident by all of the edge cases, this is moreso a step in the right direction than a total fix. It's a good step forward and certainly needed, but I don't want to claim everything will work w/o issues now :)
ALSO, this PR will affect
pip cache list
too. That's a good thing, we may as well mention it.