glob.translate incorrectly matches path separator in character ranges #130942
Hi, I am the author of the original find, so let me add a bit of context and rationale. The reason I consider this a bug in

$ mkdir a
$ touch a/c
$ shopt -s failglob
$ ls -1 a/c
a/c
$ ls -1 a[/]c
-bash: no match: a[/]c
$ ls -1 a[%-0]c
-bash: no match: a[%-0]c
$ python -c 'import glob; print(glob.glob("a/c"))'
['a/c']
$ python -c 'import glob; print(glob.glob("a[/]c"))'
[]
$ python -c 'import glob; print(glob.glob("a[%-0]c"))'
[]

From
Hello. I can make a PR for this issue.
The relevant manpage paragraph for this issue is (emphasis mine):
I think the issue stems from the fact that glob.translate is based on fnmatch.translate, for which separators have no special meaning. However, I am on mobile and it's hard to debug this. cc @barneygale
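A quick way to see that root cause: the `fnmatch` documentation notes that the filename separator is not special to that module, so every wildcard, including a range that happens to span `/`, matches straight across it. The snippet uses `fnmatchcase()` to avoid platform-dependent case normalization; whether `glob.translate()` inherits exactly this bracket handling is the hypothesis above, not something the snippet proves.

```python
import fnmatch

# fnmatch has no notion of directories: '/' is treated like any other character,
# so each of these patterns matches the two-level path "a/c".
for pattern in ["a?c", "a*", "a[%-0]c"]:
    print(pattern, fnmatch.fnmatchcase("a/c", pattern))
```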
Now, before opening a PR, I want to be sure that the pattern translated by glob and the pattern used by glob.glob are equivalent in the sense that:
namely, regex matching with glob.translate is the slow alternative to glob.glob.
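One way to run that equivalence check, sketched under assumptions: it needs Python 3.13+ for `glob.translate()` and its `seps` argument (and 3.10+ for `root_dir`), and `candidate_paths()` and `agree()` are hypothetical helper names. It compares what `glob.glob()` returns with every existing path filtered through the translated regex.

```python
import glob
import os
import re


def candidate_paths(root):
    """List every file and directory under root as a '/'-separated relative path."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            paths.append(rel.replace(os.sep, "/"))
    return paths


def agree(pattern, root):
    """True if glob.glob() and re + glob.translate() select the same paths under root."""
    # Fast path: glob.glob() only visits the directories the pattern can reach.
    via_glob = {p.replace(os.sep, "/") for p in glob.glob(pattern, root_dir=root)}
    # Slow path: test every existing path against the translated regex.
    regex = re.compile(glob.translate(pattern, seps="/"))
    via_regex = {p for p in candidate_paths(root) if regex.match(p)}
    return via_glob == via_regex
```

With the `a/c` layout from the shell session above, `agree("a/c", root)` and `agree("a?c", root)` should hold. On a release affected by this issue, `agree("a[%-0]c", root)` is expected to be False, because the regex side accepts `a/c` while `glob.glob()` returns nothing.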
@picnixz Thank you for pulling up that manpage. Globbing is documented in several of them, but that one is in fact the most explicitly relevant here.
Created #130985
That would be my expectation as well. In fact, I found this issue by trying to use Python's
Sorry for having misread the second issue and made you open a separate issue, but I don't think there is an issue with [][!] (let's continue the discussion in the other issue).
Once I verify that glob.glob(p) and re.match(glob.translate(s), p) have the same output given the issue's test case, would it be okay for me to proceed with the PR?
Probably, but I also want @barneygale's input on this beforehand, as I think he's the one who added glob.translate() in 3.13 (but I may be wrong). I may also be wrong in my assumptions, as I don't remember the exact specs here (though my gut feeling tells me there's an issue, since we don't align with the glob manpage).
This is a legitimate bug in
@dmitya26 go ahead with a PR :')
I will. I'm adding the unit test for it and will open one after I finish with that in a few minutes.
No rush needed! I just wanted to confirm that your PR will be welcome!
Thanks!! I'm currently working on the fix, but I'm done implementing the test case, so can I just open the PR against my fork and then @ you when I'm done implementing the fix?
Yes, just create a draft PR until you're ready for reviews. Once everything is fine, mark it as ready for review and @ me and Barney (I am travelling, so I might take a bit of time to reply). Don't forget to read the devguide and add a blurb entry (NEWS entry).
How should a path separator literal be handled? I am not sure how to interpret "are left unchanged" in:
I see a couple of options:
In my test directory:
That sounds like a good idea. I think I'm going to proceed with this approach, unless you think we need to consult anybody else about it. Thanks! :)
@dmitya26 I edited my comment because I was actually incorrect. A negative lookahead can be used for ranges that include the path separator (remember to check
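A minimal sketch of the negative-lookahead idea, assuming a hypothetical helper `bracket_to_regex()` that receives the text between the brackets. It is not CPython's implementation and skips edge cases such as a leading `]` or POSIX classes; it only shows that a `(?!/)` guard placed before the character class keeps both a range spanning `/` and a negated class from matching the separator.

```python
import re


def bracket_to_regex(body, sep="/"):
    """Translate the text inside a glob bracket expression, such as '%-0' or '!a-z',
    into a regex fragment that never matches `sep`.

    Hypothetical helper for illustration only; leading ']' and POSIX classes
    are not handled.
    """
    negate = body.startswith("!")
    if negate:
        body = body[1:]
    parts = []
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            # 'lo-hi' range; backwards ranges such as 'y-x' are dropped, as fnmatch does.
            lo, hi = body[i], body[i + 2]
            if lo <= hi:
                parts.append(f"{re.escape(lo)}-{re.escape(hi)}")
            i += 3
        else:
            parts.append(re.escape(body[i]))
            i += 1
    # The lookahead rejects the separator before the class is consulted, so it
    # guards both a range that spans '/' and a negated class like [!x].
    guard = f"(?!{re.escape(sep)})"
    if not parts:
        # Nothing left: '[!...]' matches any non-separator, '[...]' matches nothing.
        return guard + "(?s:.)" if negate else "(?!)"
    return guard + "[" + ("^" if negate else "") + "".join(parts) + "]"
```

For example, `"a" + bracket_to_regex("%-0") + "c"` yields `a(?!/)[%-0]c`, which still accepts `a.c` (`.` lies between `%` and `0`) but rejects `a/c`.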
Oh, I may have found an actual bug in
$ touch 'a[y-x]b'
$ ls -1 a[y-x]b
'a[y-x]b'
$ python -c 'import glob; print(glob.glob("a[y-x]b"))'
[]
In this case, @picnixz @barneygale, I cannot find any man page, nor anything on Google, that specifies the behavior of backwards ranges in globs. Could you please quickly confirm this before I open a separate issue?
glob.translate() is based on fnmatch.translate(), which itself removes empty ranges, IIRC. Maybe we should escape them instead? I don't know, as I think this is the first time someone has found that inconsistency =/
Could it also be something that is ls-specific and not just bash-specific?
I guess this indeed counts as a separate issue. I think we need to amend how fnmatch.translate() handles empty ranges, but maybe we should just inline the call to the internal helper of fnmatch.translate() in glob.translate, as it becomes more and more glob-specific (note that fnmatch(3) and Python's fnmatch differ, as the former supports special classes like
What's the recommended plan of action, @barneygale?
Okay, another false alarm. Python is correct here; the issue is with bash's stupid default handling of unmatched globs, which passes them to commands as literals so that they match the exact filename.
$ touch 'a[y-x]'
$ touch a
$ ls -1 a[y-x]b # No match, equivalent to `ls -1 'a[y-x]b'`
'a[y-x]b'
$ shopt -s failglob
$ ls -1 a[y-x]b
-bash: no match: a[y-x]b
$ shopt -u failglob
$ shopt -s nullglob
$ ls -1 a[y-x]b # No match, equivalent to `ls -1`
a
'a[y-x]b'
This behavior can be seen in a simpler case:
$ touch '[x]'
$ touch y
$ ls -1 [x] # No match, equivalent to `ls -1 '[x]'`
'[x]'
$ shopt -s failglob
$ ls -1 [x]
-bash: no match: [x]
$ shopt -u failglob
$ shopt -s nullglob
$ ls -1 [x] # No match, equivalent to `ls -1`
'[x]'
y
Bug report
Bug description:
Description
The glob.translate function in Python 3.13 doesn't properly handle path separators when they're included in character ranges. According to the documentation, "wildcards do not match path separators," but this isn't enforced for character ranges that include the path separator.
Expected behavior
Character ranges in glob patterns should not match path separators, consistent with the behavior of single-character wildcards (?) and the documented behavior of the module.
Actual behavior
Character ranges that include the path separator (e.g., [%-0], which includes the / character) will match the path separator, contradicting the documented behavior.
Code to reproduce
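A minimal reproducer sketch, assuming Python 3.13 or later (where `glob.translate()` and its `seps` argument exist); `reproduce()` is a hypothetical name. It contrasts the documented `?` wildcard with a range that spans `/`. Per this report, an affected 3.13 release is expected to print `(False, True)`; once ranges exclude the separator, both values should be False.

```python
import glob
import re


def reproduce():
    """Return (wildcard_matches, range_matches) for the path 'a/c', or None before 3.13."""
    if not hasattr(glob, "translate"):
        return None  # glob.translate() was added in Python 3.13
    path = "a/c"
    # '?' is documented not to match a path separator.
    wildcard = re.match(glob.translate("a?c", seps="/"), path) is not None
    # '[%-0]' covers '%' (0x25) through '0' (0x30), which includes '/' (0x2F).
    char_range = re.match(glob.translate("a[%-0]c", seps="/"), path) is not None
    return wildcard, char_range


print(reproduce())
```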
Link to original find: https://stackoverflow.com/questions/79492274/glob-translate-incorrectly-matches-path-separator-when-using-ranges