gix-command on Windows runs shell commands in non-POSIX mode #1868

Open
EliahKagan opened this issue Mar 1, 2025 · 3 comments
Labels
acknowledged (an issue is accepted as shortcoming to be fixed), help wanted (Extra attention is needed)

Comments

@EliahKagan
Member

EliahKagan commented Mar 1, 2025

Current behavior 😯

Background

Git commands that run in a shell are meant to run in a POSIX-compatible sh. This can be, and usually is, a shell that is more specifically known by some other name and that extends and even breaks with the requirements of POSIX for sh. But when run as sh, such a shell behaves in a POSIX-compatible manner. This is true of bash, which provides sh in Git for Windows environments. (Some shells, including bash, enter POSIX mode only after running commands from startup scripts. Some shells also do not behave in a completely POSIX-compatible way even when in POSIX mode. Neither of those caveats relates to this issue.)

Therefore, even when sh and bash are the same due to being equivalent symlinks, hard links, or duplicate files, running sh runs a shell in POSIX mode. This is, broadly speaking, the case even on Windows: for example, when bash.exe and sh.exe are shell executables with the same contents, running bash.exe defaults to non-POSIX mode and running sh.exe defaults to POSIX mode.

But this is not quite true when the executable being run is not really the shell itself but instead a shim that runs the shell. Then what matters is the name that the shim runs the shell under. This is not a special rule, but just a consequence of the above: the shim, after all, is a separate program running the shell. Ordinarily, this would not be a problem. Non-shim bash.exe and sh.exe shells--which could be copies, symlinks, or hard links--could be run by separate similar but nonidentical bash.exe and sh.exe shim executables. In this approach, the bash.exe shim would delegate to the non-shim bash.exe, and the sh.exe shim would delegate to the non-shim sh.exe.
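The role of the name can be made concrete with a simplified model. The sketch below is not bash's actual startup logic; it only assumes, as described above, that the shell picks its default mode from the name it sees itself invoked under. The function name `defaults_to_posix_mode` is hypothetical.

```rust
/// Simplified model (an assumption for illustration, not bash's real code):
/// a bash-like shell defaults to POSIX mode when the final component of the
/// name it was invoked under is `sh`.
fn defaults_to_posix_mode(argv0: &str) -> bool {
    // Keep only the part after the last path separator, if there is one.
    let name = argv0
        .rsplit(|c| c == '/' || c == '\\')
        .next()
        .unwrap_or(argv0);
    // A leading `-` conventionally marks a login shell; ignore it here.
    let name = name.strip_prefix('-').unwrap_or(name);
    name == "sh"
}
```

Under this model, what matters is the name the shell process receives, not the name of whatever launched it. A shim called sh.exe that starts the shell as `/usr/bin/bash` produces a shell for which the function returns `false`.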

The problem

The trouble is that the (git root)\bin\bash.exe and (git root)\bin\sh.exe shims found in full non-SDK installations of Git for Windows (including portable installations) do not work this way. They are equivalent: both delegate to (git root)\usr\bin\bash.exe, neither to (git root)\usr\bin\sh.exe. At least in the Portable Git installations I tested--and scoop installations, but that is a repackaging of Portable Git--they are separate files, and they are not hard links to the same thing, but they have identical contents:

C:\Users\ek> cd C:\Users\ek\scoop\apps\git\2.48.1\bin
C:\Users\ek\scoop\apps\git\2.48.1\bin> ls

    Directory: C:\Users\ek\scoop\apps\git\2.48.1\bin

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---           2/13/2025  6:13 AM          46992 bash.exe
-a---           2/13/2025  6:13 AM          46472 git.exe
-a---           2/13/2025  6:13 AM          46992 sh.exe

C:\Users\ek\scoop\apps\git\2.48.1\bin> fsutil hardlink list bash.exe
\Users\ek\scoop\apps\git\2.48.1\bin\bash.exe
C:\Users\ek\scoop\apps\git\2.48.1\bin> fsutil hardlink list sh.exe
\Users\ek\scoop\apps\git\2.48.1\bin\sh.exe
C:\Users\ek\scoop\apps\git\2.48.1\bin> (Get-FileHash bash.exe).Hash
2F8D7CB8CA7DF3F11985409B73C273C424272B0E6D648E58C178B3D462E942F9
C:\Users\ek\scoop\apps\git\2.48.1\bin> (Get-FileHash sh.exe).Hash
2F8D7CB8CA7DF3F11985409B73C273C424272B0E6D648E58C178B3D462E942F9

Why that makes gix-command behave subtly wrong

When git runs shell commands, it does not use those shims, so it uses its sh as such. That is, it invokes its non-shim sh.exe, causing it to run in POSIX mode. But when gix-command runs a command with a shell due to use_shell being set to true, it runs sh:

let mut cmd = Command::new(
    prep.shell_program
        .unwrap_or(if cfg!(windows) { "sh" } else { "/bin/sh" }.into()),
);

On systems that have a usable sh that can be found in such a PATH search, that will often be the (git root)\bin\sh.exe shim associated with Git for Windows, since many users of gitoxide on Windows--and more broadly of tools on Windows that operate on Git repositories, some of which may use gitoxide library crates--will have Git for Windows installed with that directory in their PATH.

As described above, this shim is called sh but it is really a shim for bash. It runs a bash shell called bash with argv[0] set to bash. The resulting shell instance does not enter POSIX mode, even though, from the perspective of gix-command, it ran sh.

But we may need to use the shim

This issue was not a motivation for #1862. But as originally envisioned, that PR would have fixed this. One of its changes is to replace the above code with:

let shell = prep.shell_program.unwrap_or_else(|| gix_path::env::shell().into());
let mut cmd = Command::new(shell);

The implementation of gix_path::env::shell() is also changed there, but in the original vision of #1862 it was intended to continue using the non-shim sh.exe in Git for Windows instead of the shim.

Using the non-shim sh.exe would fix this issue. But that does not seem to be a reasonable thing to do without further environment customization to account for the absence of the shim's functionality. Such customization may be possible, but I think it is beyond the scope of #1862. When not using a shim, some environment variables--including PATH directories with expected tools--may be absent or set to unusable values.

The shim helps avoid running wrong tool executables

Such a shell may even pick up executables that link to msys-2.0.dll from a different MSYS2 installation from the one the shell itself uses. Unlike most Windows programs, MSYS2 programs that use one msys-2.0.dll can have problems running other MSYS2 programs that use another msys-2.0.dll of a different version or build, even when all executables and DLLs are in safe locations and all executables load the correct DLLs. This is documented for Cygwin.

I am unsure if it is generally as much of a problem in MSYS2, which does not seem to document it as something to be concerned about. The strange error currently blocking #1862 turns out to be such a case, though it is subtler and weirder than the examples given in that FAQ entry, and it may be unknown and I think may even be considered a bug in MSYS2. I'll give full details at #1862 soon (edit: #1862 (comment)); this fragment is so that abandoning shims as a way to fix this issue is not rushed into in the future without awareness of the risks.

Expected effect of #1862

Both for the general reason about PATH and other environment variables, and in view of the specific problem encountered already, I think the way forward in #1862 will be to prefer the shim.

Thus it will not solve the problem described in this issue, and will even somewhat exacerbate it by making gix-command use the Git for Windows sh.exe shim (which is a shim for bash.exe) if present, even when another sh.exe would be found in a PATH search.

Because the actual non-shim to shim change will be in gix_path::env::shell(), this issue will also be exacerbated in the sense that it will apply to any other uses of shell() that do not take steps to mitigate it (such as those suggested below).

It seems to me that this issue is much less severe than the problems of having an insufficient or malfunctioning environment, and that it is justified to exacerbate this issue in that way. But #1862 is one of my motivations for opening this, so that it is known.

Expected behavior 🤔

When gix-command uses sh from Git for Windows, it should behave as sh does in Git for Windows when git runs it, running it in POSIX mode as sh does. See "Git behavior" and "Steps to reproduce" below for a verification of the difference and a demonstration of how they currently behave differently.

Possible solutions I don't think will work well

It would be nice to have an executable that, when run, defaults to running the shell in POSIX mode.

Setting an environment variable like POSIXLY_CORRECT should be avoided here, since it would be inherited by non-subshell child processes of the shell and potentially affect their behavior.

We can probably pass -o posix on Windows. I am not sure if there are any major problems with this, but I think there are some notable problems:

  • We must only do it when the shell is not customized with shell_program(). But then a value of sh for shell_program() causes what would already have behaved as sh not to behave like sh, which is extremely unintuitive. If it is special-cased to include values like sh passed to shell_program() then we suffer the opposite but comparably bad effect of not using sh like it works when it is run straightforwardly.
  • If this is done, it should probably only be done when we are running the sh.exe shim associated with Git for Windows, not any other sh.exe. This will complicate the implementation, and potentially result in greater coupling of implementation details between gix-path and gix-command, since whether -o posix is to be passed in gix-command would be determined by information obtained in gix-path.
  • Limiting it in that way would also leave the undesirable behavior in place for other sh.exe shims that are shims for that shim. For example, when git is installed through scoop, a sh.exe is placed in a bin directory in the PATH that is a scoop shim for the Git for Windows sh.exe shim that is actually a shim for its bash.exe.
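To make the conditions in the list above concrete, here is a minimal sketch of what passing -o posix might involve. Both function names and both boolean inputs are hypothetical and are not part of gix-command. How to determine that the shell is the Git for Windows shim is left open, which is where the coupling with gix-path would come from.

```rust
use std::ffi::OsString;

/// Hypothetical decision: force POSIX mode only when the caller did not
/// customize the shell and the default shell is known to be the
/// Git for Windows `sh.exe` shim.
fn should_force_posix(shell_was_customized: bool, is_git_for_windows_shim: bool) -> bool {
    !shell_was_customized && is_git_for_windows_shim
}

/// Hypothetical helper: build the shell arguments for running `script`.
fn shell_args(script: &str, force_posix: bool) -> Vec<OsString> {
    let mut args = Vec::new();
    if force_posix {
        // `-o posix` asks bash to enable POSIX mode regardless of the
        // name it was invoked under.
        args.push(OsString::from("-o"));
        args.push(OsString::from("posix"));
    }
    args.push(OsString::from("-c"));
    args.push(OsString::from(script));
    args
}
```

The result could be handed to `std::process::Command::args`. Note that a scoop shim for the Git for Windows shim would not be recognized by any check that looks only at the Git for Windows installation, so `should_force_posix` would return `false` for it.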

So I would like to avoid that approach if possible. This leaves two other clear alternatives, and maybe others I haven't thought of.

Possible solutions I think may work well

First, maybe this is just a bug in Git for Windows. If not, then it is presumably due to an unfortunate circumstance such that having the shim one would intuitively expect would not do the right thing, which would be useful to know about because Git for Windows should probably document that somewhere (such as in its wiki) and since the underlying cause might potentially apply to gitoxide in some way.

If it is a bug in Git for Windows, then fixing it there would also fix it here. I believe that, unlike some other installations of git, it is rare (and inadvisable) to continue using very old versions of Git for Windows, since as far as I know there are no further-downstream builds analogous to those in operating system distributions like Debian that (roughly speaking) fix security bugs while leaving non-security bugs alone.

If it is not a bug in Git for Windows, then we can try to run the non-shim executable and do our own environment modifications. Due to how process creation on Windows is slower than on Unix-like operating systems, running all commands through a shim should perhaps be avoided anyway. But whether done to fix this issue or for performance (or greater versatility), I think that is something that would be easy to get wrong and should be done very carefully. In particular, every version of Git for Windows has a chance to ship shims that work differently to account for changes in other parts of Git for Windows. In contrast, gitoxide has no such versioned coupling to the Git for Windows shims.

Git behavior

As noted above, git runs sh in such a way that, from the shell's perspective, it is really run as sh and behaves as such.

This can be demonstrated on Windows, in PowerShell. First, I ran these commands to create a Git repository that will display information that distinguishes both the status and some of the effects of POSIX mode for the shell that runs it, when a fetch operation is performed in the repository:

git init what-git-shell
cd what-git-shell
git remote add origin ssh://localhost/repo.git
git config core.sshCommand 'exec >&2; ps | grep -E "^\s*(PID|$$)\b"; echo "BASH=$BASH"; echo "SHELLOPTS=$SHELLOPTS"; export -p | head -n1; :'

The command is somewhat complicated by the ps in Git for Windows not supporting PID arguments for filtering, and also by the inability to embed newlines in the command without changing its interpretation (even though that would work in various related cases).
 
Then I ran git fetch, which printed this, where the error message is itself no problem (my custom SSH command does not attempt to actually be usable for fetching):

fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
      PID    PPID    PGID     WINPID   TTY         UID    STIME COMMAND
      307       1     307      36120  cons1     197609 21:04:33 /usr/bin/sh
BASH=/usr/bin/sh
SHELLOPTS=braceexpand:hashall:interactive-comments:posix
export ALLUSERSPROFILE="C:\\ProgramData"

The relevant part is:

      PID    PPID    PGID     WINPID   TTY         UID    STIME COMMAND
      307       1     307      36120  cons1     197609 21:04:33 /usr/bin/sh
BASH=/usr/bin/sh
SHELLOPTS=braceexpand:hashall:interactive-comments:posix
export ALLUSERSPROFILE="C:\\ProgramData"

This shows that the running shell process is observed by other MSYS2 processes as /usr/bin/sh, that it is a bash shell that sees its own argv[0] as /usr/bin/sh, that it is in POSIX mode (the trailing posix field in the value of SHELLOPTS), and that it exhibits POSIX-style export -p output (which variable is shown, here ALLUSERSPROFILE, and its value are not important).

Steps to reproduce 🕹

The difference is demonstrated by running gix fetch in the same repository created above to demonstrate the Git behavior. The output was:

      PID    PPID    PGID     WINPID   TTY         UID    STIME COMMAND
      317       1     317      16100  ?         197609 21:25:11 /usr/bin/bash
BASH=/usr/bin/bash
SHELLOPTS=braceexpand:hashall:interactive-comments
declare -x ALLUSERSPROFILE="C:\\ProgramData"
Error: An IO error occurred when talking to the server

Caused by:
    failed to fill whole buffer

The relevant part is:

      PID    PPID    PGID     WINPID   TTY         UID    STIME COMMAND
      317       1     317      16100  ?         197609 21:25:11 /usr/bin/bash
BASH=/usr/bin/bash
SHELLOPTS=braceexpand:hashall:interactive-comments
declare -x ALLUSERSPROFILE="C:\\ProgramData"

This shows that the running shell process is observed by other MSYS2 processes as /usr/bin/bash, that it is a bash shell that sees its own argv[0] as /usr/bin/bash, that it is not in POSIX mode (no posix field in the value of SHELLOPTS), and that it accordingly does not exhibit POSIX-style export -p output (as above, the variable and value aren't affected by whether it's in POSIX mode, only which format it uses).
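If this comparison were automated, for example in a test, the SHELLOPTS value could be checked mechanically. This is only a sketch: the function name `shellopts_has_posix` is hypothetical, and the sketch assumes that the value is the colon-separated option list seen in the outputs above.

```rust
/// Returns true if a `SHELLOPTS` value lists the `posix` option.
/// The value is treated as a colon-separated list of option names,
/// as in the outputs shown above.
fn shellopts_has_posix(shellopts: &str) -> bool {
    // Compare whole fields so that an option merely containing
    // "posix" in its name would not match.
    shellopts.split(':').any(|opt| opt == "posix")
}
```

Applied to the two values above, it returns `true` for the `git fetch` output and `false` for the `gix fetch` output.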

EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Mar 2, 2025
It now prefers the `(git root)/bin/sh.exe` shim, falling back to
the `(git root)/usr/bin/sh.exe` non-shim to support the Git for
Windows SDK which doesn't have the shim.

The reason to prefer the shim is that it sets environment
variables, including prepending `bin` directories that provide
tools one would expect to have when using it. Without this, common
POSIX commands may be unavailable, or different and incompatible
implementations of them may be found. In particular, if they are
found in a different MSYS2 installation whose `msys-2.0.dll` is of
a different version or otherwise a different build, then calling
them directly may produce strange behavior. See:

- https://cygwin.com/faq.html#faq.using.multiple-copies
- GitoxideLabs#1862 (comment)

Overall this makes things more robust than either preferring the
non-shim or just doing a path search for `sh` as was done before
that. But it exacerbates GitoxideLabs#1868 (as described there), so if the Git
for Windows `sh.exe` shim continues to work as it currently does,
then further improvements may be called for here.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Mar 3, 2025
This makes a few changes to make `shell()` more robust:

1. Check the last two components of the path `git --exec-path`
   gave, to make sure they are `libexec/git-core`.

   (The check is done in such a way that the separator may be `/`
   or `\`, though a `\` separator here would be unexpected. We
   permit it because it may plausibly be present due to an
   overridden `GIT_EXEC_PATH` that breaks with Git's own behavior of
   using `/` but that is otherwise fully usable.)

   If the directory is not named `git-core`, or it is a top-level
   directory (no parent), or its parent is not named `libexec`,
   then it is not reasonable to guess that this is in a directory
   where it would be safe to use `sh.exe` in the expected relative
   location. (Even if safe, such a layout does not suggest that a
   `sh.exe` found in it would be a better choice than the fallback of
   just doing a `PATH` search.)

2. Check the grandparent component (that `../..` would go to) of
   the path `git --exec-path` gave, to make sure it is a recognized
   name of a platform-specific `usr`-like directory that has been
   used in MSYS2.

   This is to avoid traversing up out of less common directory
   trees that have some different and shallower structure than
   found in a typical Git for Windows or MSYS2 installation.

3. Instead of using only the `(git root)/usr/bin/sh.exe` non-shim,
   prefer the `(git root)/bin/sh.exe` shim. If that is not found,
   fall back to the `(git root)/usr/bin/sh.exe` non-shim, mainly to
   support the Git for Windows SDK, which doesn't have the shim.

   The reason to prefer the shim is that it sets environment
   variables, including prepending `bin` directories that provide
   tools one would expect to have when using it. Without this,
   common POSIX commands may be unavailable, or different and
   incompatible implementations of them may be found.

   In particular, if they are found in a different MSYS2
   installation whose `msys-2.0.dll` is of a different version or
   otherwise a different build, then calling them directly may
   produce strange behavior. See:

   - https://cygwin.com/faq.html#faq.using.multiple-copies
   - GitoxideLabs#1862 (comment)

   This makes things more robust overall than either preferring the
   non-shim or just doing a path search for `sh` as was done before
   that. But it exacerbates GitoxideLabs#1868 (as described there), so if the
   Git for Windows `sh.exe` shim continues to work as it currently
   does, then further improvements may be called for here.
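The three checks in the commit message above can be sketched as follows. This is an illustration under stated assumptions, not the code in gix-path: the names `USR_LIKE`, `name_is`, and `candidate_shells` are hypothetical, the list of `usr`-like directory names is illustrative, and names are compared exactly. The sketch only computes candidate paths in order of preference. It does not check that they exist.

```rust
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Illustrative `usr`-like directory names (an assumption; the set the
/// real implementation recognizes may differ).
const USR_LIKE: &[&str] = &[
    "mingw64", "mingw32", "clangarm64", "clang64", "clang32", "ucrt64", "usr",
];

/// True if the final component of `path` is exactly `expected`.
fn name_is(path: &Path, expected: &str) -> bool {
    path.file_name() == Some(OsStr::new(expected))
}

/// Given the path reported by `git --exec-path`, return the shim and
/// non-shim `sh.exe` candidates, shim first, or `None` if the layout
/// is not the expected one.
fn candidate_shells(exec_path: &Path) -> Option<[PathBuf; 2]> {
    // 1. The last two components must be `libexec/git-core`.
    if !name_is(exec_path, "git-core") {
        return None;
    }
    let libexec = exec_path.parent()?;
    if !name_is(libexec, "libexec") {
        return None;
    }
    // 2. The grandparent must have a recognized `usr`-like name.
    let usr_like = libexec.parent()?;
    if !USR_LIKE.iter().any(|name| name_is(usr_like, name)) {
        return None;
    }
    // 3. Prefer the `bin/sh.exe` shim, then the `usr/bin/sh.exe` non-shim.
    let git_root = usr_like.parent()?;
    Some([
        git_root.join("bin").join("sh.exe"),
        git_root.join("usr").join("bin").join("sh.exe"),
    ])
}
```

A caller would try the candidates in order and fall back to a `PATH` search for `sh` when the function returns `None` or neither file exists.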
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Mar 11, 2025
Byron added the "help wanted" (Extra attention is needed) and "acknowledged" (an issue is accepted as shortcoming to be fixed) labels Mar 11, 2025
@Byron
Member

Byron commented Mar 11, 2025

Thanks so much for this incredible research!

I have lost all faith in ever getting this right by myself 😅, and can only leave decisions on how to best tackle this to you. Personally I'd prefer correctness and execute through the shim by default if it makes anything better, and deal with performance problems later.

@EliahKagan
Member Author

Personally I'd prefer correctness and execute through the shim by default if it makes anything better, and deal with performance problems later.

Yes, correctness is more important than performance here. But the shim is the cause of the incorrect behavior described in this issue. But not using the shim, unless other steps are taken, will cause a more severe form of incorrect behavior (#1862 (comment)).

That the shim causes the issue described here where the shell is wrongly not in POSIX mode might be a bug in Git for Windows. That is, this entire issue may simply be a bug in Git for Windows, as it manifests in gitoxide's interaction with Git for Windows. If so, then it could be entirely fixed by changes to Git for Windows. I would consider that outcome to be ideal. I will look into that.

In contrast, some beneficial effects that we currently only get when we use the shim are needed to avoid problems that are more serious, and some of those problems are not due to bugs in other software.

The specific problem I encountered when not using the shim, summarized above and detailed in #1862 (comment), is hopefully a bug that can be fixed in MSYS2, though I am not confident that it is considered a bug. But even if so, configured shell commands, hook scripts, fixture scripts, and any other shell scripts generally need to have access to common Unix tools--the "standard library" of shell scripting--such as cat and rm. On Unix-like systems, this can generally be assumed, but not on Windows. Part of what the shim does is to customize the environment to make that happen.

So we should call the shim unless we can customize the environment sufficiently ourselves. Customizing the environment ourselves would likely improve performance, but that is not the main reason I am interested in eventually doing it. Rather, it would actually be more similar to what git does. In Git for Windows, git does not use any shim when it runs a custom command or hook. Instead, it customizes its environment.

My understanding of how and when git does that is incomplete. I believe this was originally implemented in Git for Windows through the git shim, which should not be confused with other shims Git for Windows provides such as the sh shim (that is really a shim for bash, per this issue) and the bash shim. But in git-for-windows/git#2506 the non-shim git executable was enhanced to customize its own environment when run in a way that it detects is not through its shim. This is done by setup_windows_environment, which conditionally customizes PATH by calling the append_system_bin_dirs function.
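As a rough illustration of the kind of environment customization discussed above, the sketch below builds a `PATH` value with extra directories placed first. It is not a port of setup_windows_environment or append_system_bin_dirs. The function name `prepend_to_path` is hypothetical, and which directories to add, and in what position, are assumptions that would have to be checked against what Git for Windows really does.

```rust
use std::env::{join_paths, split_paths, JoinPathsError};
use std::ffi::{OsStr, OsString};
use std::path::PathBuf;

/// Hypothetical helper: return a `PATH` value in which `dirs` come before
/// the entries already present in `current`.
fn prepend_to_path(dirs: &[PathBuf], current: &OsStr) -> Result<OsString, JoinPathsError> {
    let combined = dirs
        .iter()
        .cloned()
        // Keep the existing entries, in order, after the new ones.
        .chain(split_paths(current))
        // Drop empty entries so an empty `current` adds nothing.
        .filter(|p| !p.as_os_str().is_empty());
    // Fails if an entry contains the platform's path-list separator.
    join_paths(combined)
}
```

The result could be applied to a single child process with `std::process::Command::env("PATH", value)`, leaving the parent's environment unchanged.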

@Byron
Member

Byron commented Mar 12, 2025

Thanks for the clarification, I do usually have trouble correctly digesting everything in long write-ups, and using the 'summary' feature feels dangerous, too.

In Git for Windows, git does not use any shim when it runs a custom command or hook. Instead, it customizes its environment.

That's perfect - if I understand correctly one could safely adopt this code, avoid the shim, and become independent of any bug-fix in Git for Windows, all while avoiding potential performance issues.
I am probably missing something though, as it didn't sound quite so obvious when you mentioned it.
