fallback: Add build flag to always try to chain-load the new boot entry #445

Open

jprvita
Contributor

@jprvita jprvita commented Feb 1, 2022

The firmware on some Acer machines (and maybe others) always resets the
boot entries and BootOrder variable to what was defined in the firmware
setup program, overriding any external changes (including the changes
made by fallback).

Before shim cared about TPMs this was not a problem in practice, as
fallback would create and chain-load a boot entry for the OS on every
boot. However, since commit 431b8a2 the system is restarted if a TPM
is detected on the system, triggering an infinite reboot loop in systems
with such firmware. This is a known problem which has been previously
reported on #128.
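
As a rough illustration of the loop described above (a toy model with hypothetical names, not code from shim or fallback), the interaction between such firmware and the TPM-triggered restart can be sketched in C:

```c
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, not taken from shim. */
struct firmware {
    bool resets_boot_entries; /* firmware restores its own entries on every boot */
    bool has_os_entry;        /* an OS boot entry currently exists */
    bool tpm_present;         /* a TPM is detected on the system */
};

/*
 * Returns how many restarts fallback triggered before the OS started,
 * or -1 if the OS still has not started after max_boots power-on
 * cycles (that is, the machine is stuck in a reboot loop).
 */
int restarts_until_os_boots(struct firmware fw, int max_boots)
{
    for (int boot = 0; boot < max_boots; boot++) {
        if (fw.resets_boot_entries)
            fw.has_os_entry = false; /* external changes are discarded */

        if (fw.has_os_entry)
            return boot;             /* firmware boots the OS entry itself */

        /* No OS entry, so fallback runs and creates one. */
        fw.has_os_entry = true;

        if (!fw.tpm_present)
            return boot;             /* fallback chain-loads the new entry */

        /* TPM detected: fallback restarts the system instead. */
    }
    return -1;
}
```

With `resets_boot_entries` and `tpm_present` both set, the model never reaches the OS and returns -1; clearing either one lets the boot complete.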

More recently, the problem has been addressed by commit a5db51a,
which presents a screen with a countdown to the user, where they can
interrupt boot and choose to have fallback always chain-load the new
entry instead of restarting the system, to break out of the reboot loop.
While this solution works, it has a few shortcomings:

  1. It makes an otherwise glitch-free boot process not smooth anymore.
  2. The message presented is not accessible / potentially scary for
    non-technical users: if they press a key to interrupt the boot
    process, the meaning of each option is not really clear for users
    not familiar with how shim and fallback work.
  3. The whole experience is made a bit worse by the fact that after
    selecting "Continue boot" / "Always continue boot", the screen will
    remain frozen until something else draws on the framebuffer. If GRUB
    is configured to be quiet, for a glitch-free boot, this may last
    several seconds until the kernel has started and loaded the
    manufacturer logo from BGRT, which gives the impression that the
    whole boot process froze.
  4. This Boot Option Restoration screen overwrites all the debug
    information printed before it is displayed, essentially neutering
    FALLBACK_VERBOSE or SHIM_VERBOSE and making it impossible to enable
    debug without rebuilding fallback.

This commit adds a build-time flag that forces fallback to always try to
chain-load the newly created boot entry, in the same way it did before
TPM support was added.
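
A minimal sketch of the decision such a flag would influence, assuming a hypothetical macro name `FALLBACK_ALWAYS_CHAINLOAD` and hypothetical helper names (the actual flag and code in the patch may differ):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical build-time switch; the real flag name may differ.
 * Enable with, for example, -DFALLBACK_ALWAYS_CHAINLOAD=1.
 */
#ifndef FALLBACK_ALWAYS_CHAINLOAD
#define FALLBACK_ALWAYS_CHAINLOAD 0
#endif

enum fallback_action {
    ACTION_CHAIN_LOAD, /* start the newly created boot entry directly */
    ACTION_RESET       /* restart the system (done when a TPM is detected) */
};

struct boot_state {
    bool tpm_present;
};

/* Pure decision function, kept separate so it can be tested on its own. */
enum fallback_action decide(const struct boot_state *st, bool always_chain_load)
{
    assert(st != NULL);

    if (always_chain_load)
        return ACTION_CHAIN_LOAD; /* behaviour before TPM support was added */

    return st->tpm_present ? ACTION_RESET : ACTION_CHAIN_LOAD;
}

/* Applies whatever was selected when the binary was built. */
enum fallback_action fallback_action_for_build(const struct boot_state *st)
{
    return decide(st, FALLBACK_ALWAYS_CHAINLOAD != 0);
}
```

With the flag left at its default of 0, a system with a TPM still restarts; building with it set to 1 makes the choice independent of the TPM check.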

Fixes: #418

Signed-off-by: João Paulo Rechi Vita [email protected]

Member

@frozencemetery frozencemetery left a comment

So far, I do not think a build flag for this is a good idea. This doesn't need to be configurable; it needs to be working. Adding a configuration handle is asking folks building shims which of two partly-broken behaviors they'd like to have.

Your criticisms of the countdown approach seem extraneous: none of them are addressed by this proposed change that I can see.

Based on what you've stated so far, it seems best to detect the reboot loop and handle it - and inform the machine vendor(s).

@jprvita
Contributor Author

jprvita commented Feb 9, 2022

> So far, I do not think a build flag for this is a good idea. This doesn't need to be configurable; it needs to be working. Adding a configuration handle is asking folks building shims which of two partly-broken behaviors they'd like to have.

While deciding at build time is not ideal, it is a better alternative than having distributors patch these things downstream, which is the only other way to avoid this experience. In the case of shim, downstream changes add maintenance burden not only downstream but also to the upstream shim-review process, which is necessary to get shim binaries signed.

Also, yes, this gives distributors a choice between one of two things: proper TPM support or a good user experience. Right now, unless we patch fallback downstream, we don't have that choice, even if we don't care about TPM at all or don't provide means for other parts of the stack to use it. Hopefully we will be able to have our cake and eat it too in the medium term (see below), but until that happens, I think it makes sense to give distributors that choice and hopefully reduce the amount of downstream patching.

> Your criticisms of the countdown approach seem extraneous: none of them are addressed by this proposed change that I can see.

I'm not sure what you mean here, as this patch provides a way to avoid the countdown without user interaction.

> Based on what you've stated so far, it seems best to detect the reboot loop and handle it - and inform the machine vendor(s).

See Peter's comment on why trying to detect the reboot loop will not always work, and the roadmap to have this working without the possibility of getting into this specific boot loop.

Finally, while ideally this would be solved by the manufacturer in newer versions of the firmware, which is an uphill battle in itself, there will always be older systems out there that manufacturers simply won't release newer firmware versions for.
