[APP-6886] Add option to reboot device after a certain amount of time #51

Merged
merged 6 commits into from
Dec 17, 2024
Changes from 5 commits
12 changes: 11 additions & 1 deletion subsystems/provisioning/definitions.go
@@ -297,6 +297,12 @@ func ConfigFromJSON(defaultConf Config, jsonBytes []byte) (*Config, error) {
return &conf, errw.Errorf("timeout values cannot be less than %s", time.Duration(minTimeout))
}

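// && binds tighter than ||, so the comparisons are grouped explicitly;
// otherwise a disabled (0) value would fail whenever user_timeout is positive.
if conf.DeviceRebootAfterOfflineMinutes != 0 &&
(conf.DeviceRebootAfterOfflineMinutes < conf.OfflineTimeout ||
conf.DeviceRebootAfterOfflineMinutes < conf.UserTimeout) {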
return &conf, errw.Errorf("device_reboot_after_offline_minutes cannot be less than offline_timeout or user_timeout")
}

return &conf, nil
}

@@ -370,6 +376,10 @@ type Config struct {

// If set, will explicitly enable or disable power save for all wifi connections managed by NetworkManager.
WifiPowerSave *bool `json:"wifi_power_save"`

// If set, reboots the device after it has been offline for this duration.
// The default, 0, disables this feature.
DeviceRebootAfterOfflineMinutes Timeout `json:"device_reboot_after_offline_minutes"`
}

// Timeout allows parsing golang-style durations (1h20m30s) OR seconds-as-float from/to json.
@@ -386,7 +396,7 @@ func (t *Timeout) UnmarshalJSON(b []byte) error {
}
switch value := v.(type) {
case float64:
- *t = Timeout(value * float64(time.Second))
+ *t = Timeout(value * float64(time.Minute))
return nil
case string:
tmp, err := time.ParseDuration(value)
16 changes: 16 additions & 0 deletions subsystems/provisioning/networkmanager.go
@@ -4,8 +4,10 @@ import (
"context"
"errors"
"os"
"os/exec"
"reflect"
"sort"
"syscall"
"time"

gnm "github.com/Otterverse/gonetworkmanager/v2"
@@ -737,6 +739,20 @@ func (w *Provisioning) mainLoop(ctx context.Context) {
}
}

offlineRebootTimeout := w.cfg.DeviceRebootAfterOfflineMinutes > 0 &&
lastConnectivity.Before(now.Add(time.Duration(w.cfg.DeviceRebootAfterOfflineMinutes)*-1))
if offlineRebootTimeout {
w.logger.Infof("device has been offline for more than %s, rebooting", time.Duration(w.cfg.DeviceRebootAfterOfflineMinutes))

syscall.Sync() // flush file system buffers
Review comment (Member):

This shouldn't be needed when letting a normal shutdown happen. Filesystems will get synced and unmounted in their own order.

cmd := exec.Command("systemctl", "reboot")
output, err := cmd.CombinedOutput()
if err != nil {
w.logger.Error(errw.Wrapf(err, "running 'systemctl reboot' %s", output))
}
time.Sleep(time.Second * 100) // systemd DefaultTimeoutStopSec defaults to 90 seconds
Review comment (Member):

This can't be done, or the agent itself won't shut down when called and will just hang until the sleep is done. It will also fail healthchecks if the loop pauses here for too long. There's a function for pausing the loop, though. See line 623 above for the previous use of the following:

if !w.mainLoopHealth.Sleep(ctx, time.Second*100) {
	return
}
w.logger.Errorf("Failed to reboot after %s time!", time.Second*100)

It may also need a slightly longer timeout, as I BELIEVE that the systemd 90 second default is per service, so if it's shutting down multiple services, it can definitely take longer than that. I'd think 2 or maybe even 5 minutes, since this isn't going to handle the failure case (where the reboot doesn't happen). Regardless, I'd DEFINITELY suggest logging an error if the timeout expires (included in the suggestion above; rewrite as needed) so the user knows things are off the rails.

Reply (Member, Author):

Makes sense, thanks. I missed that there's a sleep on the main loop.

}

hitOfflineTimeout := lastConnectivity.Before(now.Add(time.Duration(w.cfg.OfflineTimeout)*-1)) &&
pModeChange.Before(now.Add(time.Duration(w.cfg.OfflineTimeout)*-1))
// not in provisioning mode, so start it if not configured (/etc/viam.json)