Implement support for an emergency killswitch. #164

patnebe · 2025-02-16T16:24:05Z

Context

Implement an emergency killswitch: while a designated file is present on disk, the agent is stopped if it is running and prevented from starting.
This can be used as a quick rollback/backout mechanism if any issues are encountered post-deployment.

Changes

Add a new CLI argument specifying the path to the killswitch file.
Split main into two code paths: continuous_profiling_mode (assumed if no duration is specified) and adhoc_profiling_mode.
Move the non-continuous profiling code path into a new method called collect_profiles_adhoc.
Add a new Runner object which will be used to run the profiler when in continuous profiling mode. Within the runner:
- Prevent the profiler from starting if the killswitch file is detected
- Start the profiler in a separate thread
- Check for the killswitch file every 30 seconds, and stop/start the profiler depending on the profiler_run_state.
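
For readers skimming the description, the polling behaviour listed above could be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `ProfilerRunState`, `Action`, `next_action`, and `poll_killswitch` are hypothetical names, the real profiler start/stop calls are replaced by a state update, and the loop is bounded by `max_polls` only so that it terminates.

```rust
use std::path::Path;
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for the runner's view of the profiler.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ProfilerRunState {
    Running,
    Stopped,
}

// What the runner should do after one killswitch check.
#[derive(Debug, PartialEq)]
enum Action {
    Start,
    Stop,
    Nothing,
}

// Decide the next action from the current state and whether the
// killswitch file was found on disk.
fn next_action(state: ProfilerRunState, killswitch_present: bool) -> Action {
    match (state, killswitch_present) {
        (ProfilerRunState::Running, true) => Action::Stop,
        (ProfilerRunState::Stopped, false) => Action::Start,
        _ => Action::Nothing,
    }
}

// Poll the killswitch path `max_polls` times, sleeping `interval`
// between checks, and return the final state. A real runner would
// start or stop the profiler thread where the state is updated here.
fn poll_killswitch(path: &Path, interval: Duration, max_polls: usize) -> ProfilerRunState {
    let mut state = ProfilerRunState::Stopped;
    for _ in 0..max_polls {
        match next_action(state, path.exists()) {
            Action::Start => state = ProfilerRunState::Running,
            Action::Stop => state = ProfilerRunState::Stopped,
            Action::Nothing => {}
        }
        thread::sleep(interval);
    }
    state
}
```

With the 30 second interval from the description, this would be called with `Duration::from_secs(30)`. Keeping the decision in a pure function makes the start/stop logic testable without touching the filesystem.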

Test Plan

CI
Manual test

patnebe · 2025-02-16T16:30:11Z

src/cli/main.rs

@@ -170,87 +262,48 @@ fn main() -> Result<(), Box<dyn Error>> {
        ..Default::default()
    };

-    let (stop_signal_sender, stop_signal_receive) = bounded(1);
+    let (profiler_stop_signal_sender, profiler_stop_signal_receiver) = bounded(1);
+    let profiler: ThreadSafeProfiler = Arc::new(Mutex::new(Profiler::new(


Some thoughts on this.

Wrapping the Profiler in an Arc<Mutex<>> + Profiler::run() being a blocking call means that the mutex won't be unlocked until Profiler::run() returns or the thread running the profiler exits. One side effect is that only one instance of the profiler can be run at any given time. This sounds like desirable behaviour, but maybe there are reasons why we wouldn't want this.
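
To make the locking behaviour concrete, here is a minimal standalone sketch. `FakeProfiler` and `lock_is_held_while_running` are hypothetical stand-ins, not the real Profiler API; the fake `run()` simply blocks until a stop message arrives. It shows that a second attempt to lock the mutex fails for as long as `run()` is in progress.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Hypothetical stand-in for the real Profiler.
struct FakeProfiler;

impl FakeProfiler {
    // Blocks until a stop message arrives (or the sender is dropped).
    fn run(&mut self, stop: &mpsc::Receiver<()>) {
        let _ = stop.recv();
    }
}

// Returns true if the mutex was held while run() was blocking and
// released again once the profiler thread had exited.
fn lock_is_held_while_running() -> bool {
    let profiler = Arc::new(Mutex::new(FakeProfiler));
    let (stop_tx, stop_rx) = mpsc::channel::<()>();
    let (started_tx, started_rx) = mpsc::channel::<()>();

    let p = Arc::clone(&profiler);
    let handle = thread::spawn(move || {
        // The guard lives for the whole duration of run().
        let mut guard = p.lock().unwrap();
        started_tx.send(()).unwrap();
        guard.run(&stop_rx);
    });

    // Wait until the profiler thread owns the lock.
    started_rx.recv().unwrap();
    let held = profiler.try_lock().is_err();

    // Stop the profiler and wait for its thread to finish.
    stop_tx.send(()).unwrap();
    handle.join().unwrap();
    let released = profiler.try_lock().is_ok();

    held && released
}
```

One consequence is that any other code path that needs the profiler (for example, to read stats) would also block until `run()` returns, so the stop signal has to travel over a channel rather than through the mutex.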

patnebe · 2025-02-16T17:12:20Z

src/cli/runner.rs

+        profiler: ThreadSafeProfiler,
+        killswitch_file_path: String,
+        runner_stop_signal_receiver: Receiver<()>,
+        profiler_stop_signal_sender: Sender<()>,


I see one major downside of creating the profiler + the profiler_stop_signal* channel endpoints outside the Runner struct and passing them in.

In theory, a stop signal sent from outside the Runner would stop the profiler while the runner still believes it is running. Will think more about how best to handle this.

patnebe · 2025-02-16T17:15:58Z

src/cli/runner.rs

+        thread::spawn(move || {
+            p.lock().unwrap().run(); // This is a blocking call.
+        });


Maybe keep track of the join handle so we can clean up?
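
A rough sketch of what tracking the join handle could look like, under the assumption that the runner owns both the handle and the stop sender. All names here are hypothetical and the spawned closure is a stand-in for the blocking `run()` call. A side benefit is that `JoinHandle::is_finished` lets the runner notice when the profiler thread has exited on its own.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

// Hypothetical runner that owns the profiler thread and its stop channel.
struct Runner {
    profiler_thread: Option<JoinHandle<()>>,
    stop_sender: Option<Sender<()>>,
}

impl Runner {
    fn new() -> Self {
        Runner {
            profiler_thread: None,
            stop_sender: None,
        }
    }

    // Spawn the (stand-in) profiler thread unless one is already running.
    fn start(&mut self) {
        if self.is_running() {
            return;
        }
        let (tx, rx) = mpsc::channel::<()>();
        self.stop_sender = Some(tx);
        self.profiler_thread = Some(thread::spawn(move || {
            // Stand-in for the blocking profiler run: wait for a stop signal.
            let _ = rx.recv();
        }));
    }

    // True only if a thread was spawned and has not yet exited.
    fn is_running(&self) -> bool {
        self.profiler_thread
            .as_ref()
            .map_or(false, |handle| !handle.is_finished())
    }

    // Signal the thread to stop, then join it so nothing is leaked.
    fn stop(&mut self) {
        if let Some(tx) = self.stop_sender.take() {
            // Ignore the error if the thread has already exited.
            let _ = tx.send(());
        }
        if let Some(handle) = self.profiler_thread.take() {
            let _ = handle.join();
        }
    }
}
```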

javierhonduco

Thanks for implementing this feature. I'll review this PR in more detail once it's ready. Something we should consider is that perhaps the killswitch file should live at a hardcoded path rather than being configurable. A well-known location that doesn't change could make enabling the killswitch faster and more fool-proof.

To handle the possibility of running in one-off mode with a killswitch (which is something that we might or might not want to allow), perhaps a flag could be added (--unsafe / --i-know-what-i-am-doing) that allows running the profiler even in the presence of a killswitch. What do you think?
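
As a starting point for that discussion, the suggested behaviour could be sketched like this. The path below is a made-up placeholder rather than a proposed location, and `may_start` / `may_start_now` are hypothetical helpers; how the override flag is parsed from the CLI is left out.

```rust
use std::path::Path;

// Placeholder only: the actual well-known location is still to be decided.
const KILLSWITCH_PATH: &str = "/tmp/lightswitch/killswitch";

// The profiler may start if no killswitch is present, or if the user
// explicitly passed the override flag (e.g. --unsafe).
fn may_start(killswitch_present: bool, override_flag: bool) -> bool {
    !killswitch_present || override_flag
}

// Same decision, checking the hardcoded path on disk.
fn may_start_now(override_flag: bool) -> bool {
    may_start(Path::new(KILLSWITCH_PATH).exists(), override_flag)
}
```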

patnebe added 6 commits February 14, 2025 08:08

wip

b3692c0

.

d5be23c

.

0fd9a5c

update nix commands

f893f2a

Merge branch 'main' of github.com:javierhonduco/lightswitch into kill…

959b675

…switch

.

a42eb3a

patnebe added 2 commits February 16, 2025 16:44

.

d8c33aa

.

21b9512

javierhonduco reviewed Feb 16, 2025
