Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement support for an emergency killswitch. #164

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

patnebe
Copy link
Collaborator

@patnebe patnebe commented Feb 16, 2025

Context

  • Implement an emergency killswitch mechanism to stop the agent if a file is on disk and prevent it from starting
  • This can be used as a quick rollback/backout mechanism if any issues are encountered post-deployment

Changes

  • Add a new CLI argument specifying the path to the killswitch file
  • Split main into code paths -- continuous_profling_mode (assumed if no duration is specified) vs adhoc_profiling_mode
  • Move the non-continuous profiling code path into a new method called collect_profiles_adhoc
  • Add a new Runner object which will be used to run the profiler when in continuous profiling mode. Within the runner
    • Prevent the profiler from starting if the kill switch file is detected
    • Start the profiler in a separate thread
    • Check for the kill switch file every 30 seconds, and stop/start the profiler depending on the profiler_run_state.

Test Plan

  • CI
  • Manual test

@@ -170,87 +262,48 @@ fn main() -> Result<(), Box<dyn Error>> {
..Default::default()
};

let (stop_signal_sender, stop_signal_receive) = bounded(1);
let (profiler_stop_signal_sender, profiler_stop_signal_receiver) = bounded(1);
let profiler: ThreadSafeProfiler = Arc::new(Mutex::new(Profiler::new(
Copy link
Collaborator Author

@patnebe patnebe Feb 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts on this.

Wrapping the Profiler in an Arc<Mutex<>> + Profiler::run() being a blocking call means that the mutex won't be unlocked until Profiler::run() returns or the thread running the profiler exits. One side effect is that only one instance of the profiler can be run at any given time. This sounds like desirable behaviour, but maybe there are reasons why we wouldn't want this.

profiler: ThreadSafeProfiler,
killswitch_file_path: String,
runner_stop_signal_receiver: Receiver<()>,
profiler_stop_signal_sender: Sender<()>,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see one major downside of creating the profiler + the profiler_stop_signal* channel endpoints outside the Runner struct and passing them in.

In theory, the profiler can be stopped if a signal is sent to it from outside this class resulting in the runner thinking that the profiler is running when it's not. Will think more about how best to handle this.

Comment on lines +55 to +57
thread::spawn(move || {
p.lock().unwrap().run(); // This is a blocking call.
});
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe keep track of the join handle so we can cleanup?

Copy link
Owner

@javierhonduco javierhonduco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing this feature. I'll review this PR more in detail once it's ready. Something that we should consider is that perhaps the killswitch file should be in a hardcoded path rather than being configurable. This could help as a well-known location that doesn't change could speed up enabling the killswitch in a fool-proof way.

To handle the possibility of running in on-off mode with a killswitch (which is something that we might or might not want to allow), perhaps a flag could be added (--unsafe / --i-know-what-i-am-doing) that allows running the profiler even in the presence of a killswitch. What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants