
Fix for #87 #152

Closed
wants to merge 3 commits

Conversation

david-alexander

This fixes #87 by adding a check that happens before running the commands for events such as onSuccess. The check makes sure that there hasn't been another such event during the delay between when the original event happened and when we were actually ready to start the new command. This situation can happen if the old command takes too long to respond to our attempts to kill it.

I have also added a new test that fails without the change and passes with it. I wasn't sure which test suite to put it in, so let me know if you'd like me to move it to another test suite or create a new one for it. Also, the test uses a small script to simulate a process that doesn't respond nicely to SIGTERM, but I had to put that script outside the test directory (in test-commands/unkillable-command.js) to prevent it from being picked up by the test/**/*.js glob and treated as a test suite.
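The guard described above could be sketched roughly as follows. All names here (`latestEventId`, `onCompilationEvent`, the stubbed `killProcesses`) are illustrative assumptions, not the identifiers from the actual diff:

```javascript
// Stub: in tsc-watch this would be the real killProcesses(), which can take
// a while if a child ignores SIGTERM and has to be killed more forcefully.
function killProcesses() {
  return new Promise((resolve) => setTimeout(resolve, 50));
}

let latestEventId = 0;

function onCompilationEvent(runCommand) {
  const eventId = ++latestEventId; // remember which event this call serves
  killProcesses().then(() => {
    // Killing a stubborn child can take a while; a newer event may have
    // arrived in the meantime. Only the latest event starts its command.
    if (eventId === latestEventId) {
      runCommand();
    }
  });
}
```

The key point is that the staleness check runs *after* the kill completes, so a slow-to-die process can no longer cause a command belonging to a superseded event to start.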

lib/tsc-watch.js Outdated
@@ -193,7 +213,7 @@ const Signal = {

 nodeCleanup((_exitCode, signal) => {
   tscProcess.kill(signal);
-  killProcesses(true).then(() => process.exit());
+  killProcesses(true).catch(silentlyHandleCancellation).finally(() => process.exit());
Author

The change from then to finally isn't strictly necessary to fix this specific issue, but it could fix other potential issues if something went wrong in killProcesses().

Owner

@gilamran gilamran left a comment

Hi,
In your solution, you cancel the previous promises that try to kill the process. This means that the first killProcesses will throw and won't run its then handler. That's great, but the second call (the one that does the cancelling) creates new promises that try to kill the first run command... I'm not sure this is the right way to go.

Maybe push all the promises to an array and wait for those same promises, rather than creating new ones?

@david-alexander
Author

Yes, I see there is a subtle edge case that I didn't think of, where we try to kill the same set of processes twice. This could cause one of the following issues:

  • We get an error from kill/taskkill/psTree because the PID no longer exists.
  • The PID has been reused, and we end up killing the wrong process.
  • We end up starting a new set of processes (e.g. run number 3) before an earlier set (e.g. run number 1) has finished being killed.

Have I understood correctly what you're saying?

If so, I agree with your suggestion that killProcesses should wait for any previous invocation to finish before resolving. I still think it should also cancel the previous run command and start a new one (instead of just letting the previous one happen), as this will make it easier in future if the run command needs to be dependent on some state captured at the time it was triggered.

Let me know if you agree with the above, and if so, I'll go ahead and make the change.

@gilamran
Owner

gilamran commented Sep 11, 2022

You can't start a new process until the existing one has exited; you might get, for example, "address already in use" when restarting a server. So we must wait for the kill to finish, but we might get another request to kill before this one ends. So, instead of creating new promises to kill, wait for the existing promises to finish...

so in tsc-watch on this function:

function killProcesses(killAll) {
  return Promise.all([
    killAll && firstSuccessKiller ? firstSuccessKiller() : null,
    successKiller ? successKiller() : null,
    failureKiller ? failureKiller() : null,
    compilationStartedKiller ? compilationStartedKiller() : null,
    compilationCompleteKiller ? compilationCompleteKiller() : null,
  ]);
}

Each killer function creates a promise, and all of them are returned via the Promise.all. We should save this promise and, if it exists, use it instead of creating a new one... The problem is that when this promise resolves, it will continue and create the new processes... which we shouldn't do.

So, in order to fix that, we should pass an id to killProcesses that it will return once it's resolved. If this id matches the current compilation id, we run the then handler; otherwise we don't...

I know that it's a bit confusing... let me know if you need extra details.
Thanks for the PR!
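The idea described above (reuse the in-flight kill promise, and let only the latest compilation proceed when it resolves) might look something like this. This is a rough sketch with stubbed killers and illustrative names, not the exact code that shipped in tsc-watch:

```javascript
// Stub killers standing in for successKiller, failureKiller, etc.
const killers = [() => new Promise((resolve) => setTimeout(resolve, 50))];

let compilationId = 0;
let runningKillProcessesPromise = null;

function killProcesses() {
  // Reuse the in-flight kill instead of creating new killer promises.
  if (runningKillProcessesPromise) {
    return runningKillProcessesPromise;
  }
  runningKillProcessesPromise = Promise.all(killers.map((k) => k())).then(() => {
    runningKillProcessesPromise = null;
  });
  return runningKillProcessesPromise;
}

function onCompilationEvent(startProcesses) {
  const myId = ++compilationId;
  killProcesses().then(() => {
    // Only the handler belonging to the latest compilation gets to
    // start new processes; stale handlers fall through and do nothing.
    if (myId === compilationId) {
      startProcesses();
    }
  });
}
```

If two events arrive while one kill is in flight, both then handlers fire when the shared promise resolves, but only the one whose captured id matches the current compilation id starts anything.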

@david-alexander
Author

Yes, that makes sense and is basically the same as what I was thinking. My idea is slightly different from passing an ID to killProcesses, but will have the same effect. I'll make the changes to the PR.

@gilamran
Owner

gilamran commented Sep 11, 2022

I've taken a stab at this, let me know what you think: 53f8dad
This was not published to the v6 branch (yet)

@david-alexander
Author

That looks like it would work, but I have an idea for a different way that might be more readable and therefore less likely to hide bugs / edge cases. I can have a go at it later today if you don't mind waiting.

@gilamran
Owner

Please do, I would love to make it more clear.

@taxilian

taxilian commented Oct 3, 2022

Heh. I finally went to test this (I've been getting pulled in way too many directions lately) but now I'm feeling like I need to wait for an update ;-)

@david-alexander
Author

Sorry @taxilian! Same here... have been meaning to get this PR updated. The code changes are basically ready so I'll try to get them pushed today.

@david-alexander
Author

I've just pushed my latest changes, but I still need to add some documentation regarding how they work and the rationale for doing it this way. I'll try to get that done later today.

The tests are passing for me (including the new one I added), although I had to increase the timeouts on some of them. I haven't pushed the change to the timeout as I wasn't sure if it was just something to do with my setup that was causing the issue.

this.currentState = { state: 'running', commands, isTriggeredByClient, runID };
for (const command of commands)
{
  command.run({ emitSignal: true });
}
Author

Question: Should emitSignal always be true, or only when isTriggeredByClient is false?

@@ -29,46 +192,6 @@ const {
args,
} = extractArgs(process.argv);

function killProcesses(killAll) {
Author

I wasn't sure of the purpose of killAll and treating firstSuccessKiller differently from the others, so I haven't replicated that in my version. If this behavior needs to remain, let me know and I can put it in.

Author

@david-alexander david-alexander left a comment

I've updated the PR to address the concerns in the discussion. I ended up opting for a state machine approach rather than just using promises, as it turned out that promises weren't going to do a great job of handling all the edge cases (even with async/await).

Hopefully this makes sense. Let me know if you have any questions. (Note I've left a couple of questions for you in the PR comments.)

EDIT: It occurs to me that I've used some modern JS features (classes and async/await) that aren't used elsewhere in the codebase, and I'm not sure if this was because you were targeting an ES version that doesn't support these. If so, let me know and I can use alternatives instead.

@gilamran
Owner

gilamran commented Oct 4, 2022

@david-alexander I looked at your implementation and, in my opinion, it was very complicated; it was hard to get my head around it. I've implemented a simple solution that uses a compilationId. It was implemented on the upcoming version 6.
You can see the changes here: 53f8dad

Let me know what you think.
Thanks

p.s.
You can give it a try when installing the dev version by doing: npm install tsc-watch@dev

@david-alexander
Author

@gilamran I understand what you mean about it being hard to get your head around, especially if you aren't familiar with the idea of a "state machine".

Personally I do think a state machine is a good fit for this problem (this is actually how async/await works internally, but it's just not quite flexible enough to do what we want here). Once you understand the concept, I think it makes it easier to reason about how it will handle edge cases.

Having said that, I can't immediately see a case where your solution would fail; however, I'd like to give this some more thought, as I did initially try something similar to your solution and realized that it might fail in some cases (though I can't remember exactly which).

I will try to find time to think more about this later today. In the meantime, for anyone who is having this issue, I think they should be able to use either of our solutions.

@david-alexander
Author

Ok, I have taken a close look at your solution and I think it covers all the edge cases.

However, I still think we can improve on the readability, which I think is particularly important for this kind of fiddly asynchronous stuff. (For example, it took me longer than it should have to convince myself that there wasn't a race condition between the two assignments to runningKillProcessesPromise that could cause it to be set to null when it shouldn't.)

I thought a bit more about my proposed solution, and realized that actually an explicit state machine isn't necessary after all. It could actually be done with something like the following (pseudocode), which (in my opinion) is more readable than both your solution and my previous one:

  let runningProcessesKiller: (() => Promise<void>) | null = null;
  let desiredProcesses: Command[] = [];

  function onReady(): void
  {
    runningProcessesKiller = startProcesses(desiredProcesses);
  }

  function onCompilationEvent(processesToRun: Command[]): void
  {
    desiredProcesses = processesToRun;

    if (runningProcessesKiller)
    {
      let killer = runningProcessesKiller;
      runningProcessesKiller = null;

      killer().then(onReady);
    }
  }

The above would replace killProcesses() and all of the runOn...() functions. The following additional code changes would be required:

  • onReady would be called on tsc-watch startup to kick things off.
  • startProcesses would start processes for all of the commands passed to it, and would return a single combined killer function for all of them. (And would correctly handle an empty set of commands, by starting no processes and returning a no-op killer function.)
  • onCompilationEvent would be called whenever a compilation event occurred, and would be passed the set of commands to run.
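A minimal sketch of the `startProcesses` helper described in the list above; the `command.run()` / `child.kill()` shape is an assumption for illustration (real commands would spawn child processes):

```javascript
// Hypothetical startProcesses: starts one (simulated) process per command
// and returns a single combined killer function for all of them.
function startProcesses(commands) {
  // Start each command and remember how to kill its process.
  const killersForRun = commands.map((command) => {
    const child = command.run();
    return () => child.kill(); // each kill() resolves when the process dies
  });

  // Single combined killer. For an empty set of commands this is a no-op
  // killer that resolves immediately, as described above.
  return () => Promise.all(killersForRun.map((kill) => kill()));
}
```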

This solution doesn't require the compilation ID, because we never create those concurrent killProcesses(...).then(...) chains in the first place. Instead, we store the "pending" set of commands in desiredProcesses, and then, whenever the killing is finished, onReady will just look at the latest value of desiredProcesses, ignoring any earlier values that may have been set in the interim.

Here's my reasoning on why this should work in all cases:

  • onReady only gets called when there are no commands running (on startup, and after killing completes).
  • Any time there are commands running (and not yet being killed), runningProcessesKiller will be non-null.
  • The same runningProcessesKiller is never called twice, as it is immediately set to null whenever it is called.
  • runningProcessesKiller is never overwritten with a new killer when it is already non-null, because onReady can never be called in this scenario.

Let me know what you think of this. If you're interested in looking at this further I'm happy to turn the above pseudocode into real code and update the PR so you can see more clearly how it would fit in.

@gilamran
Owner

gilamran commented Dec 8, 2022

A fix for this was released in v6

@gilamran gilamran closed this Dec 8, 2022
Successfully merging this pull request may close these issues.

Child processes that don't die after SIGTERM are orphaned