
Child processes that don't die after SIGTERM are orphaned #87

Closed

justinmchase opened this issue May 9, 2020 · 28 comments

Comments

@justinmchase
I'm using tsc-watch with a command like this:

tsc-watch --pretty --preserveWatchOutput --onSuccess "node build"

In my application code I am handling SIGTERM like this:

process.on('SIGTERM', ...)

In my handler I discovered a bug that can result in the process never exiting. When this happens, tsc-watch seems to just lose track of the process and carry on regardless. Later, if tsc-watch itself is killed, that orphaned process continues running in the background forever.

It would be nice if tsc-watch would wait for the process to actually exit for a period of time, say 3 seconds, and if it is still running at that point, send a second signal, say SIGQUIT.
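A minimal sketch of this suggestion (the function name `killWithGracePeriod`, the 3-second default, and the SIGQUIT fallback are assumptions taken from the comment above, not tsc-watch's actual killer):

```javascript
// Sketch: ask the child to stop with SIGTERM, and if it hasn't exited
// after a grace period, escalate to SIGQUIT. `child` is assumed to be a
// ChildProcess from child_process.spawn().
function killWithGracePeriod(child, gracePeriodMs = 3000) {
  return new Promise((resolve) => {
    const timer = setTimeout(() => {
      // The child ignored (or is still handling) SIGTERM; escalate.
      child.kill('SIGQUIT');
    }, gracePeriodMs);
    child.once('exit', (code, signal) => {
      clearTimeout(timer);
      resolve({ code, signal });
    });
    child.kill('SIGTERM');
  });
}
```

A well-behaved child resolves the promise on the first SIGTERM; a stuck one is forced out by the second signal.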

@justinmchase
Author

In your killer module:
https://github.com/gilamran/tsc-watch/blob/master/lib/killer.js#L18

Why aren't you just using the Node API's process.kill() function?
https://nodejs.org/dist/latest-v12.x/docs/api/child_process.html#child_process_subprocess_kill_signal

Essentially, in that function I'd like to handle the child's 'close' event and, if it's not received within 3s, send a different kill signal. I could make a PR; what do you think?

@justinmchase
Author

justinmchase commented May 9, 2020

I just went in and modified the code in my node_modules to play around with it, and calling process.kill(KILL_SIGNAL) results in the same behavior.

After playing around with it, I think the module should just look like this:

```js
const KILL_SIGNAL = 'SIGTERM';

module.exports = function kill(child) {
  return new Promise((resolve, reject) => {
    child.on('close', resolve);
    child.on('error', reject);
    child.kill(KILL_SIGNAL);
  });
};
```

Unless there is a specific reason it was implemented with the custom kill behavior? I'd rather have libuv figure it out, I guess.

But at the very least I think it should be listening to child.on('close', resolve) to call the resolver, instead of resolving from the kill functions themselves. This would make it wait for the process to actually close, however long that takes, before continuing on and making a new process.

EDIT: Never mind 🤔 I see that it's actually waiting for both the killer and the process in an outer kill function here: https://github.com/gilamran/tsc-watch/blob/master/lib/runner.js#L21

So in that case I think tsc-watch is operating correctly; it's just a really confusing experience when the process handles SIGTERM and then doesn't exit for a while.

@gilamran
Owner

What would you suggest I should change?
Please also note that we had an issue with killing processes in the past; take a look at #71.

@justinmchase
Author

Well, after looking at this pretty closely, I think that tsc-watch is actually fine and not the cause of the "problem" for me. However, it did appear to be tsc-watch, so to reduce confusion I think it wouldn't hurt to do these two things:

  1. Have the killer write something to stdout indicating what is going on, e.g.

     [2:05:19 PM] File change detected. Starting incremental compilation...
     [2:05:19 PM] Found 0 errors. Watching for file changes.
     [2:05:19 PM] Closing application...
     [2:05:22 PM] Waiting for application to close...
     [2:05:25 PM] Waiting for application to close...

  2. After each 3s interval, send a second, different signal such as 'SIGABRT'.
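A rough sketch of how the two suggestions could combine (the messages mirror the example log above; `killVerbose`, the 3s interval, and the SIGABRT escalation are illustrative assumptions, not tsc-watch's actual code):

```javascript
// Sketch: report progress while waiting for the child to exit, and send
// a second, stronger signal on each interval. `child` is assumed to be a
// ChildProcess from child_process.spawn().
function killVerbose(child, intervalMs = 3000) {
  return new Promise((resolve) => {
    console.log('Closing application...');
    const timer = setInterval(() => {
      console.log('Waiting for application to close...');
      child.kill('SIGABRT'); // escalate past a SIGTERM handler that never exits
    }, intervalMs);
    child.once('exit', (code, signal) => {
      clearInterval(timer);
      resolve({ code, signal });
    });
    child.kill('SIGTERM');
  });
}
```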

These are both pretty optional; it's not that tsc-watch has a bug, it's just that I couldn't tell where the issue was, and it seemed like tsc-watch had it since it was controlling my process.

Thanks for listening.

@gilamran
Owner

Sounds good. How can I reproduce the "hanging" situation?

@justinmchase
Author

In the app that is running, if you implement a handler for 'SIGTERM' and never call process.exit(), it will basically hang.

```js
// index.js
setInterval(() => console.log('still alive...'), 3000);
process.on('SIGTERM', () => console.log('SIGTERM received'));
```

Then tsc-watch will send the signal and wait for the process to exit which never happens.

@Asarew

Asarew commented May 18, 2020

This might be related: when using npm run [...] as your --onSuccess handler you might not get the behaviour you are looking for: https://lisk.io/blog/development/why-we-stopped-using-npm-start-child-processes.
This is not an issue when you have the ps command available, because tsc-watch will just kill everything. I believe this behaviour should be consistent even when you don't have ps available (like in lightweight Docker containers).

@hongy20

hongy20 commented Aug 21, 2020

I experienced a similar problem, which can be reproduced with the following code:

```ts
// index.ts
const id = setInterval(() => console.log('still alive...', process.pid), 3000);
process.on('SIGTERM', async () => {
    console.log('SIGTERM received', process.pid);
    await new Promise(resolve => {
        setTimeout(() => {
            clearInterval(id);
            resolve();
        }, 500);
    });
});
```

Steps

  1. Run command tsc-watch src/index.ts --outDir ./dist --noClear --onSuccess 'node ./dist/index.js';
  2. Save index.ts multiple times in a short period of time;
  3. Check the output in the terminal to see which processes are terminated and which ones are still alive;

Result, when saving changes slowly:

[screenshot of terminal output]

Result, when saving changes fast:

[screenshot of terminal output]

@gilamran
Owner

How would you suggest fixing this?

@hongy20

hongy20 commented Aug 21, 2020

Sorry, I'm not familiar with the code base here, but I'd suggest something like:

  • Do not start a new process (--onSuccess COMMAND) until the existing one exits;
  • Only start one process (--onSuccess COMMAND) when multiple file changes are detected within a short period of time;

I hope that makes sense to you.

@justinmchase
Author

There have been some developments on this issue, by the way. In some other repos where I also filed this bug (not knowing exactly where it originates from), it appears that it may be related to this source bug:

sindresorhus/execa#433

Or possibly this library is experiencing a similar issue? It would be great if anyone here can add anything to that ticket.

@justinmchase
Author

May be related to nodejs/node#34830

@hongy20

hongy20 commented Aug 21, 2020

For me, the problem is related to handling the 'SIGTERM' event with an async function. If I change the code to something like:

```ts
// index.ts
const id = setInterval(() => console.log('still alive...', process.pid), 3000);
process.on('SIGTERM', () => {
    console.log('SIGTERM received', process.pid);
    clearInterval(id);
});
```

Then the result is actually what I expect.

[screenshot of terminal output]

@justinmchase
Author

That's an interesting data point, but even if you never called clearInterval, when the parent process closes, the child is supposed to be closed regardless of whether it handles SIGTERM or not; it should be forced closed.

david-alexander added a commit to david-alexander/tsc-watch that referenced this issue Mar 22, 2022
@david-alexander

david-alexander commented Mar 22, 2022

I came across an issue today that seems to be related to this one, although slightly different.

In my case, the process does die in response to SIGTERM, but sometimes it can take a while. I haven't yet looked into the reason for the delay; maybe the process is doing some cleanup before it exits.

The problem is that if two "compilation complete" events happen in quick succession, the killProcesses() call from the first one may not have finished by the time the second arrives. This means killProcesses() runs twice on the same process (the second time being effectively a no-op), and then runOnCompilationComplete() runs twice as well. So two new processes get started, and the first of them never gets killed.

I decided to have a go at fixing it, and have got a solution that seems to be working well, although I haven't had a chance to test it thoroughly yet. Before calling runOnCompilationComplete(), I check whether the original event that triggered this has been superseded by a later event. I've only implemented this for the "compilation complete" event so far, but I could easily do the other events as well.
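One way to sketch that "superseded" check is a generation counter (illustrative only; `killProcesses` and `runOnCompilationComplete` stand in for tsc-watch's internals, and this is not the actual commit):

```javascript
// Sketch: tag each "compilation complete" event with a generation number;
// only the latest generation is allowed to restart the app, so a slow
// kill from an older event can no longer spawn a duplicate process.
let generation = 0;

async function onCompilationComplete(killProcesses, runOnCompilationComplete) {
  const myGeneration = ++generation;
  await killProcesses(); // may take a while if the child exits slowly
  if (myGeneration !== generation) {
    // A newer event arrived while we were killing; it owns the restart.
    return false;
  }
  runOnCompilationComplete();
  return true;
}
```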

@gilamran, if you're open to it, I'd be happy to tidy up my fix, do some more testing on it, and put together a pull request.

Incidentally, I also think we could improve the maintainability of this code by switching to async/await instead of Promise.then(), and would be happy to help with that too. (If you're worried about compatibility with older Node versions, we could add a build step that transpiles it down to ES5.)

Of course, my solution doesn't deal with the case where the process completely ignores SIGTERM, although it should at least stop it from "losing track" of the errant process. I guess a proper fix for that case would involve some sort of grace period (configurable?) and then a SIGKILL, as originally suggested by @justinmchase.

@gilamran
Owner

@david-alexander Regarding the issue you found, it sounds like something that needs to be fixed. Please do a PR and we'll talk about it there. (Thanks!)
Regarding async/await: we've started a move to TypeScript in version 5 (not released yet), so you can do your PR there.

@taxilian

taxilian commented Sep 8, 2022

I'm having the same issue as @david-alexander, where events happen too quickly and I end up with multiple instances of the app running; it's very frustrating :-/

Not meaning to complain, though, as this is still a great project and the best way (IMO) to use typescript in development; ts-node is too slow.

@david-alexander

Sorry, I still haven't got round to doing a PR with my fix for this.

@taxilian, have you tried the commit linked in my comment above to see if it fixes your issue?

david-alexander added a commit to david-alexander/tsc-watch that referenced this issue Sep 8, 2022
@david-alexander

david-alexander commented Sep 8, 2022

I have just taken another look at my fix today and have made a few changes to it - the new version is here.

Please note that I haven't yet had a chance to do any testing on this new commit, apart from running the existing automated tests. 8 of the tests are failing for me, but they were failing already on the latest commit from master, so I don't think it's anything to do with these changes. But I haven't added any new tests yet to verify that this issue is fixed, so YMMV.

EDIT: Testing is now done, and it seems to have fixed the issue. The updated commit is in the PR.

david-alexander added a commit to david-alexander/tsc-watch that referenced this issue Sep 9, 2022
@divmgl

divmgl commented Nov 22, 2022

I'm actually running into this with pnpm. It looks like two processes are getting started when onSuccess occurs. Did these changes get merged in yet?

@gilamran
Owner

Can you please test with the next version?
npm install tsc-watch@dev

@divmgl

divmgl commented Nov 22, 2022

@gilamran hey Gil, this was my bad. This stopped happening once I instrumented the app to properly respond to a SIGTERM signal. You can disregard my message, sorry about that.

@WadePeterson

@gilamran, I believe I have been running into this same issue using tsc-watch v5 on two different Node services.

When running my services with tsc-watch, and then making a lot of code changes in rapid succession (in my case, from dependencies being recompiled by another process), the services will eventually start throwing EADDRINUSE errors because a previous instance didn't get shut down correctly when restarted via tsc-watch.

My problem appears to be resolved when using the latest tsc-watch v6!

However, I did run into a new problem with the v6 version. Running on a Mac, and working in a repo using yarn, I am getting this error when running tsc-watch:

env: node\r: No such file or directory

I got around it using the workaround here, but it seems the issue is likely Windows CRLF line endings in the executable Node script for tsc-watch.

@gilamran
Owner

gilamran commented Dec 6, 2022

@WadePeterson Thanks for trying out version 6!
The error that you mentioned is something that has happened before; I will look into it.
Thanks!

@gilamran
Owner

gilamran commented Dec 8, 2022

@WadePeterson I published a new dev version. You can install it with npm i tsc-watch@dev, and you should be OK now.

@gilamran
Owner

gilamran commented Dec 8, 2022

Also, if everything is OK, I'll publish v6 soon.

@WadePeterson

Just installed the new tsc-watch dev version and the CRLF issue is gone! Everything seems to work great for me now, and I still am not able to reproduce the orphaned process issue I was seeing in v5.

@gilamran
Owner

gilamran commented Dec 8, 2022

Version 6 published!

@gilamran gilamran closed this as completed Dec 8, 2022