
Child processes that don't die after SIGTERM are orphaned #87

Closed

justinmchase opened this issue May 9, 2020 · 28 comments

Comments

@justinmchase
I'm using tsc-watch with a command like this:

tsc-watch --pretty --preserveWatchOutput --onSuccess "node build"

In my application code I am handling SIGTERM like this:

process.on('SIGTERM', ...)

In my handler I discovered a bug that can result in the process never exiting. When this happens, tsc-watch seems to just lose track of the process and carry on regardless. Later, if tsc-watch itself is killed, that orphaned process continues running in the background forever.

It would be nice if tsc-watch would wait for the process to actually exit for a period of time, say 3 seconds, and if it is still running at that point, send a second signal, say SIGQUIT.
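A minimal sketch of this suggestion (the function name `killWithGracePeriod`, the 3-second default, and the SIGQUIT fallback are assumptions taken from the comment above, not tsc-watch's actual killer):

```javascript
// Sketch: ask the child to stop with SIGTERM, and if it hasn't exited
// after a grace period, escalate to SIGQUIT. `child` is assumed to be a
// ChildProcess from child_process.spawn().
function killWithGracePeriod(child, gracePeriodMs = 3000) {
  return new Promise((resolve) => {
    const timer = setTimeout(() => {
      // The child ignored (or is still handling) SIGTERM; escalate.
      child.kill('SIGQUIT');
    }, gracePeriodMs);
    child.once('exit', (code, signal) => {
      clearTimeout(timer);
      resolve({ code, signal });
    });
    child.kill('SIGTERM');
  });
}
```

A well-behaved child resolves the promise on the first SIGTERM; a stuck one is forced out by the second signal.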

@justinmchase
Author

In your killer module:
https://github.com/gilamran/tsc-watch/blob/master/lib/killer.js#L18

Why aren't you just using the Node API's process.kill() function?
https://nodejs.org/dist/latest-v12.x/docs/api/child_process.html#child_process_subprocess_kill_signal

Essentially, in that function I'd like to handle the child's 'close' event and, if it's not received within 3s, send a different kill signal. I could make a PR; what do you think?

@justinmchase
Author

justinmchase commented May 9, 2020

I just went in and modified the code in my node_modules to play around with it, and calling process.kill(KILL_SIGNAL) results in the same behavior.

After playing around with it, I think the module should just look like this:

```js
const KILL_SIGNAL = 'SIGTERM';

module.exports = function kill(child) {
  return new Promise((resolve, reject) => {
    child.on('close', resolve);
    child.on('error', reject);
    child.kill(KILL_SIGNAL);
  });
};
```

Unless there is a specific reason it was implemented with the custom kill behavior? I'd rather have libuv figure it out, I guess.

But at the very least I think it should be listening to child.on('close', resolve) to call the resolver, instead of resolving from the kill functions themselves. This would make it wait for the process to actually close, however long that takes, before continuing on and making a new process.

EDIT: Never mind 🤔 I see that it's actually waiting for both the killer and the process in an outer kill function here: https://github.com/gilamran/tsc-watch/blob/master/lib/runner.js#L21

So in that case I think tsc-watch is operating correctly; it's just a really confusing experience when the process handles SIGTERM and then doesn't exit for a while.

@gilamran
Owner

What would you suggest I should change?
Please also note that we had an issue with killing processes in the past; take a look at #71.

@justinmchase
Author

Well, after looking at this pretty closely, I think that tsc-watch is actually fine and not the cause of the "problem" for me. However, it did appear to be tsc-watch, so to reduce confusion I think it wouldn't hurt to do these two things:

  1. Have the killer write something to stdout indicating what is going on, e.g.

     [2:05:19 PM] File change detected. Starting incremental compilation...
     [2:05:19 PM] Found 0 errors. Watching for file changes.
     [2:05:19 PM] Closing application...
     [2:05:22 PM] Waiting for application to close...
     [2:05:25 PM] Waiting for application to close...

  2. After each 3s interval, send a second, different signal such as 'SIGABRT'.
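A rough sketch of how the two suggestions could combine (the messages mirror the example log above; `killVerbose`, the 3s interval, and the SIGABRT escalation are illustrative assumptions, not tsc-watch's actual code):

```javascript
// Sketch: report progress while waiting for the child to exit, and send
// a second, stronger signal on each interval. `child` is assumed to be a
// ChildProcess from child_process.spawn().
function killVerbose(child, intervalMs = 3000) {
  return new Promise((resolve) => {
    console.log('Closing application...');
    const timer = setInterval(() => {
      console.log('Waiting for application to close...');
      child.kill('SIGABRT'); // escalate past a SIGTERM handler that never exits
    }, intervalMs);
    child.once('exit', (code, signal) => {
      clearInterval(timer);
      resolve({ code, signal });
    });
    child.kill('SIGTERM');
  });
}
```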

These are both pretty optional; it's not that tsc-watch has a bug, it's just that I couldn't tell where the issue was, and it seemed like tsc-watch had it since it was controlling my process.

Thanks for listening.

@gilamran
Owner

Sounds good. How can I reproduce the "hanging" situation?

@justinmchase
Author

In the app that is running, if you implement a handler for 'SIGTERM' and never call process.exit(), it will basically hang.

```js
// index.js
setInterval(() => console.log('still alive...'), 3000);
process.on('SIGTERM', () => console.log('SIGTERM received'));
```

Then tsc-watch will send the signal and wait for the process to exit which never happens.

@Asarew

Asarew commented May 18, 2020

This might be related: when using npm run [...] as your --onSuccess handler you might not get the behaviour you are looking for: https://lisk.io/blog/development/why-we-stopped-using-npm-start-child-processes.
This is not an issue when you have the ps command available, because tsc-watch will just kill everything. I believe this behaviour should be consistent even when you don't have ps available (like in lightweight Docker containers).

@hongy20

hongy20 commented Aug 21, 2020

I experienced a similar problem, which can be reproduced with the following code:

```ts
// index.ts
const id = setInterval(() => console.log('still alive...', process.pid), 3000);
process.on('SIGTERM', async () => {
    console.log('SIGTERM received', process.pid);
    await new Promise(resolve => {
        setTimeout(() => {
            clearInterval(id);
            resolve();
        }, 500);
    });
});
```

Steps

  1. Run command tsc-watch src/index.ts --outDir ./dist --noClear --onSuccess 'node ./dist/index.js';
  2. Save index.ts multiple times in a short period of time;
  3. Check the output in the terminal to see which processes are terminated and which ones are still alive;

Result, when saving changes slowly:

[screenshot of terminal output]

Result, when saving changes fast:

[screenshot of terminal output]

@gilamran
Owner

How would you suggest fixing this?

@hongy20

hongy20 commented Aug 21, 2020

Sorry, I'm not familiar with the code base here, but I'd suggest something like:

  • Do not start a new process (--onSuccess COMMAND) until the existing one exits;
  • Only start one process (--onSuccess COMMAND) when multiple file changes are detected within a short period of time;

I hope that makes sense to you.

@justinmchase
Author

There have been some developments on this issue, by the way. In some other repos where I also filed this bug (not knowing exactly where it originates from), it appears that it may be related to this source bug:

sindresorhus/execa#433

Or possibly this library is experiencing a similar issue? It would be great if anyone here can add anything to that ticket.

@justinmchase
Author

May be related to nodejs/node#34830

@hongy20

hongy20 commented Aug 21, 2020

For me, the problem is related to handling the 'SIGTERM' event with an async function. If I change the code to something like:

```ts
// index.ts
const id = setInterval(() => console.log('still alive...', process.pid), 3000);
process.on('SIGTERM', () => {
    console.log('SIGTERM received', process.pid);
    clearInterval(id);
});
```

Then the result is actually what I expect.

[screenshot of terminal output]

@justinmchase
Author

That's an interesting data point, but even if you never called clearInterval, when the parent process closes, the child is supposed to be closed regardless of whether it handles SIGTERM or not; it should be forced closed.

david-alexander added a commit to david-alexander/tsc-watch that referenced this issue Mar 22, 2022
@david-alexander

david-alexander commented Mar 22, 2022

I came across an issue today that seems to be related to this one, although slightly different.

In my case, the process does die in response to SIGTERM, but sometimes it can take a while. I haven't yet looked into the reason for the delay; maybe the process is doing some cleanup before it exits.

The problem is that if two "compilation complete" events happen in quick succession, the killProcesses() call from the first one may not have finished by the time the second arrives. This means killProcesses() runs twice on the same process (the second time being effectively a no-op), and then runOnCompilationComplete() runs twice as well. So two new processes get started, and the first of them never gets killed.

I decided to have a go at fixing it, and have got a solution that seems to be working well, although I haven't had a chance to test it thoroughly yet. Before calling runOnCompilationComplete(), I check whether the original event that triggered this has been superseded by a later event. I've only implemented this for the "compilation complete" event so far, but I could easily do the other events as well.
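One way to sketch that "superseded" check is a generation counter (illustrative only; `killProcesses` and `runOnCompilationComplete` stand in for tsc-watch's internals, and this is not the actual commit):

```javascript
// Sketch: tag each "compilation complete" event with a generation number;
// only the latest generation is allowed to restart the app, so a slow
// kill from an older event can no longer spawn a duplicate process.
let generation = 0;

async function onCompilationComplete(killProcesses, runOnCompilationComplete) {
  const myGeneration = ++generation;
  await killProcesses(); // may take a while if the child exits slowly
  if (myGeneration !== generation) {
    // A newer event arrived while we were killing; it owns the restart.
    return false;
  }
  runOnCompilationComplete();
  return true;
}
```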

@gilamran, if you're open to it, I'd be happy to tidy up my fix, do some more testing on it, and put together a pull request.

Incidentally, I also think we could improve the maintainability of this code by switching to async/await instead of Promise.then(), and would be happy to help with that too. (If you're worried about compatibility with older Node versions, we could add a build step that transpiles it down to ES5.)

Of course, my solution doesn't deal with the case where the process completely ignores SIGTERM, although it should at least stop it from "losing track" of the errant process. I guess a proper fix for that case would involve some sort of grace period (configurable?) and then a SIGKILL, as originally suggested by @justinmchase.

@gilamran
Owner

@david-alexander Regarding the issue you found, it sounds like something that needs to be fixed. Please do a PR and we'll talk about it there. (Thanks!)
Regarding async/await: we've started a move to TypeScript in version 5 (not released yet), so you can do your PR there.

@taxilian

taxilian commented Sep 8, 2022

I'm having the same issue as @david-alexander, where events happen too quickly and I end up with multiple instances of the app running; it's very frustrating :-/

Not meaning to complain, though, as this is still a great project and the best way (IMO) to use typescript in development; ts-node is too slow.

@david-alexander

Sorry, I still haven't got round to doing a PR with my fix for this.

@taxilian, have you tried the commit linked in my comment above to see if it fixes your issue?

david-alexander added a commit to david-alexander/tsc-watch that referenced this issue Sep 8, 2022
@david-alexander

david-alexander commented Sep 8, 2022

I have just taken another look at my fix today and have made a few changes to it - the new version is here.

Please note that I haven't yet had a chance to do any testing on this new commit, apart from running the existing automated tests. 8 of the tests are failing for me, but they were failing already on the latest commit from master, so I don't think it's anything to do with these changes. But I haven't added any new tests yet to verify that this issue is fixed, so YMMV.

EDIT: Testing is now done, and it seems to have fixed the issue. The updated commit is in the PR.

david-alexander added a commit to david-alexander/tsc-watch that referenced this issue Sep 9, 2022
@divmgl

divmgl commented Nov 22, 2022

I'm actually running into this with pnpm. It looks like two processes are getting started when onSuccess occurs. Did these changes get merged in yet?

@gilamran
Owner

Can you please test with the next version?
npm install tsc-watch@dev

@divmgl

divmgl commented Nov 22, 2022

@gilamran hey Gil, this was my bad. This stopped happening once I instrumented the app to properly respond to a SIGTERM signal. You can disregard my message, sorry about that.

@WadePeterson

@gilamran, I believe I have been running into this same issue using tsc-watch v5 on two different Node services.

When running my services with tsc-watch, and then making a lot of code changes in rapid succession (in my case, from dependencies being recompiled by another process), the services will eventually start throwing EADDRINUSE errors because a previous instance didn't get shut down correctly when restarted via tsc-watch.

My problem appears to be resolved when using the latest tsc-watch v6!

However, I did run into a new problem with the v6 version. Running on a Mac, and working in a repo using yarn, I am getting this error when running tsc-watch:

env: node\r: No such file or directory

I got around it using the workaround here, but it seems the issue is likely Windows CRLF line endings in the executable Node script for tsc-watch.

@gilamran
Owner

gilamran commented Dec 6, 2022

@WadePeterson Thanks for trying out version 6!
The error that you mentioned is something that has happened before; I will look into it.
Thanks!

@gilamran
Owner

gilamran commented Dec 8, 2022

@WadePeterson I published a new dev version. You can install it with npm i tsc-watch@dev, and you should be OK now.

@gilamran
Owner

gilamran commented Dec 8, 2022

Also, if everything is OK, I'll publish v6 soon.

@WadePeterson

Just installed the new tsc-watch dev version and the CRLF issue is gone! Everything seems to work great for me now, and I still am not able to reproduce the orphaned process issue I was seeing in v5.

@gilamran
Owner

gilamran commented Dec 8, 2022

Version 6 published!

@gilamran gilamran closed this as completed Dec 8, 2022