I believe there is a bug in the handling of kill -QUIT that causes the manager to hang after all the processes are reaped. The trap for kill -QUIT passes zero as the flags argument to waitpid2, which is sensible, since we do want to wait for all the child processes to end before shutting down the manager. Unfortunately, this causes a hang once all the children have finished, because a call to waitpid2 with a flags value of zero and no living children never returns. It seems an assumption was made that this call would just return a nil pid after all children have ended, but that is not the case per the Ruby docs and some experimentation. (In fact, the Ruby docs say that calling waitpid2 with no children can raise a SystemCallError on some platforms, but this does not seem to be the case on OS X or CentOS for me.)
I think this code needs to be rewritten to instead wait for a known set of child pids to report that they are dead and then exit the loop. For now, kill -INT works just fine for my purposes (I'm OK with god starting up a new master before all the old workers are finished).
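A rough sketch of what that rewrite could look like, assuming the manager keeps an array of its worker pids. The names `wait_for_children` and `pids` are hypothetical and not taken from the project's code. The loop still blocks in waitpid2 with a flags value of zero, but it exits as soon as every known pid has been reaped, and it also treats Errno::ECHILD (a SystemCallError subclass) as "no children left":

```ruby
# Hypothetical helper: block until every pid in `pids` has been reaped,
# then return a hash of pid => Process::Status.
def wait_for_children(pids)
  remaining = pids.dup
  statuses = {}

  until remaining.empty?
    begin
      # Block until any child exits (flags = 0, same as the QUIT trap).
      pid, status = Process.waitpid2(-1, 0)
    rescue Errno::ECHILD
      # No child processes exist at all, so there is nothing left to wait for.
      break
    end

    # Record the status only for pids we were asked to track.
    statuses[pid] = status if remaining.delete(pid)
  end

  statuses
end
```

With this shape the loop never calls waitpid2 once the tracked set is empty, so it does not depend on what the call does when no children remain.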