Add driver killing capabilities #297
Having the ability to restart drivers explicitly isn't a bad idea. However, to really address the underlying issue of drivers getting stuck, we probably need to make it happen automatically, as you suggested in (1). I'm reluctant to implement (2) until it's demonstrated to be really necessary, because it encourages people to bake hacks into external scripts that will make it hard for us to debug issues in production. I'm not sure how the libcontainer plumbing affects this, but it should be possible to kill the driver process with a signal when the context times out. For an ordinary subprocess this is already implemented in the standard library, but we might need to do something extra to work around the container management.
Totally agree. As discussed on Slack, we could kill the driver after a timeout.
I think this is related to #303
I assume we all agreed that a new piece of functionality (a timeout flag?) is needed.
I don't think a flag is necessarily required: the key behaviour is that when the context is cancelled, the daemon needs to terminate the driver container serving that request. Once we have that, we could of course add a flag to set the timeout, but the client can also set its own timeout without any software changes on our side. The missing feature right now is the termination step.
I agree with @creachadair, the cancellation will be propagated to the Go driver server already, we only need to kill a native driver after receiving a cancellation signal. To be specific, here is the relevant piece of code:
I've started working on this, but so far I've mainly focused on adding this here:

```go
select {
case <-ctx.Done():
	pool.killDriver(drv)
	pool.scale()
	return nil, ctx.Err()
	// ...
}
```

Assuming that the `ctx` passed here and to the Parser will/may have a timeout, which we can eventually get from a command-line flag.
@kuba-- I think it's worth adding to both sides, actually. In bblfshd we can notice the cancellation and terminate the driver container.
The driver will notice the timeout through gRPC just fine, so we can try killing it at the native level first.
I have found myself having a problem when trying to parse large numbers of files (~5k files). Basically what happens is that after parsing many files, the C++ driver stops responding completely, and the Parse request that triggered it, and all subsequent ones, time out.
After some testing, I've realized that killing the running driver instance and relaunching it solves the problem, so I would like to be able to kill driver instances from `bblfshctl instances`, for example by doing `bblfshctl instances --kill cpp`, or even `bblfshctl instances --kill-all`.