
Add driver killing capabilities #297

Closed
r0mainK opened this issue Jun 28, 2019 · 8 comments · Fixed by #306

r0mainK commented Jun 28, 2019

I have run into a problem when trying to parse large numbers of files (~5k). After parsing many files, the C++ driver stops responding completely, and the Parse request that triggered the hang, along with all subsequent requests, times out.

After some testing, I realized that killing the running driver instance and relaunching it resolves the problem, so I would like:

  1. This to happen automatically whenever there is a user or parse timeout, possibly controlled by a flag when launching the Docker image or by an environment variable.
  2. To be able to kill specific instances through bblfshctl instances, for example with bblfshctl instances --kill cpp or even bblfshctl instances --kill-all.
creachadair (Contributor) commented

Having the ability to restart drivers explicitly isn't a bad idea. However, I think to really address the underlying issue of drivers getting stuck, we probably need to make it happen automatically as you suggested in (1). I'm a little reluctant to implement (2) until it's demonstrated to be really necessary, because it encourages people to bake hacks into external scripts that will make it hard for us to debug issues in production.

I'm not sure how the libcontainer plumbing affects this, but it should be possible to kill the driver process with a signal when the context times out. For an ordinary subprocess this is already implemented in the standard library; but we might need to do something extra to work around the container management.
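
For reference, here is a minimal sketch of the standard-library behaviour mentioned above, assuming a plain subprocess rather than a libcontainer-managed one (the sleep command simply stands in for a stuck driver process):

package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

func main() {
	// Give the (stand-in) driver two seconds to respond.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// exec.CommandContext kills the child process once ctx expires.
	cmd := exec.CommandContext(ctx, "sleep", "10")
	err := cmd.Run()

	// After ~2s, Run returns an error and ctx.Err() reports the deadline.
	fmt.Println(err, ctx.Err())
}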

dennwc (Member) commented Jun 29, 2019

Totally agree. As discussed on Slack, we could kill the driver after userTimeout + parseTimeout; this shouldn't affect the pool too much.

vmarkovtsev (Contributor) commented

I think this is related to #303

kuba-- self-assigned this Sep 11, 2019
kuba-- (Member) commented Sep 11, 2019

I assume we all agreed that new functionality (a --timeout argument) should be added to each driver, so that bblfshctl can pass it.

creachadair (Contributor) commented

I don't think a flag is necessarily required: The key behaviour is that when the context is cancelled, the daemon needs to terminate the driver container serving that request. Once we have that, we could of course add a flag to set the timeout—but the client can also set its own timeout without any software changes on our side. The missing feature right now is the termination step.
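
As a rough illustration of the client-side half, assuming a placeholder Parser interface rather than the real bblfsh protocol: the caller attaches a deadline to the request context, gRPC propagates it to the daemon, and the only missing piece on our side is terminating the driver once that deadline fires.

package client

import (
	"context"
	"time"
)

// Parser stands in for whatever gRPC client the caller uses; the method
// name and signature are illustrative only.
type Parser interface {
	Parse(ctx context.Context, source string) (string, error)
}

// parseWithTimeout bounds a request purely on the client side: the
// deadline travels with the gRPC call, and once it expires the server
// observes the cancellation, which is the point where the daemon would
// need to terminate the stuck driver container.
func parseWithTimeout(p Parser, source string, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return p.Parse(ctx, source)
}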

dennwc (Member) commented Sep 12, 2019

I agree with @creachadair: the cancellation is already propagated to the Go driver server, so we only need to kill the native driver after receiving the cancellation signal.

To be specific, here is the relevant piece of code:
https://github.com/bblfsh/sdk/blob/master/driver/native/native.go#L255
We already stop waiting for the response if we see a timeout, but we also need to terminate the native driver to avoid further processing.
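
A rough sketch of what that extra step could look like, with all names (nativeDriver, cmd, respCh, call) made up for illustration rather than taken from native.go:

package native

import (
	"context"
	"os/exec"
)

// nativeDriver is an illustrative stand-in for the wrapper around the
// spawned native driver process.
type nativeDriver struct {
	cmd    *exec.Cmd     // the running native driver process
	respCh chan response // responses decoded from the driver's stdout
}

type response struct {
	data []byte
	err  error
}

// call waits for a response, but on timeout it also kills the native
// process instead of only abandoning the wait, so a stuck driver does
// not keep processing in the background.
func (d *nativeDriver) call(ctx context.Context) ([]byte, error) {
	select {
	case r := <-d.respCh:
		return r.data, r.err
	case <-ctx.Done():
		_ = d.cmd.Process.Kill() // terminate the stuck native driver
		return nil, ctx.Err()
	}
}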

kuba-- (Member) commented Sep 12, 2019

I've started working on this, but so far I have mainly focused on adding something like the following here:
https://github.com/bblfsh/bblfshd/blob/master/daemon/service.go#L75

select {
case <-ctx.Done():
	pool.killDriver(drv)
	pool.scale()
	return nil, ctx.Err()
// ...
}

This assumes that the ctx passed here, and on to the Parser, will (or may) have a timeout, which we could eventually take from a command-line flag. WDYT?

In native.go we are inside the driver, but we need to catch the timeout outside the driver to be able to restart it.

dennwc (Member) commented Sep 12, 2019

@kuba-- I think it's worth adding it on both sides, actually. In bblfshd we can notice the parseTimeout, wait a bit more (userTimeout), and if the request is still blocked, kill the driver completely.

but we need to catch the timeout outside the driver

The driver will notice the timeout through gRPC just fine, so we can try killing it at the native level first.
