This repository has been archived by the owner on Mar 8, 2020. It is now read-only.

Pool of native drivers (prototype) #443

Closed
wants to merge 1 commit into from

Conversation

kuba--
Member

@kuba-- kuba-- commented Oct 29, 2019

Signed-off-by: kuba-- [email protected]

This is the first draft/prototype which implements a pool of native drivers (parsers).
Instead of having just one native driver, we maintain a pool of concurrent drivers: for each grpc.Parse request we can pull a driver from the pool, write the request, and put the driver back.
The pool of drivers is implemented as a buffered channel.

This PR closes bblfsh/bblfshd#321
and it's a part of epic: bblfsh/bblfshd#313

Some design details you can find in doc: https://docs.google.com/document/d/1671ApmxFCOmerVKloVC6Q6j8BsJr8xZHbhuE40ZK-as/edit?pli=1#



@kuba-- kuba-- requested a review from a team October 29, 2019 21:33
@kuba-- kuba-- changed the title Poll of native drivers (prototype) Pool of native drivers (prototype) Oct 29, 2019
Contributor

@bzz bzz left a comment


Reviewable status: 0 of 3 files reviewed, all discussions resolved


driver/server/common.go, line 19 at r1 (raw file):

func Run(t driver.Transforms) {
	n := runtime.NumCPU()
	ch := make(chan driver.Native, n)

Did you consider updating the go doc as well, to reflect this change?
This also seems like an opportunity to make sure all the exported functions are documented.

Contributor

@bzz bzz left a comment


Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

Member Author

@kuba-- kuba-- left a comment


Reviewable status: :shipit: complete! all files reviewed, all discussions resolved


driver/server/common.go, line 19 at r1 (raw file):

Previously, bzz (Alexander) wrote…

Did you consider updating the go doc as well, to reflect this change?
This also seems like an opportunity to make sure all the exported functions are documented.

Yeah, we'll have to update the API docs across the board (it's on my todo list).
This PR is just to quickly show the workflow.

@ncordon ncordon requested a review from a team October 30, 2019 12:01
@ncordon
Member

ncordon commented Oct 30, 2019


driver/impl.go, line 43 at r2 (raw file):

// Start gets each driver from the channel, starts the process
// and put it back to the same buffer channel.

Feel free to ignore this nitpick: puts

@ncordon
Member

ncordon commented Oct 30, 2019


driver/impl.go, line 59 at r2 (raw file):

// Close tries to close all idle drivers.
// If any of drivers fail on Close, the function wraps an error (err) and move on.

moves

@ncordon
Member

ncordon commented Oct 30, 2019


driver/impl.go, line 47 at r3 (raw file):

// the function tries to close all running drivers and return an error.
func (d *driverImpl) Start() error {
	for i := 0; i < len(d.ch); i++ {

Does this start all processes even if we are not going to use them? I mean if I have 8 processors, am I going to start 8 processes even if I am just going to parse a file?

Member

@ncordon ncordon left a comment


Reviewed 1 of 3 files at r1, 1 of 2 files at r2, 1 of 1 files at r3.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @kuba--)

@ncordon ncordon requested a review from a team October 30, 2019 12:38
Member Author

@kuba-- kuba-- left a comment


Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @ncordon)


driver/impl.go, line 44 at r1 (raw file):

		drv := <-d.ch
		if err := drv.Start(); err != nil {
			return err

Do you think it's OK to fail if one of the drivers fails on Start?


driver/impl.go, line 72 at r1 (raw file):

	}

	drv := <-d.ch

Is it OK to block here? Maybe we should have some timeout, so if it'


driver/impl.go, line 43 at r2 (raw file):

Previously, ncordon (Nacho Cordón) wrote…

Feel free to ignore this nitpick: puts

Done.


driver/impl.go, line 59 at r2 (raw file):

Previously, ncordon (Nacho Cordón) wrote…

moves

Done.


driver/impl.go, line 47 at r3 (raw file):

Previously, ncordon (Nacho Cordón) wrote…

Does this start all processes even if we are not going to use them? I mean if I have 8 processors, am I going to start 8 processes even if I am just going to parse a file?

Yes, it starts all parsers. Parsers will be in idle mode, waiting for I/O.
It decreases latency. Otherwise, for instance for the C++ driver (where the parser is written in Java), you would have to wait for the JVM to start before parsing any file.

Member

@dennwc dennwc left a comment


Reviewed 1 of 1 files at r3.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @kuba-- and @ncordon)


driver/impl.go, line 44 at r1 (raw file):

Previously, kuba-- (Kuba Podgórski) wrote…

Do you think it's OK to fail if one of the drivers fails on Start?

Yeah, I think it's reasonable, at least for now. Later we can improve it if it becomes a problem.


driver/impl.go, line 72 at r1 (raw file):

Previously, kuba-- (Kuba Podgórski) wrote…

Is it OK to block here? Maybe we should have some timeout, so if it'

Yeah, you should select on 3 events: getting the driver, context cancellation, pool stop.


driver/impl.go, line 15 at r3 (raw file):

// NewDriverFrom returns a new DriverModule instance based on
// the given pool of native drivers, language manifest and list of transformers.
func NewDriverFrom(ch chan Native, m *manifest.Manifest, t Transforms) (DriverModule, error) {

Accepting a channel is a very inconvenient API. Let's accept func() Native and a max number of drivers here instead.


driver/impl.go, line 15 at r3 (raw file):

// NewDriverFrom returns a new DriverModule instance based on
// the given pool of native drivers, language manifest and list of transformers.
func NewDriverFrom(ch chan Native, m *manifest.Manifest, t Transforms) (DriverModule, error) {

Also, I think we need to implement scaling in the native package instead.

The implementation of the driver.Native interface may accept multiple requests, and it does in the case of the Go driver. We don't need to scale that one, only the ones created by the driver/native package.


driver/impl.go, line 47 at r3 (raw file):

Previously, kuba-- (Kuba Podgórski) wrote…

Yes, it starts all parsers. Parsers will be in idle mode, waiting for I/O.
It decreases latency. Otherwise, for instance for the C++ driver (where the parser is written in Java), you would have to wait for the JVM to start before parsing any file.

@ncordon Also note that we do this to simplify things. In K8S we can disable the scaling here (set it to 1 driver) and let K8S scale whole drivers, not native ones.


driver/impl.go, line 66 at r3 (raw file):

			// TODO(kuba--): replace following lines by errors.Wrap (after migrating to go 1.13)
			if err != nil {
				err = fmt.Errorf("%s, %s", err.Error(), e.Error())

No need to concatenate, just return the last error. Also, it may be better to make it explicit in the code (var last error; for ... ; return last).

Member Author

@kuba-- kuba-- left a comment


Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @dennwc and @ncordon)


driver/impl.go, line 72 at r1 (raw file):

Previously, dennwc (Denys Smirnov) wrote…

Yeah, you should select on 3 events: getting the driver, context cancellation, pool stop.

Do you mean to add a done channel to the struct and do:

select {
case <-d.done:
    return nil, nil

case <-ctx.Done():
    return nil, ctx.Err()

case drv := <-d.ch:
    // ... all the code here
}

and in the Close function do: d.done <- struct{}{}
Is this what you mean?
I'm not sure having an extra done channel makes sense. If the request has already arrived by the time Close is called, I would let it go and return a response.


driver/impl.go, line 15 at r3 (raw file):

Previously, dennwc (Denys Smirnov) wrote…

Accepting a channel is a very inconvenient API. Let's accept func() Native and a max number of drivers here instead.

So how will this function work, like an iterator?
How are you going to put the drivers back if you cannot touch the channel?

One option is to wrap this channel in a new structure, NativePool, with Get, Put and New(size) methods.


driver/impl.go, line 15 at r3 (raw file):

Previously, dennwc (Denys Smirnov) wrote…

Also, I think we need to implement scaling in the native package instead.

The implementation of driver.Native interface may accept multiple requests, and it does in case of Go driver. We don't need to scale that one, only ones created by driver/native package.

But native.go represents one particular driver/parser (a single process), so why scale here?
And impl.go is just an implementation of particular gRPC requests (Parse, Version, Languages, ...). If some day you want to scale Parse2 or any other action, it's already there.


driver/impl.go, line 66 at r3 (raw file):

Previously, dennwc (Denys Smirnov) wrote…

No need to concatenate, just return the last error. Also, it may be better to make it explicit in the code (var last error; for ... ; return last).

Done.

@ncordon
Member

ncordon commented Oct 30, 2019


driver/impl.go, line 47 at r3 (raw file):

Previously, dennwc (Denys Smirnov) wrote…

@ncordon Also note that we do this to simplify things. In K8S we can disable the scaling here (set it to 1 driver) and let K8S scale whole drivers, not native ones.

👌 Thanks for the clarification

Member

@ncordon ncordon left a comment


Reviewed 1 of 1 files at r4.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @dennwc and @kuba--)

Contributor

@creachadair creachadair left a comment


Overall I think this is a good idea. I share Denys's concern about the package API, but the approach seems fine.

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @dennwc and @ncordon)


driver/impl.go, line 15 at r3 (raw file):

Previously, kuba-- (Kuba Podgórski) wrote…

So how will this function work, like an iterator?
How are you going to put the drivers back if you cannot touch the channel?

One option is to wrap this channel in a new structure, NativePool, with Get, Put and New(size) methods.

It's fine to use a channel for the implementation, but I recommend passing in either a struct with callbacks or an interface. The problem with a channel in the API is that it's not clear how the caller is supposed to set up the channel, and what assumptions there are about the discipline for reading and writing it, except in documentation. With an interface the usage is more explicit.

Just a made-up example:

type Pool interface {
    // Open blocks until a native driver is available, or until ctx terminates.
    Open(ctx context.Context) (Native, error)
    // Release adds or returns a driver to the pool.
    Release(Native)
}

// …

d, err := pool.Open(ctx)
if err != nil {
   doSomethingWith(err)
}
defer pool.Release(d)
doStuffWith(d)   

You could implement this with a channel, if you wanted.

type pool chan Native

func (p pool) Open(ctx context.Context) (Native, error) {
    select {
    case v, ok := <-p:
        if ok {
            return v, nil
        }
        return nil, errors.New("pool closed")
    case <-ctx.Done():
        return nil, ctx.Err()
    }
}

func (p pool) Release(d Native) { p <- d }

But you would want to be careful probably to buffer the channel by at least 1 in either case, so that the pool does not deadlock on itself (this is true whether or not we wrap the type).


driver/impl.go, line 47 at r3 (raw file):

Previously, ncordon (Nacho Cordón) wrote…

👌 Thanks for the clarification

Also, idle processes are relatively cheap; as long as we're not giving them work, they will not chew up a lot of kernel resources and their resident sets can be paged out.

Member

@dennwc dennwc left a comment


Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @creachadair, @kuba--, and @ncordon)


driver/impl.go, line 72 at r1 (raw file):

Previously, kuba-- (Kuba Podgórski) wrote…

Do you mean to add a done channel to the struct and do:

select {
case <-d.done:
    return nil, nil

case <-ctx.Done():
    return nil, ctx.Err()

case drv := <-d.ch:
    // ... all the code here
}

and in the Close function do: d.done <- struct{}{}
Is this what you mean?
I'm not sure having an extra done channel makes sense. If the request has already arrived by the time Close is called, I would let it go and return a response.

Yes, almost. Except that:

  • Call close(d.done) in Close method. Also make sure it's safe to call it twice (close usually panics if you don't protect it).
  • return nil, ErrDriverClosed when a signal from d.done is triggered.

To your point, you can't safely let the request go, because Close will try to drain the channel and you may leak the driver that is processing the request.


driver/impl.go, line 15 at r3 (raw file):

Previously, kuba-- (Kuba Podgórski) wrote…

But native.go represents one particular driver/parser (a single process), so why scale here?
And impl.go is just an implementation of particular gRPC requests (Parse, Version, Languages, ...). If some day you want to scale Parse2 or any other action, it's already there.

You can make two separate struct types there - one will be with no scaling, and the second will scale the first one.

To your point, there is nothing in Parse/Parse2 that says that it only allows one request at a time. So always scaling driver.Native doesn't sound right.

The driver.Native is not an interface for drivers implemented via stdin/stdout. It's an interface for any driver that returns a native AST. It may or may not be single threaded.


driver/impl.go, line 15 at r3 (raw file):

Previously, creachadair (M. J. Fromberger) wrote…

It's fine to use a channel for the implementation, but I recommend passing in either a struct with callbacks or an interface. The problem with a channel in the API is that it's not clear how the caller is supposed to set up the channel, and what assumptions there are about the discipline for reading and writing it, except in documentation. With an interface the usage is more explicit.

Just a made-up example:

type Pool interface {
    // Open blocks until a native driver is available, or until ctx terminates.
    Open(ctx context.Context) (Native, error)
    // Release adds or returns a driver to the pool.
    Release(Native)
}

// …

d, err := pool.Open(ctx)
if err != nil {
   doSomethingWith(err)
}
defer pool.Release(d)
doStuffWith(d)   

You could implement this with a channel, if you wanted.

type pool chan Native

func (p pool) Open(ctx context.Context) (Native, error) {
    select {
    case v, ok := <-p:
        if ok {
            return v, nil
        }
        return nil, errors.New("pool closed")
    case <-ctx.Done():
        return nil, ctx.Err()
    }
}

func (p pool) Release(d Native) { p <- d }

But you would want to be careful probably to buffer the channel by at least 1 in either case, so that the pool does not deadlock on itself (this is true whether or not we wrap the type).

I think a single callback function should be enough here. driver.Native is already an interface, and it has Start and Stop methods that can get or put an actual implementation from the pool (if it has one).


driver/impl.go, line 66 at r3 (raw file):

Previously, kuba-- (Kuba Podgórski) wrote…

Done.

One more nit here: you can now call the error err instead of e (according to the convention).

Contributor

@creachadair creachadair left a comment


Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @dennwc, @kuba--, and @ncordon)


driver/impl.go, line 15 at r3 (raw file):

Previously, dennwc (Denys Smirnov) wrote…

I think a single callback function should be enough here. driver.Native is already an interface, and it has Start and Stop methods that can get or put an actual implementation from the pool (if it has one).

That would also work, though then we would have to add another layer of wrapper for the existing implementations. I worry that might be a little too clever (and harder to debug).

I don't feel very strongly about which direction we go—my main point is to advise against using channels in the API definition, since it's very easy for the caller of the API to screw up the send/receive discipline and cause deadlocks or crashes. A pretty good rule of thumb for channels is they should always be internal details.

(I realize there are some counterexamples like time.Ticker, but even there you see some subtleties that people often get wrong around leaking goroutines and unfinished ticks)

Member Author

@kuba-- kuba-- left a comment


Reviewable status: 2 of 4 files reviewed, 5 unresolved discussions (waiting on @dennwc and @ncordon)


driver/impl.go, line 72 at r1 (raw file):

Previously, dennwc (Denys Smirnov) wrote…

Yes, almost. Except that:

  • Call close(d.done) in Close method. Also make sure it's safe to call it twice (close usually panics if you don't protect it).
  • return nil, ErrDriverClosed when a signal from d.done is triggered.

To your point, you can't safely let the request go because Close will try to drain the channel and you may leak that driver that processes the request.

Done.


driver/impl.go, line 15 at r3 (raw file):

Previously, dennwc (Denys Smirnov) wrote…

I think a single callback function should be enough here. driver.Native is already an interface, and it has Start and Stop methods that can get or put an actual implementation from the pool (if it has one).

I would prefer to implement (on top of a buffered channel) something like what Michael suggested.

Personally, I don't see the point of having just one func callback and handling get and put via Start and Stop. To me, Start and Stop are for starting and stopping the parser process; both are called at the beginning and at the end. In other words, implementing get/put via Start/Stop means we would have to start and stop a parser process for each Parse request.


driver/impl.go, line 15 at r3 (raw file):

Previously, dennwc (Denys Smirnov) wrote…

You can make two separate struct types there - one will be with no scaling, and the second will scale the first one.

To your point, there is nothing in Parse/Parse2 that says that it only allows one request at a time. So always scaling driver.Native doesn't sound right.

The driver.Native is not an interface for drivers implemented via stdin/stdout. It's an interface for any driver that returns a native AST. It may or may not be single threaded.

Honestly, it sounds a little overcomplicated. To me, this file is purely an implementation of gRPC requests. So we pull clients from the pool to talk to the backend (as usually happens with DB code, except that here the backend is a running parser process and communication goes over stdin/stdout).

Member

@dennwc dennwc left a comment


Reviewable status: 2 of 4 files reviewed, 4 unresolved discussions (waiting on @creachadair, @kuba--, and @ncordon)


driver/impl.go, line 15 at r3 (raw file):

Previously, kuba-- (Kuba Podgórski) wrote…

Honestly, it sounds a little overcomplicated. To me, this file is purely an implementation of gRPC requests. So we pull clients from the pool to talk to the backend (as usually happens with DB code, except that here the backend is a running parser process and communication goes over stdin/stdout).

I feel we are talking past each other.

Yes, you are right: driver.Native is an analog of gRPC, but gRPC always assumes concurrency. The point I'm trying to make is that driver.Native (as an interface) is concurrent. And we have at least one implementation that is concurrent: the Go driver.

So instead of stating that "every driver.Native needs a scaler to run concurrently", as your implementation implies, I suggested hiding this scaler behind the interface, and doing so only for the single implementation that requires this kind of scaling: the one that runs over stdin/stdout (driver/native.Driver). It shouldn't be at a high level (the driver package), since concurrency for other implementations can and will be implemented differently.

To illustrate the concept better, imagine a Python driver implemented as a binding to the C library of the Python interpreter. Instead of starting N interpreters and keeping them on hold, you could just create/clone one on each Parse request, for example.

So the scaling you have implemented should be internal to the driver/native package, not external (in the root driver package) as it is in this PR.


driver/impl.go, line 15 at r3 (raw file):

Previously, kuba-- (Kuba Podgórski) wrote…

I would prefer to implement (on top of a buffered channel) something like what Michael suggested.

Personally, I don't see the point of having just one func callback and handling get and put via Start and Stop. To me, Start and Stop are for starting and stopping the parser process; both are called at the beginning and at the end. In other words, implementing get/put via Start/Stop means we would have to start and stop a parser process for each Parse request.

I think you heavily misunderstood me.

Start can wrap whatever is necessary for an implementation. In the case of your implementation, Start will start N instances and Stop will stop N instances. I don't understand why you say that Start should be called before each Parse.

I mentioned getting a driver from the pool because Michael mentioned that we may need an interface instead of a single callback. We don't, since the driver.Native returned by the callback can use the pool internally, if that makes sense for an implementation. As an example, I mentioned that Start may get N drivers from the pool, if we think that's necessary.

So the TL;DR is (IMO) that we already have an interface called native.Driver. No need for another wrapper on top of it.


driver/impl.go, line 62 at r5 (raw file):

// If any of drivers fail on Close, the function returns the last received error.
func (d *driverImpl) Close() error {
	close(d.done)
select {
case <-d.done:
default:
  close(d.done)
}

So it won't panic if called twice.

Member Author

@kuba-- kuba-- left a comment


Reviewable status: 2 of 4 files reviewed, 4 unresolved discussions (waiting on @dennwc and @ncordon)


driver/impl.go, line 15 at r3 (raw file):

Previously, dennwc (Denys Smirnov) wrote…

I think you heavily misunderstood me.

Start can wrap whatever is necessary for an implementation. In the case of your implementation, Start will start N instances and Stop will stop N instances. I don't understand why you say that Start should be called before each Parse.

I mentioned getting a driver from the pool because Michael mentioned that we may need an interface instead of a single callback. We don't, since the driver.Native returned by the callback can use the pool internally, if that makes sense for an implementation. As an example, I mentioned that Start may get N drivers from the pool, if we think that's necessary.

So the TL;DR is (IMO) that we already have an interface called native.Driver. No need for another wrapper on top of it.

Of course, if we implement it that way, it will work. But I wonder whether the whole bootstrap won't be too heavy. OK, right now we keep all processes running, but thanks to this you get an answer immediately. If we create a process on demand, we'll have to wait for the JVM (C++ driver) or the Python interpreter (Python driver), and so on.
Do you think that's OK?


driver/impl.go, line 15 at r3 (raw file):

Previously, dennwc (Denys Smirnov) wrote…

I feel we are talking past each other.

Yes, you are right: driver.Native is an analog of gRPC, but gRPC always assumes concurrency. The point I'm trying to make is that driver.Native (as an interface) is concurrent. And we have at least one implementation that is concurrent: the Go driver.

So instead of stating that "every driver.Native needs a scaler to run concurrently", as your implementation implies, I suggested hiding this scaler behind the interface, and doing so only for the single implementation that requires this kind of scaling: the one that runs over stdin/stdout (driver/native.Driver). It shouldn't be at a high level (the driver package), since concurrency for other implementations can and will be implemented differently.

To illustrate the concept better, imagine a Python driver implemented as a binding to the C library of the Python interpreter. Instead of starting N interpreters and keeping them on hold, you could just create/clone one on each Parse request, for example.

So the scaling you have implemented should be internal to the driver/native package, not external (in the root driver package) as it is in this PR.

OK, I got your point. Of course we can simplify things here.
In this draft I had two goals:

  • Don't change the current API much, so the change stays somewhat transparent and easy to read/understand (because of the minimal number of changes).
  • I decided that if we have a fixed number of concurrent processes (instead of creating them on demand), it's better to have them running and responsive (see also my comment above).

driver/impl.go, line 62 at r5 (raw file):

Previously, dennwc (Denys Smirnov) wrote…
select {
case <-d.done:
default:
  close(d.done)
}

So it won't panic if called twice.

Done.

Member Author

@kuba-- kuba-- left a comment


Reviewable status: 2 of 4 files reviewed, 4 unresolved discussions (waiting on @dennwc and @ncordon)


driver/impl.go, line 15 at r3 (raw file):

Previously, kuba-- (Kuba Podgórski) wrote…

OK, I got your point. Of course we can simplify things here.
In this draft I had two goals:

  • Don't change the current API much, so the change stays somewhat transparent and easy to read/understand (because of the minimal number of changes).
  • I decided that if we have a fixed number of concurrent processes (instead of creating them on demand), it's better to have them running and responsive (see also my comment above).

One more thing. I think it's a drawback that we have an SDK but implement certain drivers differently (e.g. Go is concurrent, but others are not). IMHO, for distributed systems a consistent workflow is key.
Anyway, the current implementation does not force anyone to use any concurrency. You can still pass your own pool (of size 1) and handle it yourself.

Of course we can add another interface, like:

type Pool interface {
    Open() Native
    Close() error
}

make DriverModule a composition of Pool, Module and Driver, and let drivers implement it; but it would require much more work across all drivers, and I wonder whether we would really benefit from this flexibility.
Mainly because, as we discussed, this should be a tiny feature as part of a bigger refactoring, and any advanced scalability will be pushed to K8S.

Member

@dennwc dennwc left a comment


Reviewable status: 2 of 4 files reviewed, 3 unresolved discussions (waiting on @creachadair, @kuba--, and @ncordon)


driver/impl.go, line 15 at r3 (raw file):

Previously, kuba-- (Kuba Podgórski) wrote…

Of course, if we implement it that way, it will work. But I wonder whether the whole bootstrap won't be too heavy. OK, right now we keep all processes running, but thanks to this you get an answer immediately. If we create a process on demand, we'll have to wait for the JVM (C++ driver) or the Python interpreter (Python driver), and so on.
Do you think that's OK?

Those are two different cases, and I specifically emphasized in both comments that the scaling is always implementation-dependent.

For the Go driver you can just call the Parse function; Start and Stop do nothing.

For the Python example, Start and Stop probably still do nothing, and Parse may start a new interpreter. Or maybe it gets those interpreters from a pool. It depends on the performance characteristics.

For the JVM we can't embed it, so we use stdin/stdout and need additional scaling. In this case Start creates N processes and Parse picks one of them.

For all 3 cases, all the details are hidden behind the driver.Native interface. That's the point I'm trying to make. For each implementation there may be a better way to achieve concurrency. For stdin/stdout we decided to start N processes, but it may not be the same for others.

So let's move the implementation where it belongs (the driver/native package).

Also, to clarify: the example about Python is purely hypothetical. We won't implement it this way; it will still use stdin/stdout. But this is important to keep in mind for the future.


driver/impl.go, line 15 at r3 (raw file):

In this draft I had two goals:

Don't change current API much

But you are changing the API. This is exactly what I'm arguing about.

Go driver can run concurrently with no additional changes required. Your change, on the other hand, always adds scaling on top of it.

This is an API change because the expected behavior of the system changes. You are keeping driver.Native intact, but with a different set of assumptions: it must not be concurrent and will be scaled externally. This restriction is not the case for the current version of the driver package.

To clarify the last part, I'm speaking specifically about the driver.Native interface, e.g. if you use bblfshd as a library (which @kuba-- you were always an advocate for :)). bblfshd-the-app always scales drivers, but this is not because of the interface in the driver package, but rather because of how the app is designed.

So let's properly separate responsibilities for the implementations and not fall into the same trap as the current bblfshd, which relies on a single implementation (libcontainer in the case of bblfshd, and stdin/stdout in the case of this PR).

I decided that if we have fixed number of concurrent processes

This is still a valid assumption, and I was not arguing about this part here or in the other thread. Sorry if my comment to Michael confused you when I mentioned the pool. It was solely to show that driver.Native is capable of hiding a process pool, scaling on demand, or any other implementation detail.

So let's assume this topic is resolved. We are running exactly N processes, no scaling on demand for this PR, as we both agreed previously.

@kuba-- kuba-- closed this Mar 4, 2020

Successfully merging this pull request may close these issues.

Prototype native parser scaling mechanism
5 participants