-
I think it is very important to have this option available and exposed via the Python API. Without it, re-entrancy problems can easily lead to crashes (in fact these crashes are easily reproducible on slower, CPU-only systems), and right now the app developer seems unable to prevent them, since there is no way to stop or time out an ongoing inference, which may take forever to end on its own.
-
In Python (not using the server REST API) it's simple: when you create a streaming generation, just check inside the stream loop whether a stop has been ordered and exit the function.
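For example (a minimal sketch assuming the llama-cpp-python bindings; `stop_requested` and `stream_completion` are names I made up for illustration):

```python
import threading

from llama_cpp import Llama  # llama-cpp-python bindings

stop_requested = threading.Event()  # another thread sets this to order a stop

def stream_completion(llm: Llama, prompt: str) -> str:
    """Stream tokens and bail out as soon as a stop has been ordered."""
    pieces = []
    for chunk in llm(prompt, stream=True, max_tokens=256):
        if stop_requested.is_set():  # stop order caught inside the stream loop
            break                    # exit the loop; generation ends here
        pieces.append(chunk["choices"][0]["text"])
    return "".join(pieces)
```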
-
I am hoping to find a way to stop an ongoing inference/prediction process once it has started. For example, given:
msg 1: "user: hello"
msg 2: "user: who are you"
I would like to be able to stop the process that was started with the 'hello' input, to free up the resources, and instead send:
"user: hello
user: who are you"
as one message.
I have been trying to find out which process I should target for this use case, and then perhaps have a uid-to-boolean pointer or something that stops that process (I assume it's a recursion loop somewhere) when the pointer is set to true.
Right now I'm eyeballing `llama_get_logits_ith` for that, but I'm a noob, so I'm not sure, and perhaps there is an easier way to achieve what I need! Any help/feedback will be greatly appreciated!
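To make the uid-to-boolean idea concrete, here is roughly what I have in mind (a rough sketch assuming the llama-cpp-python bindings, where the token loop is under the caller's control; `stop_flags`, `run_generation`, and `cancel` are names I made up):

```python
import threading

from llama_cpp import Llama  # llama-cpp-python bindings

stop_flags: dict[str, threading.Event] = {}  # generation uid -> stop flag

def run_generation(llm: Llama, gen_id: str, prompt: str) -> str:
    """Generate for one message; abandon the loop if this uid is cancelled."""
    flag = stop_flags.setdefault(gen_id, threading.Event())
    pieces = []
    try:
        for chunk in llm(prompt, stream=True, max_tokens=256):
            if flag.is_set():  # the 'pointer' for this uid was set to true
                break          # frees the model for the next request
            pieces.append(chunk["choices"][0]["text"])
    finally:
        stop_flags.pop(gen_id, None)
    return "".join(pieces)

def cancel(gen_id: str) -> None:
    """Ask the generation with this uid to stop at the next token."""
    if gen_id in stop_flags:
        stop_flags[gen_id].set()

# cancel("msg-1")  # stop the generation started with 'hello' ...
# run_generation(llm, "msg-2", "user: hello\nuser: who are you")  # ... then resend as one message
```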