-
I think it is very important to have this option available and exposed via the Python API. Without it, re-entrancy problems can easily lead to crashes (in fact these crashes are easily reproducible on slower, CPU-only systems), and right now the app developer seems unable to prevent them, since there is no way to stop or time out an ongoing inference, which may take forever to end on its own.
-
In Python (not using the server REST API) it's simple: when you create a streaming generation, just check inside the stream loop whether a stop has been ordered and exit the function.
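For example (a minimal sketch assuming the llama-cpp-python bindings; `stop_requested` and `stream_completion` are names I made up for illustration):

```python
import threading

from llama_cpp import Llama  # llama-cpp-python bindings

stop_requested = threading.Event()  # another thread sets this to order a stop

def stream_completion(llm: Llama, prompt: str) -> str:
    """Stream tokens and bail out as soon as a stop has been ordered."""
    pieces = []
    for chunk in llm(prompt, stream=True, max_tokens=256):
        if stop_requested.is_set():  # stop order caught inside the stream loop
            break                    # exit the loop; generation ends here
        pieces.append(chunk["choices"][0]["text"])
    return "".join(pieces)
```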
-
I am hoping to find a way to stop an ongoing inference/prediction process once it has started. For example, given:
msg 1: "user: hello"
msg 2: "user: who are you"
I would like to be able to stop the process that was started with the 'hello' input, to free up the resources, and instead send:
"user: hello
user: who are you"
as one message.
I have been trying to find out which process I should target for this use case, and then perhaps have a uid-to-boolean pointer or something that stops that process (I assume it's a recursion loop somewhere) when the pointer is set to true.
Right now I'm eyeballing `llama_get_logits_ith` for that, but I'm a noob, so I'm not sure, and perhaps there is an easier way to achieve what I need! Any help/feedback will be greatly appreciated!
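To make the uid-to-boolean idea concrete, here is roughly what I have in mind (a rough sketch assuming the llama-cpp-python bindings, where the token loop is under the caller's control; `stop_flags`, `run_generation`, and `cancel` are names I made up):

```python
import threading

from llama_cpp import Llama  # llama-cpp-python bindings

stop_flags: dict[str, threading.Event] = {}  # generation uid -> stop flag

def run_generation(llm: Llama, gen_id: str, prompt: str) -> str:
    """Generate for one message; abandon the loop if this uid is cancelled."""
    flag = stop_flags.setdefault(gen_id, threading.Event())
    pieces = []
    try:
        for chunk in llm(prompt, stream=True, max_tokens=256):
            if flag.is_set():  # the 'pointer' for this uid was set to true
                break          # frees the model for the next request
            pieces.append(chunk["choices"][0]["text"])
    finally:
        stop_flags.pop(gen_id, None)
    return "".join(pieces)

def cancel(gen_id: str) -> None:
    """Ask the generation with this uid to stop at the next token."""
    if gen_id in stop_flags:
        stop_flags[gen_id].set()

# cancel("msg-1")  # stop the generation started with 'hello' ...
# run_generation(llm, "msg-2", "user: hello\nuser: who are you")  # ... then resend as one message
```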