Replies: 4 comments
-
Sounds more like a binding problem. The C++ API (in llama.h) only ever predicts a single token at a time anyway; multi-token generation is a loop handled by application code.
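To make that concrete, here is a minimal Python sketch of what that application-side loop looks like. `predict_next_token` and `is_end_of_sequence` are hypothetical stand-ins (passed in as callables) for the evaluate-and-sample step and EOS check that llama.h exposes; they are not actual library calls.

```python
def generate(predict_next_token, is_end_of_sequence, prompt_tokens, max_tokens=256):
    """Sketch of the caller-side loop: the library only ever yields one token
    per call, so stopping generation is just a matter of leaving this loop."""
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        next_token = predict_next_token(tokens)  # one single-token prediction
        if is_end_of_sequence(next_token):       # model signalled end of text
            break
        tokens.append(next_token)
        yield next_token                         # caller can stop iterating at any point
```

The relevant point for the original question: the stop condition lives entirely in the caller, so an API server can abort a generation simply by not asking for the next token.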
-
AFAIK, if you are willing to "stream", then stopping the next-token requests will indeed stop generation; i.e. if you simply don't request the next token, no additional processing is done.
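For example, with llama-cpp-python's high-level streaming interface (the model path below is illustrative, and the chunk layout is the OpenAI-style completion schema the binding returns), "not requesting the next token" is just breaking out of the loop:

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/7B/ggml-model.bin")  # illustrative path

stream = llm("Q: Name the planets in the solar system. A:",
             max_tokens=256, stream=True)

collected = []
for chunk in stream:
    collected.append(chunk["choices"][0]["text"])
    # In a real API server this condition would be "the client disconnected";
    # breaking out here means no further tokens are requested or evaluated.
    if len(collected) >= 32:
        break

print("".join(collected))
```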
-
Thanks; I'll look into llama-cpp-python's bindings to the API then, and see what I can do from there.
-
@spirilis I think the llama-cpp-python binding works differently in that it doesn't call the server example, which has the "next-token" behavior. I looked at https://github.com/keldenl/gpt-llama.cpp/blob/master/routes/chatRoutes.js and, for example, they kill the spawned "main" process when the connection closes. So each project might handle this differently.
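A rough Python sketch of that process-per-request pattern (the flags, model path, and `write_chunk` callback are illustrative assumptions, not an exact reproduction of what gpt-llama.cpp does):

```python
import subprocess

def run_one_request(prompt: str, write_chunk) -> None:
    """Spawn llama.cpp's `main` example for a single request and kill it if the
    client goes away. write_chunk is whatever callable forwards bytes to the
    HTTP response; flags and model path are illustrative."""
    proc = subprocess.Popen(
        ["./main", "-m", "./models/7B/ggml-model.bin", "-p", prompt],
        stdout=subprocess.PIPE,
    )
    try:
        for line in proc.stdout:
            write_chunk(line)
    except BrokenPipeError:
        pass              # client hung up mid-stream
    finally:
        proc.terminate()  # stop inference immediately, roughly what gpt-llama.cpp does on closure
        proc.wait()
```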
-
I asked this as an issue over in the llama-cpp-python project - abetlen/llama-cpp-python#313 - one minor issue I see is that when running llama.cpp as a library behind an API server, if a client decides to terminate the connection, the model seems to keep running.
This could be handled better if the llama.cpp backend code supported something like SIGHUP (hangup - a perfect analogy here, since the client "hung up") and returned a null result, thereby allowing the API server to serve another query immediately afterward.
Any thoughts? Does llama.cpp already support this in another manner, and we just need to find/implement it in the Python bindings?
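For what it's worth, a dedicated signal may not be strictly necessary at the library level: since generation is pulled one token at a time, a streaming handler that simply stops pulling tokens when the client disconnects gets much the same effect. Below is a minimal, hedged sketch using llama-cpp-python; the model path is illustrative, and the disconnect detection relies on the WSGI/ASGI server calling `close()` on an abandoned response generator, which is framework-dependent behaviour.

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/7B/ggml-model.bin")  # illustrative path

def stream_completion(prompt: str):
    """Generator handed to the web framework's streaming response.
    When the client disconnects, most servers close() the generator, which
    raises GeneratorExit at the yield point and halts further token evaluation."""
    try:
        for chunk in llm(prompt, max_tokens=512, stream=True):
            yield chunk["choices"][0]["text"]
    except GeneratorExit:
        # Client "hung up": stop asking for tokens, freeing the model to serve
        # the next query without needing a SIGHUP-style mechanism.
        raise
```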