Better user experience for llamacpp-server #10012
PierreCarceller started this conversation in Ideas
Replies: 1 comment · 5 replies
Hello!
I'm a llamacpp-server user, in particular of the OpenAI-compatible APIs. In a dream world, I would like something a bit like what you can do with vLLM.
I can think of two alternative solutions that are a little less practical but easier to set up.

Solution 1: Create a /message-format endpoint that would apply the chat template to the list of messages sent, then use the /completion endpoint that already exists (a rough sketch of this flow follows).
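A minimal client-side sketch of what Solution 1 could look like. The proposed /message-format endpoint does not exist today, so its request and response shape (an OpenAI-style messages array in, a rendered prompt field out) are assumptions; the /completion call with prompt, n_predict, and the content response field is the existing llama-server API.

```python
import requests

BASE_URL = "http://localhost:8080"  # default llama-server address

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# Step 1: ask the proposed (not yet existing) /message-format endpoint to
# apply the model's chat template to the message list.
fmt = requests.post(f"{BASE_URL}/message-format", json={"messages": messages})
fmt.raise_for_status()
prompt = fmt.json()["prompt"]  # hypothetical response field

# Step 2: send the rendered prompt to the existing /completion endpoint.
out = requests.post(f"{BASE_URL}/completion", json={"prompt": prompt, "n_predict": 128})
out.raise_for_status()
print(out.json()["content"])
```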
Solution 2: Let the client format the message list on its own (with Jinja, for example) before using the /completion endpoint. But in this case, the server must provide access to the information needed to do the job on the client side (the BOS token, the EOS token, etc.); a client-side sketch of this follows.
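And a minimal sketch of Solution 2, assuming the client already knows the model's chat template and special tokens. They are hard-coded here as a ChatML-style Jinja template and an <|im_end|> stop token; those hard-coded pieces are exactly the information the server would need to expose for this to work with any model.

```python
import requests
from jinja2 import Template

BASE_URL = "http://localhost:8080"  # default llama-server address

# Hand-written ChatML-style template; in practice the template and the
# special tokens are what the server would need to expose to the client.
CHAT_TEMPLATE = Template(
    "{% for m in messages %}"
    "<|im_start|>{{ m.role }}\n{{ m.content }}<|im_end|>\n"
    "{% endfor %}"
    "<|im_start|>assistant\n"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# Render the prompt on the client side...
prompt = CHAT_TEMPLATE.render(messages=messages)

# ...and send it to the existing /completion endpoint.
resp = requests.post(
    f"{BASE_URL}/completion",
    json={"prompt": prompt, "n_predict": 128, "stop": ["<|im_end|>"]},
)
resp.raise_for_status()
print(resp.json()["content"])
```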
I hope I haven't missed any important information.

Reply:
Let me know if you need more help or if you have a specific example that you would like demonstrated.