-
Hello, I did not find anything related to this question online, and I am surprised by that, so maybe this question is a bit dumb without me realizing it. The issue is with how the server example streams a response, more specifically the number of updates it provides about new tokens. Currently, on any UI I use with a llama.cpp server backend, the streamed response arrives in chunks of 50 or so tokens. Why not send an update on every token, like any other API? If I want to chat with a big model with slow inference speed, I wait a few minutes before receiving anything, which is surely not intentional, and when something does arrive, it's a group of around 50 tokens. Not practical :/ The llama-cli example writes token by token just fine in a terminal, and since the server example is based on the same inference code, I wonder what I am doing wrong, or not doing, to end up with a half-streamed response. Any help is appreciated.
Replies: 2 comments
-
On my end, manually starting the llama.cpp server from the command line, the output is updated on each token, with the stream sent in the body of the response to the POST request.
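An easy way to confirm this is to talk to the server directly, bypassing any UI or reverse proxy in between. Below is a minimal sketch, assuming the server listens on the default 127.0.0.1:8080 and the request goes to the /completion endpoint with "stream": true; the prompt and token count are just placeholders.

```python
# Minimal sketch: stream tokens straight from a locally running llama.cpp
# server (assumed at 127.0.0.1:8080, /completion endpoint, "stream": true).
import json
import requests

resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={"prompt": "Building a website can be done in", "n_predict": 64, "stream": True},
    stream=True,  # tell requests not to buffer the whole response body
    timeout=600,
)

for line in resp.iter_lines():
    # The server emits server-sent-event style lines: b'data: {"content": "...", ...}'
    if line.startswith(b"data: "):
        chunk = json.loads(line[len(b"data: "):])
        print(chunk.get("content", ""), end="", flush=True)
        if chunk.get("stop"):
            break
print()
```

If tokens appear one by one here but in large bursts through your UI, the buffering is happening somewhere between the server and the client, not in llama.cpp itself.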
-
I did a bit more research and was able to locate the issue: it was nginx's fault. Adding the following to my configuration solved my issue:
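A sketch of the relevant directives, assuming the llama.cpp server is reverse-proxied under a /llama/ location to 127.0.0.1:8080 (the path and upstream address are placeholders for your own setup); the essential part is turning off proxy buffering so nginx forwards each chunk as it arrives:

```nginx
# Assumed layout: nginx reverse-proxies the llama.cpp server running on
# 127.0.0.1:8080 under /llama/. Adjust the location and upstream to match
# your setup; the key directive is proxy_buffering off.
location /llama/ {
    proxy_pass http://127.0.0.1:8080/;

    # Pass streamed chunks through as they arrive instead of
    # accumulating them in nginx's proxy buffers.
    proxy_buffering off;
    proxy_cache off;

    # Keep a single long-lived connection open for the whole stream.
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_read_timeout 3600s;
}
```

With buffering left on, nginx collects the upstream response into its buffers before flushing it to the client, which is exactly the "chunks of ~50 tokens" behavior described above.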