Speed up initial delay to start generating #216
Replies: 1 comment
@RenNagasaki This is quite a complex one to answer. Different hardware will always have some impact. Having lowvram enabled will obviously cause a delay, as the model has to be shifted between GPU VRAM and system RAM in between generations, so disabling that can speed up generation start; how fast your system RAM, PCI bus etc. are will also define how quick lowvram is on your system. Disabling lowvram will shave off 0.5 to 2 seconds (depending on hardware). I'm going to assume you have DeepSpeed installed and are getting the benefit of that.

So let's assume you have lowvram disabled and the model stays in your VRAM the whole time. The next thing, speed-wise, is firing the data/text through Python to the generation process and loading your chosen wav file, but there's not much you can do to speed that up, and I'm sure it measures in microseconds anyway.

Other than that, there are two things you could try playing with, though I can't speak to their effects. In tts_server.py:

- Line 530 sets the minimum chunk size to generate. Setting this smaller will result in the first chunk of audio coming out faster, but it will increase processing overhead on the GPU. You could try setting it to a smaller number and see how it behaves (see the sketch below).
- Line 544 generates an empty frame at the start of the generation. It's used to send a warm-up frame to the receiving audio player (for want of a better description). You can remove this line, though rough maths in my head says the frame only adds something like 0.001 of a second to the audio, so removing it will probably have no noticeable impact.

Those are the only things I can immediately think of that could be played with. Maybe have a play and see what they do! `common_args["stream_chunk_size"]` is probably your best bet!
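As a rough illustration of the trade-off, here is a minimal sketch that times how quickly the first chunk arrives when calling Coqui's XTTSv2 streaming API directly with different `stream_chunk_size` values. The checkpoint path, reference wav and test sentence are placeholders, and the actual wiring in tts_server.py (via `common_args["stream_chunk_size"]`) will differ; this is just to show which knob trades time-to-first-audio against per-chunk GPU overhead, not how AllTalk implements it.

```python
# Hedged sketch: measure time-to-first-chunk for different stream_chunk_size
# values with Coqui XTTSv2 streaming inference. Paths are placeholders.
import time

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

CHECKPOINT_DIR = "/path/to/xttsv2_model"      # placeholder
REFERENCE_WAV = "/path/to/voice_sample.wav"   # placeholder

# Load the model once and keep it resident in VRAM (i.e. lowvram disabled).
config = XttsConfig()
config.load_json(f"{CHECKPOINT_DIR}/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir=CHECKPOINT_DIR, use_deepspeed=True)
model.cuda()

# Conditioning latents for the chosen voice, computed once up front.
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
    audio_path=[REFERENCE_WAV]
)


def time_to_first_chunk(stream_chunk_size: int) -> float:
    """Return seconds until the streaming generator yields its first audio chunk."""
    start = time.perf_counter()
    stream = model.inference_stream(
        "Testing how quickly the first chunk arrives.",
        "en",
        gpt_cond_latent,
        speaker_embedding,
        # Smaller values -> first chunk sooner, but more (smaller) GPU calls overall.
        stream_chunk_size=stream_chunk_size,
    )
    next(iter(stream))  # pull only the first chunk
    return time.perf_counter() - start


for size in (20, 10, 5):
    print(f"stream_chunk_size={size}: first chunk after {time_to_first_chunk(size):.3f}s")
```

The numbers will be entirely hardware-dependent, but dropping below the API's default of 20 should show the first chunk arriving sooner at the cost of more per-chunk overhead, which is the same behaviour you'd be tuning on line 530.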
Hey @erew123, I'm still playing around with how to get the best results when generating.
I've got two main ideas:
Have you got any idea how to speed up the starting delay?
Kind regards,
Ren