Control the speed of the voice and duration of pause between words/sentences #181

kaosbeat · 2024-04-21T19:07:50Z


Hi,

Is there a way, preferably when making an API request, to shorten the pauses between words and/or sentences?

The voices are rather slow sometimes, which triggers an "end of speech detected" when I run speech-to-text (STT) on the TTS output.

Answered by erew123


erew123 · 2024-04-22T10:47:04Z

Maintainer

Hi @kaosbeat

The only real way I know of to shorten pauses with the XTTS model is to use a reference sample WAV in which the speaker talks faster. From anecdotal evidence, a sample of the same person speaking slowly produces slower generated speech, and a sample spoken a bit faster produces faster speech.

Beyond that, removing commas, semicolons etc. will remove pauses.

There is also a generation speed setting, which speeds up the whole of the TTS generation, though I've not played with it much myself.
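As a rough illustration of the punctuation idea, here is a minimal sketch that strips commas and semicolons from the text before it is sent for generation. The function name `strip_pause_punctuation` and the exact set of characters are assumptions made for this example and are not part of AllTalk or XTTS. Note that it also splits numbers written like "1,000", so it may need adjusting for your text.

```python
import re

# Characters treated as pause markers here (an assumption; extend as needed).
PAUSE_PUNCTUATION = re.compile(r"[,;]")


def strip_pause_punctuation(text: str) -> str:
    """Replace commas and semicolons with spaces, then collapse repeated whitespace."""
    without_marks = PAUSE_PUNCTUATION.sub(" ", text)
    return re.sub(r"\s+", " ", without_marks).strip()


if __name__ == "__main__":
    print(strip_pause_punctuation("Hi, there; friend. How are you, today?"))
```

Sentence-ending punctuation is left alone, so sentence boundaries still reach the model.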

https://docs.coqui.ai/en/latest/models/xtts.html#inference-parameters

You could manually introduce speed into tts_server.py by adding "speed": 1.6, (or a number of your choosing) on line 515:

        # Common arguments for both functions
        common_args = {
            "text": text,
            "language": language,
            "gpt_cond_latent": gpt_cond_latent,
            "speaker_embedding": speaker_embedding,
            "temperature": float(temperature),
            "length_penalty": float(model.config.length_penalty),
            "repetition_penalty": float(repetition_penalty),
            "speed": 1.6,  # added line; the XTTS default is 1.0
            "top_k": int(model.config.top_k),
            "top_p": float(model.config.top_p),
            "enable_text_splitting": True
        }

However, do note what the Coqui documentation says about speed: it "can produce artifacts if far from 1.0".
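If you would rather not hard-code the value, one possible approach is to clamp whatever speed is requested before it goes into the arguments dictionary. This is only a sketch: the helper names `safe_speed` and `with_speed` and the 0.5 to 2.0 bounds are assumptions chosen for illustration, not values taken from the Coqui documentation or from tts_server.py.

```python
def safe_speed(requested: float, low: float = 0.5, high: float = 2.0) -> float:
    """Clamp a requested speed into [low, high] (bounds are illustrative assumptions)."""
    return max(low, min(high, float(requested)))


def with_speed(common_args: dict, requested: float) -> dict:
    """Return a copy of common_args with a clamped "speed" entry added."""
    updated = dict(common_args)  # copy so the caller's dict is not mutated
    updated["speed"] = safe_speed(requested)
    return updated


if __name__ == "__main__":
    # Hypothetical stand-in for the common_args dictionary built in tts_server.py.
    args = {"text": "Hello there", "language": "en"}
    print(with_speed(args, 1.3))
    print(with_speed(args, 5.0))
```

Keeping the clamp in one place makes it easy to tighten the range if values far from 1.0 turn out to produce audible artifacts.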

Thanks

1 reply

kaosbeat Apr 22, 2024
Author

Changing "speed" to 1.3 was adequate for me, thanks for pointing it out!
