-
This kind of issue lies somewhere within the AI model (that's my best understanding). AllTalk certainly passes the text over to the AI model correctly (based on all the tests I've performed in the past); however, occasional skips, or even a word spoken twice (usually at the end of a sentence), do occur from time to time. My loose general findings/beliefs thus far are:
So it's possible that further finetuning may improve the situation, or indeed a different wav sample. Long term, I may introduce other TTS models, giving a variety of ways of generating TTS. Whisper could be an option to compare generated audio to the text and regenerate or flag where necessary.
-
I've managed to code something together that will at least ease the burden. That said, I want to be clear to anyone reading this: this is currently an UNSUPPORTED work in progress. I have created a proof of concept that compares the spoken audio generated (by ID number) against the original text it was requested to generate. When the script runs, it will flag up a list of "ID number didn't match the text". You will need the Nvidia CUDA Toolkit 11.8 set up, as with finetuning: https://github.com/erew123/alltalk_tts?tab=readme-ov-file#-important-requirements-cuda-118 You will need to update AllTalk with a
As I say, this is a proof of concept. I have not tested its limits or found all the issues, and of course it isn't integrated into the TTS Generator. Thanks
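To illustrate the idea (this is not the actual script from the update above, just a minimal sketch of the comparison step): transcribe each generated clip with Whisper, normalize both strings, and flag any ID whose transcript drifts too far from the requested text. The `transcribe_clip` stub, the `0.9` threshold, and the tuple layout are all assumptions for illustration; the matching itself uses only the standard library.

```python
import difflib
import re

def normalize(text: str) -> str:
    """Lowercase and strip punctuation so trivial differences don't flag."""
    return " ".join(re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split())

def matches(requested: str, transcript: str, threshold: float = 0.9) -> bool:
    """True if the transcript is 'close enough' to the requested text.

    A similarity ratio below the (arbitrary, illustrative) threshold
    suggests a skipped, doubled, or truncated word.
    """
    ratio = difflib.SequenceMatcher(
        None, normalize(requested), normalize(transcript)
    ).ratio()
    return ratio >= threshold

def flag_mismatches(items):
    """items: iterable of (id_number, requested_text, transcript).

    Returns the ID numbers whose audio didn't match the text, i.e. the
    clips you would regenerate. In a real run the transcript would come
    from something like whisper_model.transcribe(wav_path)["text"]
    (hypothetical call, not shown here).
    """
    return [item_id for item_id, requested, transcript in items
            if not matches(requested, transcript)]

# Example: ID 2's transcript dropped the end of the sentence, so it gets flagged.
flagged = flag_mismatches([
    (1, "Hello world.", "hello world"),
    (2, "Good morning everyone.", "good morning"),
])
print(flagged)
```

The threshold would need tuning in practice: too strict and Whisper's own transcription errors produce false positives, too loose and a dropped final word slips through.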
-
I'm currently using AllTalk TTS to generate audio for audiobooks.
I'm noticing that, every so often, sentences just get truncated for no apparent reason. I'm using the default settings in the TTS Generator (2 chunks). I am using a finetuned model, so could this truncation have to do with how I trained the model (maybe a bad dataset)?