worse performance of large model compared to small model? #54
Comments
The test set is LibriTTS test-clean, and the number of samples is 4833.
Because large models gain significantly in generalization capability, I observed a slight performance drop on the LibriTTS test-clean dataset (though the difference is minimal). However, your results may also be influenced by other factors, such as the CUDA version, and it seems that four entries are missing from your test set. Moreover, subjective evaluation may also be important. Thank you~
Thanks for your reply. I am using the small model to reconstruct the waveforms. UTMOS_raw 19604.11721920967 4.056303997353543
OK, it appears that the results vary somewhat across the different metrics.
Thank you for doing such great work and open-sourcing it.
I use the large model (WavTokenizer-large-320-24k-4096) to reconstruct audio of LibriTTS.
However, the results are worse than those reported in the paper, which used the small model.
My results are:
UTMOS_raw 19604.11721920967 4.056303997353543
UTMOS_encodec 19604.11721920967 3.8397375189096272
PESQ: 9956.64894938469 2.060138412866685
F1_score: 4432.935466635334 0.917602042358794 2
STOI: 0.8924008398453133
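The two numbers on most of these lines look like a running sum followed by a mean. That is an assumption based on the arithmetic, not something the evaluation script states. A minimal sketch to check it divides each sum by its mean to see how many samples each metric implies; the helper name `implied_count` is hypothetical.

```python
# (sum, mean) pairs copied from the results above.
# Reading the first value as a sum and the second as a mean is an assumption.
results = {
    "UTMOS_raw": (19604.11721920967, 4.056303997353543),
    "UTMOS_encodec": (19604.11721920967, 3.8397375189096272),
    "PESQ": (9956.64894938469, 2.060138412866685),
    "F1_score": (4432.935466635334, 0.917602042358794),
}


def implied_count(total, mean):
    """Sample count implied by a sum and its mean (may be non-integer)."""
    return total / mean


implied = {
    name: implied_count(total, mean) for name, (total, mean) in results.items()
}

for name, n in implied.items():
    print(f"{name}: implied sample count = {n:.3f}")
```

Under that reading, UTMOS_raw and PESQ both imply 4833 samples, while F1_score implies 4831, so two files may have been skipped for that metric. The UTMOS_encodec line does not give a whole number, which would be consistent with the printed sum being the UTMOS_raw sum rather than its own; the mean is probably the value to trust there.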
While in the paper, the results are:
UTMOS_encodec 4.0486
PESQ 2.3730
STOI 0.9139
Is it expected for the performance to degrade?
Thanks~
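To put a number on the gap, here is a rough sketch that compares the mean values I measured with the paper values quoted above. The dictionary names and the `gaps` helper are only illustrative, and the comparison assumes both sets of numbers were computed with the same metric settings.

```python
# Paper values and reproduced means, copied from the lists above.
paper = {"UTMOS_encodec": 4.0486, "PESQ": 2.3730, "STOI": 0.9139}
reproduced = {
    "UTMOS_encodec": 3.8397375189096272,
    "PESQ": 2.060138412866685,
    "STOI": 0.8924008398453133,
}


def gaps(reference, measured):
    """Return {metric: (absolute difference, relative difference in %)}."""
    out = {}
    for name, ref in reference.items():
        diff = measured[name] - ref
        out[name] = (diff, 100.0 * diff / ref)
    return out


metric_gaps = gaps(paper, reproduced)

for name, (diff, pct) in metric_gaps.items():
    print(f"{name}: {diff:+.4f} ({pct:+.2f}%)")
```

All three differences come out negative, with PESQ showing the largest relative drop and STOI the smallest.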