Input sample rate locked at 16000? #1441

Open
PreAmbience opened this issue Jan 27, 2025 · 9 comments

@PreAmbience

I've been trying out the latest version of v1 (1.5.3.18a) and v2 (2.0.73-beta).
While I do like some of the changes, I noticed a marginal decrease in voice quality from v1 to v2, particularly on S and T sounds.
I run everything, from input to output, on a sample rate of 48000 in server mode in v1 with good results through Voicemeeter.
In v2, however, the option to set the sample rate is no longer there.

Looking at the Voice Changer Info in v2, I saw the following entries:

"input_sample_rate": 44100
"output_sample_rate": 44100

"vc_input_sample_rate": 16000
"vc_output_sample_rate": 48000

"input_sample_rate": 16000
"output_sample_rate": 48000

There are no options anywhere in the UI to change these values or to troubleshoot them.

Wouldn't such a low sample rate on the input be detrimental to output quality?
Why am I unable to set the target sample rate in v2?
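For context on the question above: by the Nyquist limit, a signal sampled at fs Hz can only carry frequencies below fs/2. A 16000 Hz stream therefore tops out at 8000 Hz, while a 48000 Hz stream reaches 24000 Hz. The minimal sketch below only does that arithmetic for the rates listed in the Voice Changer Info. It assumes nothing about how the voice changer internally uses its 16 kHz stream, and `nyquist_hz` is a hypothetical helper name.

```python
# Nyquist limit: a stream sampled at fs Hz can only represent content below fs / 2 Hz.
# Rates copied from the Voice Changer Info entries above.
RATES = {
    "vc_input_sample_rate": 16000,
    "vc_output_sample_rate": 48000,
    "device rate (as reported)": 44100,
}


def nyquist_hz(sample_rate: int) -> float:
    """Return the highest frequency (Hz) representable at the given sample rate (hypothetical helper)."""
    return sample_rate / 2


for name, fs in RATES.items():
    print(f"{name}: {fs} Hz -> content up to {nyquist_hz(fs):.0f} Hz")
```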

@Kuuko-fokkusugaru

Since the software is a "voice changer" and not a "voice morpher", the quality of the input audio doesn't have any impact on the output quality as long as the input doesn't add extra artifacts or noise. Basically, whatever sound enters the software will be transformed into a new sound. Voicemeeter, just like VB-Cable, is not recommended because it can lead to lower-quality audio and other issues.

@PreAmbience
Author

Thanks for clarifying, though I was already aware of the distinction.
If it's not distorted input causing the issues, what does?

I unfortunately need to use Voicemeeter since it's the only software I'm aware of that can stream the audio to another PC.
If the voice changer itself supports such functionality (since it's technically a client/server setup) I haven't found how.

It also begs the question why v1 produces crystal clear output with Voicemeeter and v2 doesn't.
WASAPI is entirely unusable in v2.
This is on Windows 10 using RVC2 and ONNX voice models.

Of course the solution is simple for me. I'll hold off on "upgrading" until I see improved support for these things.
Still, I would love to be able to use the newer versions.

@Kuuko-fokkusugaru

Kuuko-fokkusugaru commented Jan 28, 2025

I personally kept using v1 over v2 for similar reasons. That said, many people reported improvements in the quality of their models in v2 over v1. Perhaps it depends on the model itself. One user reported that changing the crossfade setting gave them v1 quality in v2. I just didn't bother with it, because it's hard to measure output quality accurately unless the difference is extreme, which isn't the case for my model.
As for streaming the audio to a different computer, that's precisely the main purpose of RVC. If you run start_https.bat instead of the regular http one, you can open the local IP that the console gives you on a different computer in the same network, which will give you the RVC UI in your browser. If you use server mode, you may need to connect your mic to the server computer; if you use client mode, you can use the mic on the main computer right away. This is mainly useful for putting all of RVC's performance load on a secondary computer while using the main one freely, for example for gaming.

This should help you avoid relying on virtual cables that could be problematic. Keep in mind that, while VB-Cable and its related software are prone to causing sound issues with certain host apps, this doesn't always happen. The reason I recommend VAC instead is that it never causes issues, while the others often do, even if not always.

As for the quality issue, it's hard for me to guess what the cause could be. It could be glitchy sound, stutter, noise, degraded audio quality, wrong phonemes from the voice model, noises from the voice model, etc. I haven't heard a comparison between what you consider good quality and bad quality, so I can't tell you where the cause lies; I simply made general suggestions for the most common issues users run into. If you want, you can provide example clips of how it sounds good and how it sounds bad; that would be my only way to know what it could be.

If you do so, don't test by speaking a sentence into a mic. Instead, record yourself speaking in a program like Audacity, then use RVC's audio file setting as the input. That way, the input will always be the same audio file, and we can compare the resulting output quality more accurately, for example when comparing v1 vs v2. If you just speak into the mic for both, the results won't be consistent and the comparison won't be valid, since you will never say the same sentence exactly the same way twice.

@Kuuko-fokkusugaru

By the way, I forgot to mention that the WASAPI issues in v2 were supposedly fixed, since it didn't work at all before. I'm not sure whether your issue is that it doesn't work or that it just sounds bad.

@PreAmbience
Author

Thank you for all that information!

Glad to see that the network functionality is supported. I'll try it out!
Still not entirely sure how I would use it for realtime communication without feeding the "changed" voice to a virtual device, however. Just reading your reply, you seemed to imply this could be done without "virtual cables"? I understand VAC is the recommended solution, but can it even be done without any? Perhaps a misunderstanding on my part.

The overall quality difference between v1 and v2 is small enough that it could be caused by dynamic volume, if I'm not mistaken. It reminds me of distortion from too much gain, which is why I might have confused it with a low sample rate. Of course, I have adjusted the gain and heard no improvement. v2 sounds generally harsher and more artificial with all the models I've tried.
I'll certainly play around with the settings more and see what I can manage before I dive deeper into it and provide audio samples.

WASAPI together with my current setup (RVC2 model, Win10, Voicemeeter with VBAN, VC v1) produces fast and crisp audio on the receiving end. When used with v2 the audio is choppy and inconsistent, as if buffering. It's not constant, issues appearing every few seconds. Reminds me of voice chat on very low bandwidth or packet loss.

MME doesn't suffer from this, but it seems to perform worse in general, with the delay between input and output drifting over time. Now and then it cuts off, as if "catching up" to its original delay again.

I'll play around with the settings more, try the built in network features and return with the results.

@PreAmbience
Author

Okay, so I did some testing. Here are some initial findings.

Tried using the voice changer over the browser. I kept the settings the same as when I ran with Voicemeeter.

  • v1 has identical quality to Voicemeeter but an unreasonably long delay making it unsuitable for realtime use.
  • v2 has a much shorter delay but the voices are instead pitched up in a comical manner, regardless of pitch setting or voice model.
    Kudos to the software for keeping things interesting =)

The perk of using Voicemeeter was that I could run the microphone and voice changer on the other PC and get the processed audio directly onto my gaming PC, cutting out one potential source of delay. There doesn't seem to be an option for having the microphone attached to the server PC while still streaming the processed audio to the client?

@Kuuko-fokkusugaru

Yes, you would still need a virtual cable, but you wouldn't need to rely on Voicemeeter to stream to the other computer.
As for the delay, I don't get any noticeable delay when using the software on a second computer (no more than the one from the buffer). I have both computers connected to the same router by cable, not Wi-Fi, so the transmission is practically instant.
And yes, you can use your mic on the server PC, as I already mentioned in my previous comment. That's what the server option is for. Set it to client and you will get the audio sources from the computer running the browser UI. Set it to server and you will get the ones from the computer running RVC.

As for the rest of your issues, I'm not sure. Dynamic volume can be changed in the settings and, the last time I tried, it was off by default, so it works like v1. Also, this setting only affects the output, not the input. It tries to mirror changes in the input volume in the output volume, giving a more realistic and less monotone result.

@PreAmbience
Author

I'm also using an Ethernet cable. I can't say why the delay was so much higher. Client mode has always felt slower on v1 to me.

Figured a virtual device was necessary but thanks for clarifying.

Yes, the server option gives me the option of selecting inputs and outputs on the server PC but there is no way to get that audio out on the client PC from what I can tell. It's either one or the other. Tell me if I'm missing something obvious here.

I must have misunderstood the function of dynamic volume. Never mind that then.

I'll keep testing things.

@Kuuko-fokkusugaru

When using server mode, you get the audio by connecting the server's audio output directly to the client computer, as if it were a 3.5mm mic. So you would simply use a regular cable.
