HasSpeech is always true #6

Closed
hunterwebapps opened this issue Apr 2, 2021 · 4 comments

Comments

@hunterwebapps

Hello,

I've tried everything that I can think of. I have a very simple implementation here. Really hoping to get some advice. This is going to be a life saver library for my project.

I'm passing in a 16 kHz, mono WAV file encoded as pcm_s16le.

I'm on version 1.3.1, testing on Windows 10 Build 19042

using var vad = new WebRtcVad()
{
    OperatingMode = OperatingMode.Aggressive,
    FrameLength = FrameLength.Is20ms,
    SampleRate = SampleRate.Is16kHz,
};

// I tried with * 1 instead of * 2 here as well, since the wav I'm using is mono channel
var frameSize = (int)vad.SampleRate / 1000 * 2 * (int)vad.FrameLength;

var audioBytes = await File.ReadAllBytesAsync("birds.wav");

for (var i = 0; i < audioBytes.Length - frameSize; i += frameSize)
{
    var hasSpeech = vad.HasSpeech(audioBytes.Skip(i).Take(frameSize).ToArray());

    if (hasSpeech)
    {
        // inspecting with breakpoint here, always hits on first pass, when there is no speech.
        break;
    }
}
@ladenedge
Owner

ladenedge commented Apr 2, 2021

WebRTC doesn't work with WAV files directly -- it only works with raw audio. So while your codec looks good, the WAV file also includes a header with metadata about that audio, which WebRTC doesn't understand. You'll need to send it only the audio inside the WAV container by either:

  • manually converting your WAV file to PCM/RAW with, e.g., FFmpeg or Audacity, or
  • (perhaps better) filtering your audio through a library like NAudio, which can read that WAV metadata and provide you with the proper raw audio stream.

Here's some untested sample code that should get you started with the latter approach:

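// Requires the NAudio package (using NAudio.Wave;).
// wavAudio is the path to the WAV file.
using var audio = new WaveFileReader(wavAudio);
var fmt = audio.WaveFormat;
// Configure the VAD to match the frames sent below. The SampleRate cast
// assumes the enum values are the rate in Hz (e.g. Is16kHz == 16000).
using var vad = new WebRtcVad()
{
    FrameLength = FrameLength.Is20ms,
    SampleRate = (SampleRate)fmt.SampleRate,
};
// Bytes per frame = ms * samples per ms * channels * bytes per sample.
var frameBytes = (int)vad.FrameLength * fmt.SampleRate / 1000 * fmt.Channels * fmt.BitsPerSample / 8;
var audioData = new byte[frameBytes];
while (true)
{
   // Stop at the first incomplete frame (end of the audio).
   if (await audio.ReadAsync(audioData.AsMemory()) != audioData.Length)
      break;
   var hasSpeech = vad.HasSpeech(audioData);
}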
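
If you'd rather not pull in a dependency, here's a rough, untested-against-real-files sketch of what "the audio within the WAV container" means. It assumes an ordinary RIFF/WAVE file holding uncompressed 16-bit mono PCM, and the helper names ExtractWavData and FrameBytes are made up for illustration (they are not part of WebRtcVadSharp or NAudio). It walks the RIFF chunks until it finds the "data" chunk and returns only those bytes, so the header never reaches the VAD as if it were samples.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Hypothetical helper: return the raw sample bytes of the "data" chunk
// from an in-memory RIFF/WAVE file image.
static byte[] ExtractWavData(byte[] wav)
{
    if (wav.Length < 12
        || Encoding.ASCII.GetString(wav, 0, 4) != "RIFF"
        || Encoding.ASCII.GetString(wav, 8, 4) != "WAVE")
        throw new FormatException("Not a RIFF/WAVE file.");

    var pos = 12; // the first chunk follows the 12-byte RIFF header
    while (pos + 8 <= wav.Length)
    {
        // Each chunk is a 4-byte ASCII id, a 4-byte little-endian size, then the payload.
        var id = Encoding.ASCII.GetString(wav, pos, 4);
        long size = BinaryPrimitives.ReadUInt32LittleEndian(wav.AsSpan(pos + 4, 4));
        var start = pos + 8;
        if (id == "data")
        {
            // Tolerate a size field larger than the bytes actually present.
            var length = (int)Math.Min(size, wav.Length - start);
            return wav.AsSpan(start, length).ToArray();
        }
        // Chunk payloads are padded to an even number of bytes.
        var next = start + size + (size & 1);
        if (next > wav.Length)
            break;
        pos = (int)next;
    }
    throw new FormatException("No data chunk found.");
}

// Hypothetical helper: bytes in one frame of 16-bit mono PCM.
// The factor of 2 is bytes per sample, not the channel count.
static int FrameBytes(int sampleRateHz, int frameMs) => sampleRateHz / 1000 * frameMs * 2;
```

With 16 kHz audio and 20 ms frames, FrameBytes(16000, 20) is 640, so you would pass ExtractWavData(audioBytes) to HasSpeech in 640-byte slices rather than slicing the whole file.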

Good luck!

@hunterwebapps
Author

I'll try out the raw file today. Sounds like this will definitely solve it. Thank you!

@redbosse

redbosse commented Nov 8, 2022

How do I pass an array of float32 samples to it?

@ladenedge
Owner

You'll need to convert your 32-bit IEEE floats to linear 16-bit PCM. NAudio can do this if you use .NET; otherwise, it looks like you might be able to adapt someone's manual conversion code.
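
For reference, a manual conversion could look roughly like the sketch below. It assumes the floats are normalized to the usual [-1.0, 1.0] range and that little-endian byte order is wanted; FloatToPcm16 is a made-up name for illustration, not an NAudio or WebRtcVadSharp API.

```csharp
using System;

// Hypothetical helper: convert 32-bit float samples (nominally in [-1.0, 1.0])
// to 16-bit little-endian linear PCM bytes.
static byte[] FloatToPcm16(float[] samples)
{
    var pcm = new byte[samples.Length * 2];
    for (var i = 0; i < samples.Length; i++)
    {
        // Treat NaN as silence, and clamp so out-of-range input
        // saturates instead of wrapping around.
        var sample = float.IsNaN(samples[i]) ? 0f : samples[i];
        var clamped = Math.Clamp(sample, -1f, 1f);
        var value = (short)Math.Round(clamped * short.MaxValue);
        pcm[2 * i] = (byte)(value & 0xFF);            // low byte first
        pcm[2 * i + 1] = (byte)((value >> 8) & 0xFF); // then high byte
    }
    return pcm;
}
```

The resulting byte array can then be cut into frames and passed to HasSpeech the same way as any other raw 16-bit PCM. Note that this only changes the sample format; it does not resample, so the audio still has to be at a sample rate the VAD is configured for.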
