New to using Kaldi, just need a model to extract good voice embeddings in a python script from .wav files #4944

Open
PhilipAmadasun opened this issue Sep 18, 2024 · 2 comments

Comments

@PhilipAmadasun

Does anyone have an example Python script that uses one of the x-vector extraction models developed here to extract embeddings? I've gone through some of the repo and have not found one.

I've tried other pre-trained embedding models, such as the pyannote embeddings, but the extracted vectors were not very accurate representations of speakers when scrutinized with cosine similarity (a lot of false positives and false negatives).

I'm still testing an embedding model from SpeechBrain, but I would love to try the one developed in Kaldi, as it was recommended to me.

I would be very grateful for any help in this matter.

@csukuangfj
Contributor

Please have a look at next-gen Kaldi.

You can find Python examples at

https://github.com/k2-fsa/sherpa-onnx/blob/master/python-api-examples/speaker-identification.py

and at
https://github.com/k2-fsa/sherpa-onnx/tree/master/python-api-examples

(Search for filenames containing the string speaker)

@csukuangfj
Contributor

Note: all you need to do to install sherpa-onnx is run

pip install sherpa-onnx

It supports Linux (arm64, arm32, x64), Windows (x64, x86, arm64), macOS (x64, arm64), etc.
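
For the comparison step mentioned in the question (scoring two embeddings with cosine similarity), here is a minimal sketch in plain Python. It does not call sherpa-onnx or Kaldi. The vectors are made-up toy values rather than real embeddings, and the helper names and the 0.5 threshold are assumptions for illustration only; a usable threshold has to be tuned on your own data for whichever model you pick.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare an all-zero embedding")
    return dot / (norm_a * norm_b)


def same_speaker(a, b, threshold=0.5):
    """Hypothetical decision rule: accept if the score reaches the threshold.

    The default threshold is an arbitrary placeholder, not a recommended value.
    """
    return cosine_similarity(a, b) >= threshold


# Toy 3-dimensional vectors standing in for real speaker embeddings.
enrolled = [1.0, 0.0, 1.0]
probe_same = [0.9, 0.1, 1.1]    # points in nearly the same direction
probe_other = [-1.0, 1.0, 0.0]  # points in a different direction

print(same_speaker(enrolled, probe_same))
print(same_speaker(enrolled, probe_other))
```

Raising the threshold trades false positives for false negatives, so if both error types are high at every threshold, the embeddings themselves (or the audio preprocessing, such as sample rate and silence) are the more likely cause.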
