Improve speech recognition and remove postprocessing #837
@josancamon19 can you please specify which languages are required to complete the task? This will let me figure out more quickly whom to ask to do it.
Note: I had an issue with the Speechmatics diarization results, but my hunch is that it is still better than Deepgram.
Average WER Table
Average DER Table
How was this computed?
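The evaluation script itself isn't included in the thread. The following is a minimal sketch of the standard metric definitions, not necessarily the exact code behind the averages reported here. WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. DER is false-alarm + missed-speech + speaker-confusion time divided by total reference speech time. The function names and the lowercase/whitespace normalization are assumptions for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumption: transcripts are normalized only by lowercasing and
    splitting on whitespace (no punctuation stripping).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = minimum edits to turn ref[:i-1] into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution, or match if cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


def diarization_error_rate(false_alarm: float, missed: float,
                           confusion: float, total_speech: float) -> float:
    """DER from already-computed error durations (e.g. in seconds).

    Assumption: the caller has already aligned hypothesis speakers to
    reference speakers; real DER tooling does that mapping itself.
    """
    if total_speech <= 0:
        raise ValueError("total reference speech time must be positive")
    return (false_alarm + missed + confusion) / total_speech


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    print(diarization_error_rate(1.0, 2.0, 3.0, 20.0))
```

Because both metrics are ratios over the reference, a transcript that drops most of the audio scores badly on WER through deletions alone, which is how a run that outputs only a fraction of the expected transcript shows up in these numbers.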
Considering that, we will use
Still postprocessing: From the results, there's no benefit in using fal whisperx. Even though some results were almost as good as groq-whisper-large-v3 (within something like 1% WER and DER), it's still very unreliable: at times it outputs only 20% of the expected transcript, or outputs nonsense. Thus
👋
Asked Speechmatics for credits 3 times with no response. I will keep following up, but there's not much we can do for now.
The remaining work will be tracked here: https://github.com/orgs/BasedHardware/projects/1/views/1?pane=issue&itemId=81004351
Refactoring STT system
https://artificialanalysis.ai/speech-to-text
It points to https://www.speechmatics.com/ as the winner on WER %.
Deepgram's WER is about 40% worse, which is forcing us to do postprocessing using whisper-x.
I also tried AssemblyAI; unfortunately, streaming only works for English, so it's discarded.
Speechmatics is only marginally better than AssemblyAI, but it works with all languages and has interesting, future-proof features.
NOTE: I will first run the exact same pipeline with Soniox, since we already have 10k in credits, but I'm unsure whether I trust their accuracy, as the WER comparison was made by Soniox themselves.
Also, they did that research before the latest models were released.
Still, the reason for testing Soniox first is that a good portion of the pipeline is already integrated, so it shouldn't take long.
Important:
- Need to double check scalability (no response).
- Need to ask for free credits; it's 4x more expensive than Deepgram (no response).
- Speechmatics will only be supported for opus; for 1.0.2, we will continue using Deepgram.
Add-ons:
.wav (instead of the saved opus-encoded bytes); double check the duration at which it performs 90% of the time.