Transcription on device (phone) #1249
Comments
I agree, Whisper is the way to go. Its on-device performance (I've only used it on an iPhone 12 mini, to develop something simple) is truly incredible. It should be downloaded on demand, because including it in the bundle would be a terrible idea.
Tell me more about your experience with Whisper on the iPhone 12 mini please, @Ronuhz: transcript quality, speed, battery drain.
Here is a little demo running on an iPhone 12 mini, iOS 18.2 Beta 3, model: Whisper Tiny, using the Neural Engine for both decoding and encoding. The voice is streamed to the model in real time. Everything runs locally. output.mp4
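For reference, the capture side of a real-time demo like this can be wired with an AVAudioEngine tap. A minimal sketch follows; the `feedToModel` hook is hypothetical (WhisperKit ships its own AudioProcessor that handles this internally):

```swift
import AVFoundation

// Hypothetical hook: append samples to a buffer the decoder polls periodically.
func feedToModel(_ buffer: AVAudioPCMBuffer) {
    // e.g. accumulate ~1 s of PCM, then run an inference pass
}

let engine = AVAudioEngine()
let input = engine.inputNode
let format = input.outputFormat(forBus: 0)

// Deliver small PCM chunks from the microphone as they arrive.
input.installTap(onBus: 0, bufferSize: 4096, format: format) { buffer, _ in
    feedToModel(buffer)
}

do {
    try engine.start()
} catch {
    print("Audio engine failed to start: \(error)")
}
```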
In a native app using Swift and SwiftUI, it takes about 10-20 minutes to get this implemented using WhisperKit. In Flutter, I don't know.
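For anyone picking this up, the happy path in WhisperKit looks roughly like the sketch below. Exact initializer parameters and return types differ between releases, so treat this as an approximation, not the definitive integration:

```swift
import WhisperKit

Task {
    do {
        // Downloads the model on first run rather than bundling it.
        let pipe = try await WhisperKit(model: "tiny")
        // Transcribe a saved recording; path is a placeholder.
        let results = try await pipe.transcribe(audioPath: "recording.m4a")
        print(results.map(\.text).joined(separator: " "))
    } catch {
        print("Transcription failed: \(error)")
    }
}
```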
Found this with a quick search, but it does not support transcribing in real time.
OK, let's make this happen. We need to make omi FULLY LOCAL: fully local transcription. It might be in React Native (I don't care about the stack). The bounty is $20k, and I will lock it in for whoever shows the best MVP.
💎 $20,000 bounty • omi
@kodjima33 How can I get my hands on the omi hardware?
/attempt #1249
@yuvrajjsingh0 The problem with using the platform's own STT is that you won't have speaker separation. Whisper Tiny needs less than a GB of VRAM and storage. It should be downloaded on demand and NOT be included in the bundle. It can be run on the ANE, at least on Apple devices; sadly I can't speak for Android because it's not my area of expertise.
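On the ANE point: with Core ML you can at least request Neural Engine execution via MLModelConfiguration. A minimal sketch, assuming a downloaded, compiled model (the path is a placeholder, and whether ops actually land on the ANE depends on the model's layers):

```swift
import CoreML

// Ask Core ML to prefer the Neural Engine; `.cpuAndNeuralEngine` needs iOS 16+,
// and unsupported ops silently fall back to the CPU.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine

// Placeholder path to a compiled .mlmodelc fetched post-install.
let modelURL = URL(fileURLWithPath: "whisper-tiny-encoder.mlmodelc")
let model = try? MLModel(contentsOf: modelURL, configuration: config)
```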
@kodjima33 Okay, if we want to use Whisper, do we need transcription in real time, or will we be doing it on saved audio? There is also the option to run a voice-recognition (diarization) model over the audio that tells us who is speaking in which timeframe, and use STT to transcribe it.
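If that two-model route is taken, the merge step is simple timestamp math: label each transcript segment with the speaker turn it overlaps most. A sketch with hypothetical types (the real segment shapes depend on whichever diarizer and STT library are chosen):

```swift
import Foundation

// Hypothetical outputs of a diarizer and an STT model, both with timestamps.
struct SpeakerTurn { let speaker: String; let start: Double; let end: Double }
struct Segment { let text: String; let start: Double; let end: Double }

// Seconds of overlap between a transcript segment and a speaker turn.
func overlap(_ s: Segment, _ t: SpeakerTurn) -> Double {
    max(0, min(s.end, t.end) - max(s.start, t.start))
}

// Assign each transcript segment to the speaker whose turn overlaps it most.
func label(_ segments: [Segment], with turns: [SpeakerTurn]) -> [(speaker: String, text: String)] {
    segments.map { seg in
        let best = turns.max { overlap(seg, $0) < overlap(seg, $1) }
        return (best?.speaker ?? "unknown", seg.text)
    }
}
```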
/attempt #1249

**Why Whisper Tiny?**
- ANE on iOS: WhisperKit (Swift) taps into Apple's Neural Engine. Battery drain is minimal compared to CPU-only inference. Demo here: got it working in a test app with real-time streaming.
- Supports multiple languages.
- Avoids app bloat: ship the model (~150 MB) via CDN (Hugging Face Hub?) post-install. No need to bake it into the bundle.

**Alternatives I tested (and why they suck):**
- Platform STT (Android/iOS APIs)
- Distil-Whisper/Hugging Face models
- Larger Whisper models (Base/Medium)

**Implementation plan**
- Android, Option B: Transformers Android (Java), but it might need model quantization.
- Speaker diarization hack.

Using Whisper Tiny on the device is possible. The trade-offs are a slightly bigger app size after downloading and some tweaks needed for speaker identification, but it's worth it for better privacy and lower server costs.

@Ronuhz, I saw that you're working on Whisper Tiny. Let me know if you're open to collaborating on this.
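On the CDN idea above, the download-on-demand part is just a cached fetch. A sketch with placeholder URL and file names (the real Hugging Face paths are not shown here):

```swift
import Foundation

// Fetch the model once, post-install, and reuse the cached copy afterwards.
func downloadModelIfNeeded() async throws -> URL {
    let dest = FileManager.default
        .urls(for: .applicationSupportDirectory, in: .userDomainMask)[0]
        .appendingPathComponent("whisper-tiny.mlmodelc.zip") // placeholder name

    // Already cached from a previous launch: skip the download.
    if FileManager.default.fileExists(atPath: dest.path) { return dest }

    let url = URL(string: "https://example-cdn.com/models/whisper-tiny.zip")! // placeholder
    let (tmp, _) = try await URLSession.shared.download(from: url)
    try FileManager.default.moveItem(at: tmp, to: dest)
    return dest
}
```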
What about using https://github.com/mediar-ai/screenpipe/tree/main/screenpipe-audio? It's pure Rust, meaning you could make it mobile-friendly easily.
We all know that we need local transcription.
Both Vitalik Buterin and George Hotz said it when trying out our tech.
Creating this issue to aggregate feedback and prepare for the switch gradually.