
Transcription on device (phone) #1249

Open · kodjima33 opened this issue Nov 4, 2024 · 13 comments

@kodjima33 (Collaborator)

We all know that we need local transcription.

Both Vitalik Buterin and George Hotz said it when trying out our tech.

Creating this issue to aggregate feedback and prepare for the switch gradually

@kodjima33 kodjima33 moved this to Backlog in omi TODO / bounties Nov 4, 2024
@kodjima33 kodjima33 changed the title Local transcription Transcription on device (phone) Nov 4, 2024
@Ronuhz commented Nov 5, 2024

I agree, and Whisper is the way to go. Its on-device performance (I've only used it on an iPhone 12 mini, to develop something simple) is truly incredible. It should be downloaded on demand, because including it in the bundle would be a terrible idea.

@beastoin (Collaborator)

tell me more about your experience with Whisper + iPhone 12 mini please, @Ronuhz, such as transcript quality, speed, and battery drain.

@Ronuhz commented Nov 12, 2024

> tell me more about your experience with Whisper + iPhone 12 mini please, @Ronuhz, such as transcript quality, speed, and battery drain.

Here is a little demo running on an iPhone 12 mini, iOS 18.2 Beta 3, model: Whisper Tiny, using the Neural Engine for both encoding and decoding. The voice is streamed to the model in real time. Everything runs locally.

output.mp4

@Ronuhz commented Nov 12, 2024

In a native app using Swift and SwiftUI, it takes about 10-20 minutes to get this implemented using WhisperKit. In Flutter, I don't know.
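
For anyone curious what that looks like, here is a minimal sketch along the lines of the WhisperKit README. The exact initializer and return types vary between library versions, and the model identifier and audio path are placeholders, so treat this as illustrative only:

```swift
import WhisperKit

// Minimal WhisperKit sketch, adapted from the project README.
// The exact API surface differs between WhisperKit versions.
Task {
    // Loads (and on first use downloads) the tiny model.
    let pipe = try await WhisperKit(model: "tiny")
    // One-shot transcription of a recorded file; real-time streaming
    // needs extra wiring around the mic input.
    let results = try await pipe.transcribe(audioPath: "recording.wav")
    print(results.map(\.text).joined(separator: " "))
}
```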

@mdmohsin7 (Collaborator) commented Nov 15, 2024

Found this with a quick search, but it does not support transcribing in real time:

https://pub.dev/packages/whisper_flutter_plus

@kodjima33 (Collaborator, Author) commented Feb 14, 2025

ok, let's make this happen

We need to make omi FULLY LOCAL: fully local transcription.

It might be in React Native (I don't care about the stack).

The bounty is $20k.

I will lock it in for whoever shows the best MVP.
/bounty $20000

algora-pbc bot commented Feb 14, 2025

💎 $20,000 bounty • omi

Steps to solve:

  1. Start working: Comment /attempt #1249 with your implementation plan
  2. Submit work: Create a pull request including /claim #1249 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to BasedHardware/omi!


| Attempt | Started (GMT+0) | Solution |
| --- | --- | --- |
| 🟢 @yuvrajjsingh0 | Feb 15, 2025, 10:06:15 PM | WIP |
| 🟢 @Ritesh2351235 | Feb 17, 2025, 4:53:22 AM | WIP |

@kodjima33 kodjima33 added Paid Bounty 💰 flutter flutter work backend Backend Task (python) labels Feb 14, 2025
@kodjima33 kodjima33 moved this to Someday in omi TODO / bounties Feb 14, 2025
@ayewo commented Feb 14, 2025

@kodjima33 how can I get my hands on the omi hardware?

@yuvrajjsingh0 commented Feb 15, 2025

/attempt #1249

Hi, if we are doing it on device, I'd suggest using the device's default speech-to-text functionality, as that is hardware-accelerated and optimized for that device. It's available on both iOS and Android, and it can run in real time.

I will make use of the device's speech-to-text. Using Whisper is fine, but Whisper is a large transformer-based model, which can bloat the application, and using it on low-end devices will make the app suffer crashes. I have previously worked on integrating Tesseract natively on Android devices, and from that experience I can say that using Whisper locally is never an option, as it will only work well on high-end devices.
@kodjima33
Here's a sample app I created in Flutter and a demo of it on iOS:
https://github.com/user-attachments/assets/6511fc7a-7c15-433e-a8c5-79870658e270
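
On iOS, the platform route suggested above maps to Apple's Speech framework. A hedged sketch of real-time recognition (authorization prompts and error handling omitted for brevity):

```swift
import Speech
import AVFoundation

// Sketch of the built-in iOS speech-to-text path (SFSpeechRecognizer).
// Android's android.speech.SpeechRecognizer is the rough equivalent.
final class LiveTranscriber {
    private let engine = AVAudioEngine()
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private let request = SFSpeechAudioBufferRecognitionRequest()

    func start() throws {
        request.shouldReportPartialResults = true   // stream interim hypotheses
        request.requiresOnDeviceRecognition = true  // keep it local where supported

        let input = engine.inputNode
        input.installTap(onBus: 0, bufferSize: 1024,
                         format: input.outputFormat(forBus: 0)) { buffer, _ in
            self.request.append(buffer)             // feed mic audio to the recognizer
        }

        recognizer?.recognitionTask(with: request) { result, _ in
            if let result {
                print(result.bestTranscription.formattedString)
            }
        }

        engine.prepare()
        try engine.start()
    }
}
```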


@Ronuhz commented Feb 16, 2025

@yuvrajjsingh0 The problem with using the platform's own STT is that you won't have speaker separation. Whisper Tiny needs less than a GB of VRAM and storage. It should be downloaded on demand and NOT be included in the bundle. It can be run on the ANE on Apple devices at least; sadly, I can't speak to Android, because it's not my area of expertise.

@yuvrajjsingh0

@kodjima33 Okay, if we want to use Whisper, do we need this transcription in real time, or will we be doing it on saved audio?

There is also the option of running a speaker-recognition model over the audio that will tell us who is speaking in which timeframe, then using STT to transcribe each segment.

@Ritesh2351235 commented Feb 17, 2025

/attempt #1249
Hey @kodjima33, here is my take on local transcription for Omi.

Why Whisper Tiny?

Mobile-first: Tiny (39M params) is built for edge devices. I ran tests on an iPhone 11: ~150-300 ms per audio chunk, no server calls. For Android, TFLite/MediaPipe can handle it, though we'll need to optimize GPU delegation for weaker devices.

ANE on iOS: WhisperKit (Swift) taps into Apple's Neural Engine. Battery drain is minimal compared to CPU-only inference. Demo here: got it working in a test app with real-time streaming.

Supports multiple languages.

Avoid app bloat: ship the model (~150MB) via CDN (Hugging Face Hub?) post-install. No need to bake it into the bundle (sketch of that flow below).
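
A sketch of that download-on-demand idea, assuming a zipped model artifact hosted on some CDN; the URL and file names here are made-up placeholders:

```swift
import Foundation

// Fetch the model on first launch instead of bundling it.
// The CDN URL is a placeholder, not a real endpoint.
func ensureModelOnDisk() async throws -> URL {
    let dest = URL.documentsDirectory.appending(path: "whisper-tiny.zip")
    if FileManager.default.fileExists(atPath: dest.path) {
        return dest // already downloaded on a previous launch
    }
    let remote = URL(string: "https://example-cdn.invalid/whisper-tiny.zip")!
    let (tmp, _) = try await URLSession.shared.download(from: remote)
    try FileManager.default.moveItem(at: tmp, to: dest)
    return dest
}
```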

Alternatives I tested (and why they suck):

Platform STT (Android/iOS APIs):
Pros: low latency, free.
Cons: no speaker diarization, struggles with accents/background noise. Tried it; accuracy tanks in noisy environments.

Distil-Whisper/Hugging Face models:
Smaller, but multilingual support is spotty. Whisper Tiny handles ~100 languages out of the box.

Larger Whisper models (Base/Medium):
Overkill. Medium needs ~5GB of RAM; not happening on phones.

Implementation Plan

iOS:
Use WhisperKit (Swift) for ANE-accelerated inference. Wrote a PoC; it's ~20 lines of Swift to hook into mic input and stream to the model.

Android:
Option A: MediaPipe's TFLite build (C++ → Kotlin/JNI).

Option B: Transformers Android (Java), but it might need model quantization.

Speaker Diarization Hack:
Whisper doesn't do this natively. Workaround: add Silero VAD to detect pauses/speaker changes (toy sketch below). Not perfect, but it gets us 80% of the way there without cloud calls.

Using Whisper Tiny on the device is possible. The trade-offs are a slightly bigger app size after downloading and some tweaks needed for speaker identification. But it's worth it for better privacy and lower server costs.

@Ronuhz, I saw that you're working on Whisper Tiny. Let me know if you're open to collaborating on this.

@louis030195

what about using https://github.com/mediar-ai/screenpipe/tree/main/screenpipe-audio

it's pure Rust, meaning you can make it mobile-friendly easily
