stt microphone live example #2254

Open: wants to merge 1 commit into base: main

ChitranshS

  • I understand that this repository is auto-generated and my pull request may not be merged

Changes being requested

This PR adds a real-time speech-to-text example script demonstrating how to use OpenAI's WebSocket-based transcription API. The script:

  1. Captures audio from the microphone in real-time
  2. Streams the audio data to OpenAI's transcription API via WebSockets
  3. Processes and displays transcription events as they occur
  4. Handles speech detection events (speech start/stop)
  5. Properly manages resources and connections
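Steps 1 and 2 above can be sketched roughly as follows. This is not the code from the PR: it is a minimal, stdlib-only illustration of turning one chunk of float samples into 16-bit PCM and wrapping it in a JSON message. The event shape (`input_audio_buffer.append` with a base64 `audio` field) and the helper names `float_to_pcm16` and `make_append_event` are assumptions made for illustration; the actual script uses sounddevice and numpy for capture and conversion.

```python
import base64
import json
import struct


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    # Clip first so out-of-range samples do not overflow the int16 range.
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(s * 32767) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)


def make_append_event(samples):
    """Wrap one audio chunk in a JSON event (event shape is an assumption)."""
    audio_b64 = base64.b64encode(float_to_pcm16(samples)).decode("ascii")
    return json.dumps({"type": "input_audio_buffer.append", "audio": audio_b64})
```

In a live script, each message returned by `make_append_event` would be sent over the open WebSocket connection as microphone chunks arrive.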

This example would be valuable for users who want to implement real-time transcription functionality in their applications using the OpenAI API.

Additional context & links

This implementation uses:

  • websockets for WebSocket communication
  • sounddevice for microphone input
  • numpy for audio data processing
  • pydantic for data validation and configuration

The script demonstrates best practices for real-time audio streaming and event handling with OpenAI's transcription API, including proper connection management, error handling, and resource cleanup.
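As a rough idea of what the event handling could look like, here is a small dispatcher sketch. It is not taken from the PR. The event type strings and the `delta` and `transcript` fields are assumptions about the messages the server sends, and `handle_event` is a hypothetical helper name.

```python
import json


def handle_event(raw_message, transcript_parts):
    """Route one server message by its "type" field and return a short status."""
    event = json.loads(raw_message)
    kind = event.get("type", "")

    # Speech detection events (assumed names).
    if kind == "input_audio_buffer.speech_started":
        return "speech started"
    if kind == "input_audio_buffer.speech_stopped":
        return "speech stopped"

    # Incremental transcription text (assumed name and field).
    if kind == "conversation.item.input_audio_transcription.delta":
        transcript_parts.append(event.get("delta", ""))
        return "partial"

    # Final transcript for one utterance (assumed name and field).
    if kind == "conversation.item.input_audio_transcription.completed":
        transcript_parts.clear()
        return "final: " + event.get("transcript", "")

    if kind == "error":
        return "error: " + json.dumps(event.get("error"))

    # Anything else, such as a session update, is reported but not acted on.
    return "ignored: " + kind
```

Reporting unknown event types instead of silently dropping them makes it easier to see when the server sends only session-level events and no transcription.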

ChitranshS requested a review from a team as a code owner on March 24, 2025, 16:07
@wronkiew

I was not able to get this to work. One thing I ran into is that it requires websockets==10.1 for extra_headers support. Once it was running, though, the transcription endpoint returned only a session update event and no transcribed text. Did you get more events? I confirmed that it is recording good audio.
I got a variant based on this example to return transcription events.
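One possible way around the version pin is to pick the header keyword at runtime. The sketch below assumes that newer websockets releases accept `additional_headers` where older ones accept `extra_headers`; that rename is an assumption to verify against the installed version. The helpers `header_kwarg_name` and `build_connect_kwargs` are hypothetical and not part of the PR.

```python
import inspect


def header_kwarg_name(connect_fn):
    """Return the keyword a connect callable uses for extra HTTP headers."""
    params = inspect.signature(connect_fn).parameters
    # Assumption: newer clients expose "additional_headers", older ones "extra_headers".
    if "additional_headers" in params:
        return "additional_headers"
    return "extra_headers"


def build_connect_kwargs(connect_fn, headers):
    """Build the keyword arguments needed to pass headers to connect_fn."""
    return {header_kwarg_name(connect_fn): headers}
```

Usage would look something like `websockets.connect(url, **build_connect_kwargs(websockets.connect, headers))`, where `url` and `headers` are whatever the script already builds.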

2 participants