turbopot is a versatile tool that offers both a command-line interface and a web server for AI-powered content generation and speech synthesis. It leverages OpenAI's GPT models for text generation and text-to-speech capabilities, providing a range of functionalities that can be easily integrated into various projects.
Repository: https://github.com/htelsiz/turbopot.git
- AI-powered content generation for various types (e.g., blogs, poems, stories) using GPT models
- Text-to-speech conversion with multiple voice options using OpenAI's TTS models
- High-quality audio generation option for enhanced speech output
- Command-line interface for quick content generation and audio synthesis
- Web API for integration into other applications, built with FastAPI
- Audio transcription from files or microphone input using OpenAI's Whisper model
turbopot uses several key components to provide its functionality:
- Content Generation: Uses OpenAI's GPT models to generate text content based on user prompts and specified content types.
- Text-to-Speech: Converts generated text to speech using OpenAI's TTS models, with options for different voices and quality levels.
- Audio Transcription: Employs OpenAI's Whisper model to transcribe audio files or live microphone input.
- Streaming Responses: Implements asynchronous streaming for both text and audio generation, allowing for real-time output.
- CLI and Web API: Offers both a command-line interface and a FastAPI-based web server for flexible usage.
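As a rough illustration of the streaming approach described above, the sketch below prints text from OpenAI's chat completions API as it is generated. The model name and prompt wording are placeholders for illustration, not taken from turbopot's source.

```python
# Minimal sketch of async streaming text generation (illustrative, not turbopot's code).
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def stream_text(subject: str, content_type: str = "general") -> None:
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": f"Write a {content_type} about {subject}."}],
        stream=True,
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)  # emit text as soon as it arrives

asyncio.run(stream_text("Artificial Intelligence", "blog"))
```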
- `ContentGenerator` class in `api.py`: Handles interaction with OpenAI's API for text generation, speech synthesis, and transcription.
- `generate_spoken_content_stream` function: Orchestrates the process of generating content and converting it to speech.
- FastAPI routes in `main.py`: Provide web API endpoints for content generation and transcription.
- Typer CLI commands in `main.py`: Offer command-line functionality for various features.
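The speech-synthesis step that `generate_spoken_content_stream` orchestrates likely looks something like the sketch below, which streams OpenAI TTS audio straight to disk. The helper name and defaults are assumptions for illustration, not turbopot's actual implementation.

```python
# Hedged sketch of the text-to-speech step (illustrative, not turbopot's code).
from openai import OpenAI

client = OpenAI()

def synthesize_speech(text: str, voice: str = "alloy", high_quality: bool = False,
                      output_path: str = "speech.mp3") -> str:
    # "tts-1-hd" vs "tts-1" mirrors the --high-quality flag described later.
    model = "tts-1-hd" if high_quality else "tts-1"
    # Stream the audio bytes to disk instead of buffering the whole file in memory.
    with client.audio.speech.with_streaming_response.create(
        model=model, voice=voice, input=text
    ) as response:
        response.stream_to_file(output_path)
    return output_path

print(synthesize_speech("Hello from a turbopot-style speech step.", voice="nova"))
```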
- User input (prompt, content type, etc.) is received via CLI or API.
- The input is processed and sent to OpenAI's GPT model for text generation.
- Generated text is streamed back and optionally sent to OpenAI's TTS model for speech synthesis.
- Audio data is streamed back and can be played or saved to a file.
- For transcription, audio input is processed by OpenAI's Whisper model to produce text output.
This architecture allows for efficient, stream-based processing of large amounts of text and audio data, making turbopot suitable for various applications requiring AI-generated content and speech synthesis.
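For the transcription path, a minimal version of the Whisper call could look like this; the function name is illustrative and the real code in `api.py` may differ.

```python
# Hedged sketch of file transcription with OpenAI's Whisper API.
from openai import OpenAI

client = OpenAI()

def transcribe_file(path: str) -> str:
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )
    return result.text

print(transcribe_file("example.mp3"))
```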
Before using TurboPot, you need to have the following installed on your system:
- FFmpeg: For playing and processing audio content.
- PortAudio: Required for microphone input functionality.
FFmpeg:
- macOS (Homebrew): brew install ffmpeg
- Windows: download FFmpeg from https://ffmpeg.org/download.html#build-windows, extract it to a location (e.g., C:\ffmpeg), and add the bin folder to your system PATH
- Linux (Debian/Ubuntu): sudo apt update && sudo apt install ffmpeg

PortAudio:
- macOS (Homebrew): brew install portaudio
- Windows: PortAudio is included with the Python package `sounddevice`, which is listed in the requirements.txt file.
- Linux (Debian/Ubuntu): sudo apt update && sudo apt install libportaudio2
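To confirm both prerequisites are visible to Python, you can run a quick check like the one below. It is not part of turbopot and assumes `sounddevice` is already installed from requirements.txt.

```python
# Quick sanity check for the prerequisites (not part of turbopot).
import shutil
import sounddevice as sd  # installed via requirements.txt; needs PortAudio

print("ffmpeg on PATH:", shutil.which("ffmpeg") is not None)
print("PortAudio:", sd.get_portaudio_version()[1])
inputs = [d["name"] for d in sd.query_devices() if d["max_input_channels"] > 0]
print("Input devices:", inputs or "none found")
```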
- Clone the repository:

  git clone https://github.com/htelsiz/turbopot.git
  cd turbopot

- Install dependencies:

  pip install -r requirements.txt

- Set up your OpenAI API key: create a `.env` file in the project root and add your API key:

  OPENAI_API_KEY=your_key_here
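If you want to confirm the key is picked up from `.env` in your own script, a snippet like the following works. It assumes python-dotenv is available; turbopot itself may load the file differently.

```python
# Hedged check that OPENAI_API_KEY is loaded from .env (assumes python-dotenv).
import os
from dotenv import load_dotenv

load_dotenv()  # copies variables from .env into os.environ
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
print("OpenAI API key loaded.")
```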
Generate content:
python main.py generate-content --subject "Artificial Intelligence" --type "blog" --voice "nova" --high-quality
python main.py generate-content --subject "Dolphins" --type "scientific overview" --voice "fable" --high-quality --max-length 256
python main.py generate-content --subject "the lines at the post office" --type "meme idea" --voice "shimmer" --high-quality --max-length 256
Options:
- `--subject`: The topic for your content (required)
- `--type`: Type of content to generate (default: "general")
- `--voice`: Voice for text-to-speech (default: "alloy")
- `--high-quality`: Use high-quality audio generation (flag)
- `--output`: Save the generated audio to a file
- `--max-length`: Maximum number of characters for the generated content
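For reference, options like these are typically declared on a Typer command roughly as sketched below. The names and defaults mirror the list above, but this is not turbopot's actual `main.py`.

```python
# Hedged sketch of a Typer command exposing the options above (not turbopot's main.py).
from typing import Optional
import typer

app = typer.Typer()

@app.command()
def generate_content(
    subject: str = typer.Option(..., help="The topic for your content"),
    content_type: str = typer.Option("general", "--type", help="Type of content to generate"),
    voice: str = typer.Option("alloy", help="Voice for text-to-speech"),
    high_quality: bool = typer.Option(False, "--high-quality", help="Use high-quality audio"),
    output: Optional[str] = typer.Option(None, help="Save the generated audio to a file"),
    max_length: Optional[int] = typer.Option(None, help="Maximum characters for the content"),
) -> None:
    typer.echo(f"Would generate a {content_type} about {subject!r} using voice {voice}.")

if __name__ == "__main__":
    app()
```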
Transcribe audio file:
python main.py transcribe-audio /path/to/your/audio/file.mp3
Record audio and transcribe:
python main.py record-and-transcribe --duration 10
Options:
- `--duration`: Duration of recording in seconds (default: 5)
- `--sample_rate`: Sample rate of the recording (default: 44100)
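Under the hood, a command like this needs roughly a microphone capture via `sounddevice` plus a WAV write; the sketch below is an assumption-laden illustration, not turbopot's implementation.

```python
# Hedged sketch of recording from the microphone with sounddevice (illustrative only).
import wave
import sounddevice as sd

def record(path: str = "recording.wav", duration: int = 5, sample_rate: int = 44100) -> str:
    frames = sd.rec(int(duration * sample_rate), samplerate=sample_rate,
                    channels=1, dtype="int16")
    sd.wait()  # block until the recording finishes
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # int16 samples are 2 bytes wide
        wav.setframerate(sample_rate)
        wav.writeframes(frames.tobytes())
    return path

print(record(duration=10))
```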
Start the server:
python main.py run-server
Access the API documentation at http://127.0.0.1:8000/docs
Endpoints:
- `/generate_content`: Generate content and audio
- `/transcribe`: Transcribe an uploaded audio file
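A minimal client could call the API like this; the request field names and response shapes here are assumptions, so check the generated schema at /docs before relying on them.

```python
# Hedged client example; field names and response format are assumptions, see /docs.
import requests

BASE = "http://127.0.0.1:8000"

# Content generation: assume a JSON body and a streamed audio response.
resp = requests.post(
    f"{BASE}/generate_content",
    json={"subject": "Dolphins", "type": "blog", "voice": "nova"},
    stream=True,
)
resp.raise_for_status()
with open("dolphins.mp3", "wb") as f:
    for chunk in resp.iter_content(chunk_size=8192):
        f.write(chunk)

# Transcription: upload an audio file as multipart form data.
with open("dolphins.mp3", "rb") as audio:
    resp = requests.post(f"{BASE}/transcribe", files={"file": audio})
print(resp.json())
```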
This project is licensed under the MIT License.
Contributions, issues, and feature requests are welcome! Feel free to check the issues page.
Got questions? Too bad!