This project demonstrates the use of OpenAI's Realtime API to create an AI assistant capable of handling voice input, performing various tasks, and providing audio responses. It showcases the integration of tools, structured output responses, and real-time interaction.
- Real-time voice interaction with an AI assistant
- Asynchronous audio input and output handling
- Custom tools execution based on user requests
- Synchronous Communication: Direct, immediate interaction with agents for quick tasks
- Asynchronous Task Delegation: Long-running task delegation to agencies/agents
- Send messages to agency CEOs without waiting for responses
- Send messages to subordinate agents on behalf of CEOs
- Task Status Monitoring: Check completion status and retrieve responses
- Multiple specialized AI agent teams working collaboratively
- Google Calendar integration for meeting schedule management
- Gmail integration for email handling and drafting
- Browser interaction for web-related tasks
- File system operations (create, update, delete)
- SendMessage: Synchronous communication with agencies/agents for quick tasks
  - Direct interaction with immediate response
  - Suitable for simple, fast-completing tasks
- SendMessageAsync: Asynchronous task delegation
  - Initiates long-running tasks without waiting
  - Returns immediately to allow other operations
- GetResponse: Task status and response retrieval
  - Checks completion status of async tasks
  - Retrieves agent responses when tasks complete
- FetchDailyMeetingSchedule: Fetches and formats the user's daily meeting schedule from Google Calendar
- GetGmailSummary: Provides a concise summary of unread Gmail messages from the past 48 hours
- DraftGmail: Composes email drafts, either as a reply to an email from GetGmailSummary, or as a new message
- GetScreenDescription: Captures and analyzes the current screen content for the assistant
- FileOps:
  - CreateFile: Generates new files with user-specified content
  - UpdateFile: Modifies existing files with new content
  - DeleteFile: Removes specified files from the system
- OpenBrowser: Launches a web browser with a given URL
- GetCurrentDateTime: Retrieves and reports the current date and time
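Most of these tools are thin async wrappers around a single operation. For instance, a GetCurrentDateTime-style tool could be sketched like this (illustrative only, not the project's actual implementation):

```python
from datetime import datetime

async def get_current_datetime() -> str:
    """Return the current date and time as a spoken-friendly string."""
    return datetime.now().strftime("%A, %B %d, %Y at %H:%M")
```

The assistant would invoke such a tool when the user asks for the time, then read the returned string aloud.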
- Install Python 3.12.
- Install uv, a modern Python package manager
- Clone this repository to your local machine
- Create a local environment file `.env` based on `.env.sample`
- Customize `personalization.json` and `config.py` to your preferences
- Install the required audio library: `brew install portaudio`
- Install project dependencies: `uv sync`
- Launch the assistant: `uv run main`
To enable Google Cloud API integration, follow these steps:
- Create OAuth 2.0 Client IDs in the Google Cloud Console
- Place the `credentials.json` file in the project's root directory
- Configure `http://localhost:8080/` as an Authorized Redirect URI in your Google Cloud project settings
- Set the OAuth consent screen to "Internal" user type
- Enable the following APIs and scopes in your Google Cloud project:
  - Gmail API
    - https://www.googleapis.com/auth/gmail.readonly
    - https://www.googleapis.com/auth/gmail.compose
    - https://www.googleapis.com/auth/gmail.modify
  - Google Calendar API
    - https://www.googleapis.com/auth/calendar.readonly
The project relies on environment variables and a `personalization.json` file for configuration. Ensure you have set up:
- `OPENAI_API_KEY`: Your personal OpenAI API key
- `PERSONALIZATION_FILE`: Path to your customized personalization JSON file
- `SCRATCH_PAD_DIR`: Directory for temporary file storage
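A minimal `.env` covering these variables might look like the following (all values are placeholders to replace with your own):

```
OPENAI_API_KEY=sk-your-key-here
PERSONALIZATION_FILE=./personalization.json
SCRATCH_PAD_DIR=./scratch
```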
After launching the assistant, interact using voice commands. Example interactions:
- "What do I have on my schedule for today? Tell me only most important meetings."
- "Do I have any important emails?"
- "Open ChatGPT in my browser."
- "Create a new file named user_data.txt with some example content."
- "Update the user_data.txt file by adding more information."
- "Delete the user_data.txt file."
- "Ask the research team to write a detailed market analysis report."
- "Check if the research team has completed the market analysis report."
- `main.py`: Application entry point
- `agencies/`: Agency-Swarm teams of specialized agents
- `tools/`: Standalone tools for various functions
- `config.py`: Configuration settings and environment variable management
- `visual_interface.py`: Visual interface for audio energy visualization
- `websocket_handler.py`: WebSocket event and message processing
- Asynchronous WebSocket Communication: Uses the `websockets` library for an asynchronous connection to the OpenAI Realtime API
- Audio Input/Output Handling: Manages real-time audio capture and playback with PCM16 format support and VAD (Voice Activity Detection)
- Function Execution: Standalone tools in `tools/` are invoked by the AI assistant based on user requests
- Structured Output Processing: OpenAI's Structured Outputs are used to generate precise, structured responses
- Visual Interface: A PyGame-based interface provides real-time visualization of audio volume
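The WebSocket side of this can be sketched roughly as follows. The endpoint, headers, and `session.update` fields follow OpenAI's published Realtime API beta conventions, but treat the specifics (model name, event fields) as assumptions to verify against the current docs:

```python
import asyncio
import json
import os

REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

def build_session_update(instructions: str) -> str:
    """Build a session.update event enabling PCM16 audio and server-side VAD."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "modalities": ["text", "audio"],
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "turn_detection": {"type": "server_vad"},
        },
    })

async def run_client() -> None:
    # Third-party dependency; note `extra_headers` was renamed
    # `additional_headers` in websockets 14+.
    import websockets

    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    async with websockets.connect(REALTIME_URL, extra_headers=headers) as ws:
        await ws.send(build_session_update("You are a helpful voice assistant."))
        async for raw in ws:
            # e.g. session.updated, response.audio.delta, response.done
            print(json.loads(raw)["type"])

# To actually connect (requires OPENAI_API_KEY and the websockets package):
# asyncio.run(run_client())
```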
Standalone tools are independent functions not associated with specific agents or agencies.
To add a new standalone tool:
- Create a new file in the `tools/` directory
- Implement the `run` method using async syntax, utilizing `asyncio.to_thread` for blocking operations
- Install any necessary dependencies: `uv add <package_name>`
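Following those steps, a new tool's async `run` method might look like this (the `ReadFile` tool itself is hypothetical, shown only to illustrate the `asyncio.to_thread` pattern):

```python
import asyncio
from pathlib import Path

class ReadFile:
    """Hypothetical tool: read a file's contents without blocking the event loop."""

    def __init__(self, path: str) -> None:
        self.path = path

    async def run(self) -> str:
        # Path.read_text is blocking I/O, so hand it to a worker thread.
        return await asyncio.to_thread(Path(self.path).read_text)

# usage:
# content = asyncio.run(ReadFile("notes.txt").run())
```

Pushing blocking calls onto a worker thread keeps the audio capture and WebSocket loops responsive while a tool does slow work.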
Agencies are Agency-Swarm style teams of specialized agents working together on complex tasks.
To add a new agency:
- Drag-and-drop your agency folder into the `agencies/` directory
- Set `async_mode="threading"` in the agency configuration to enable async messaging (SendMessageAsync and GetResponse)
- Install any required dependencies: `uv add <package_name>`