This project demonstrates the use of OpenAI's Realtime API to create an AI assistant capable of handling voice input, performing various tasks, and providing audio responses. It showcases the integration of tools, structured output responses, and real-time interaction.
- Real-time voice interaction with an AI assistant
- Asynchronous audio input and output handling
- Custom tools execution based on user requests
- Synchronous Communication: Direct, immediate interaction with agents for quick tasks
- Asynchronous Task Delegation: Long-running task delegation to agencies/agents
- Send messages to agency CEOs without waiting for responses
- Send messages to subordinate agents on behalf of CEOs
- Task Status Monitoring: Check completion status and retrieve responses
- Multiple specialized AI agent teams working collaboratively
- Google Calendar integration for meeting schedule management
- Gmail integration for email handling and drafting
- Browser interaction for web-related tasks
- File system operations (create, update, delete)
- SendMessage: Synchronous communication with agencies/agents for quick tasks
  - Direct interaction with immediate response
  - Suitable for simple, fast-completing tasks
- SendMessageAsync: Asynchronous task delegation
  - Initiates long-running tasks without waiting
  - Returns immediately to allow other operations
- GetResponse: Task status and response retrieval
  - Checks completion status of async tasks
  - Retrieves agent responses when tasks complete
- FetchDailyMeetingSchedule: Fetches and formats the user's daily meeting schedule from Google Calendar
- GetGmailSummary: Provides a concise summary of unread Gmail messages from the past 48 hours
- DraftGmail: Composes email drafts, either as a reply to an email from GetGmailSummary, or as a new message
- GetScreenDescription: Captures and analyzes the current screen content for the assistant
- FileOps:
  - CreateFile: Generates new files with user-specified content
  - UpdateFile: Modifies existing files with new content
  - DeleteFile: Removes specified files from the system
- OpenBrowser: Launches a web browser with a given URL
- GetCurrentDateTime: Retrieves and reports the current date and time
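Most of these tools are thin async wrappers around a single operation. For instance, a GetCurrentDateTime-style tool could be sketched like this (illustrative only, not the project's actual implementation):

```python
from datetime import datetime

async def get_current_datetime() -> str:
    """Return the current date and time as a spoken-friendly string."""
    return datetime.now().strftime("%A, %B %d, %Y at %H:%M")
```

The assistant would invoke such a tool when the user asks for the time, then read the returned string aloud.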
- Install Python 3.12.
- Install uv, a modern Python package manager
- Clone this repository to your local machine
- Create a local environment file `.env` based on `.env.sample`
- Customize `personalization.json` and `config.py` to your preferences
- Install the required audio library: `brew install portaudio`
- Install project dependencies: `uv sync`
- Launch the assistant: `uv run main`
To enable Google Cloud API integration, follow these steps:
- Create OAuth 2.0 Client IDs in the Google Cloud Console
- Place the `credentials.json` file in the project's root directory
- Configure `http://localhost:8080/` as an Authorized Redirect URI in your Google Cloud project settings
- Set the OAuth consent screen to "Internal" user type
- Enable the following APIs and scopes in your Google Cloud project:
  - Gmail API
    - https://www.googleapis.com/auth/gmail.readonly
    - https://www.googleapis.com/auth/gmail.compose
    - https://www.googleapis.com/auth/gmail.modify
  - Google Calendar API
    - https://www.googleapis.com/auth/calendar.readonly
The project relies on environment variables and a `personalization.json` file for configuration. Ensure you have set up:
- `OPENAI_API_KEY`: Your personal OpenAI API key
- `PERSONALIZATION_FILE`: Path to your customized personalization JSON file
- `SCRATCH_PAD_DIR`: Directory for temporary file storage
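A minimal `.env` covering these variables might look like the following (all values are placeholders to replace with your own):

```
OPENAI_API_KEY=sk-your-key-here
PERSONALIZATION_FILE=./personalization.json
SCRATCH_PAD_DIR=./scratch
```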
After launching the assistant, interact using voice commands. Example interactions:
- "What do I have on my schedule for today? Tell me only most important meetings."
- "Do I have any important emails?"
- "Open ChatGPT in my browser."
- "Create a new file named user_data.txt with some example content."
- "Update the user_data.txt file by adding more information."
- "Delete the user_data.txt file."
- "Ask the research team to write a detailed market analysis report."
- "Check if the research team has completed the market analysis report."
- `main.py`: Application entry point
- `agencies/`: Agency-Swarm teams of specialized agents
- `tools/`: Standalone tools for various functions
- `config.py`: Configuration settings and environment variable management
- `visual_interface.py`: Visual interface for audio energy visualization
- `websocket_handler.py`: WebSocket event and message processing
- Asynchronous WebSocket Communication: Uses the `websockets` library for an asynchronous connection to the OpenAI Realtime API
- Audio Input/Output Handling: Manages real-time audio capture and playback with PCM16 format support and VAD (Voice Activity Detection)
- Function Execution: Standalone tools in `tools/` are invoked by the AI assistant based on user requests
- Structured Output Processing: OpenAI's Structured Outputs are used to generate precise, structured responses
- Visual Interface: A PyGame-based interface provides real-time visualization of audio volume
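The WebSocket side of this can be sketched roughly as follows. The endpoint, headers, and `session.update` fields follow OpenAI's published Realtime API beta conventions, but treat the specifics (model name, event fields) as assumptions to verify against the current docs:

```python
import asyncio
import json
import os

REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

def build_session_update(instructions: str) -> str:
    """Build a session.update event enabling PCM16 audio and server-side VAD."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "modalities": ["text", "audio"],
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "turn_detection": {"type": "server_vad"},
        },
    })

async def run_client() -> None:
    # Third-party dependency; note `extra_headers` was renamed
    # `additional_headers` in websockets 14+.
    import websockets

    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    async with websockets.connect(REALTIME_URL, extra_headers=headers) as ws:
        await ws.send(build_session_update("You are a helpful voice assistant."))
        async for raw in ws:
            # e.g. session.updated, response.audio.delta, response.done
            print(json.loads(raw)["type"])

# To actually connect (requires OPENAI_API_KEY and the websockets package):
# asyncio.run(run_client())
```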
Standalone tools are independent functions not associated with specific agents or agencies.
To add a new standalone tool:
- Create a new file in the `tools/` directory
- Implement the `run` method using async syntax, utilizing `asyncio.to_thread` for blocking operations
- Install any necessary dependencies: `uv add <package_name>`
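Following those steps, a new tool's async `run` method might look like this (the `ReadFile` tool itself is hypothetical, shown only to illustrate the `asyncio.to_thread` pattern):

```python
import asyncio
from pathlib import Path

class ReadFile:
    """Hypothetical tool: read a file's contents without blocking the event loop."""

    def __init__(self, path: str) -> None:
        self.path = path

    async def run(self) -> str:
        # Path.read_text is blocking I/O, so hand it to a worker thread.
        return await asyncio.to_thread(Path(self.path).read_text)

# usage:
# content = asyncio.run(ReadFile("notes.txt").run())
```

Pushing blocking calls onto a worker thread keeps the audio capture and WebSocket loops responsive while a tool does slow work.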
Agencies are Agency-Swarm style teams of specialized agents working together on complex tasks.
To add a new agency:
- Drag-and-drop your agency folder into the `agencies/` directory
- Set `async_mode="threading"` in the agency configuration to enable async messaging (SendMessageAsync and GetResponse)
- Install any required dependencies: `uv add <package_name>`