A React-based multimodal live streaming library that provides a live API context, audio processing, and UI components (such as a Control Tray) for building voice-enabled applications. This package bundles both the JavaScript/TypeScript logic and the compiled CSS, making it easy to integrate into your project.
- Live API Context & Hook: Provides an easy-to-use context (LiveAPIProvider) and hook (useLiveAPIContext) that manage connections, audio streaming, and configuration for the Gemini Multimodal Live API.
- Built-in Audio Processing: Uses an audio recorder and audio streamer to handle real-time audio input and output, along with volume meter worklets.
- UI Components: Includes a pre-built ControlTray component that offers controls for connecting/disconnecting, muting/unmuting audio, and monitoring volume levels.
- Tool Call Handling: A built-in tool call handler that processes function calls such as create_todo and more. See the example below.
Install the package via npm:
npm install gemini-multimodal-live-voice-only
Wrap your application with the LiveAPIProvider to initialize the live API context, then use the provided UI components. Pass your tools as an array of objects containing function declarations in the Gemini function calling tool format:
import React from 'react';
import { LiveAPIProvider, ControlTray } from 'gemini-multimodal-live-voice-only';
// Auto-inject bundled CSS:
import 'gemini-multimodal-live-voice-only/dist/index.css';
const App = () => (
  <LiveAPIProvider
    apiKey="your-api-key"
    dynamicConfig={{
      voiceName: "Kore",
      systemInstruction: {
        // Put your system instruction here.
        parts: [{ text: "You are AI of omiii. Follow the provided tools and instructions." }]
      },
      // Put your tools here (an array of objects containing function declarations
      // in the Gemini function calling tool format).
      tools: [
        { googleSearch: {} },
        { functionDeclarations: [] }
      ]
    }}
  >
    <ControlTray>
      {/* Additional UI elements can be included here */}
    </ControlTray>
  </LiveAPIProvider>
);
export default App;
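The functionDeclarations array in the example above is left empty. As a rough sketch of how it might be filled in, the snippet below builds a tools array with one declaration. The ParameterSchema, FunctionDeclaration, and Tool types are local, hypothetical stand-ins written for this example (not exports of this package), the title parameter is made up for illustration, and the lowercase type strings ("object", "string") are an assumption that should be checked against the Gemini function calling documentation.

```typescript
// Hypothetical local types describing the shape of a Gemini-style tool entry.
// They are NOT imported from gemini-multimodal-live-voice-only.
type ParameterSchema = {
  type: string; // assumed type string, e.g. "object" or "string"
  description?: string;
  properties?: Record<string, ParameterSchema>;
  required?: string[];
};

type FunctionDeclaration = {
  name: string;
  description: string;
  parameters?: ParameterSchema;
};

type Tool =
  | { googleSearch: Record<string, unknown> }
  | { functionDeclarations: FunctionDeclaration[] };

// Illustrative declaration; the "title" parameter is an assumption.
const createItem: FunctionDeclaration = {
  name: "create_item",
  description: "Create a new item with the given title.",
  parameters: {
    type: "object",
    properties: {
      title: { type: "string", description: "Title of the item to create." },
    },
    required: ["title"],
  },
};

// The array that would be passed as dynamicConfig.tools.
const tools: Tool[] = [
  { googleSearch: {} },
  { functionDeclarations: [createItem] },
];

console.log(JSON.stringify(tools, null, 2));
```

The resulting tools value would take the place of the tools array in the dynamicConfig prop shown above.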
You can also access live API functionality using the useLiveAPIContext hook:
import React from 'react';
import { useLiveAPIContext } from 'gemini-multimodal-live-voice-only';
const StatusDisplay = () => {
  const { connected, connect, disconnect, volume, muted, mute, unmute } = useLiveAPIContext();
  return (
    <div>
      <h2>Status: {connected ? 'Connected' : 'Disconnected'}</h2>
      <button onClick={connected ? disconnect : connect}>
        {connected ? 'Disconnect' : 'Connect'}
      </button>
      <button onClick={muted ? unmute : mute}>
        {muted ? 'Unmute' : 'Mute'}
      </button>
      <p>Volume: {volume}</p>
    </div>
  );
};
export default StatusDisplay;
Description: The LiveAPIProvider component initializes the live API connection and provides context to its children.
Props:
- apiKey (string, required): Your API key for connecting to the live service.
- dynamicConfig (object, required): Contains dynamic configuration details:
  - voiceName (string): The name of the voice configuration. Available voices are:
    - Puck
    - Charon
    - Kore
    - Fenrir
    - Aoede
  - systemInstruction (object): An object with a parts array, where each part has a text field.
  - tools (array): An array of tool configurations (e.g., { googleSearch: {} }).
- url (string, optional): The API URL. Defaults to the Gemini live API URL if not provided.
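To make the prop shapes listed above concrete, here is a small sketch that assembles a dynamicConfig object and rejects voice names outside the documented list. The DynamicConfig type and the isVoiceName and buildDynamicConfig helpers are hypothetical names introduced for this example; the package may type or validate its configuration differently.

```typescript
// Voices taken from the list above.
const VOICES = ["Puck", "Charon", "Kore", "Fenrir", "Aoede"] as const;
type VoiceName = (typeof VOICES)[number];

// Hypothetical local type mirroring the documented dynamicConfig fields.
type DynamicConfig = {
  voiceName: VoiceName;
  systemInstruction: { parts: { text: string }[] };
  tools: object[];
};

// Type guard: narrows an arbitrary string to one of the documented voices.
function isVoiceName(value: string): value is VoiceName {
  return (VOICES as readonly string[]).includes(value);
}

// Illustrative helper (not part of the package) that builds the config object.
function buildDynamicConfig(
  voiceName: string,
  instruction: string,
  tools: object[] = []
): DynamicConfig {
  if (!isVoiceName(voiceName)) {
    throw new Error(
      `Unknown voice "${voiceName}". Expected one of: ${VOICES.join(", ")}`
    );
  }
  return {
    voiceName,
    systemInstruction: { parts: [{ text: instruction }] },
    tools,
  };
}

const config = buildDynamicConfig(
  "Kore",
  "You are a helpful voice assistant.",
  [{ googleSearch: {} }]
);
console.log(config.voiceName);
```

Validating the voice name up front is a design choice of this sketch: it surfaces a typo before any connection is attempted.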
Description: useLiveAPIContext is a custom hook that provides access to the live API client and related state and functions.
Returns an object with:
- client: Instance of the live API client.
- config: The current live API configuration.
- setConfig: Function to update the configuration.
- connected: Boolean indicating the connection status.
- connect: Function to initiate a connection.
- disconnect: Function to disconnect.
- volume: The current audio volume.
- muted: Boolean indicating whether audio is muted.
- mute: Function to mute audio.
- unmute: Function to unmute audio.
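The state fields in the list above can be summarized as a TypeScript shape. The sketch below is an assumption drawn from that list, not the package's actual type definitions: the LiveAPIState name is hypothetical, volume is assumed to be a number, and only the fields needed for display are modeled. The describeStatus helper is likewise illustrative.

```typescript
// Hypothetical subset of the hook's return value (assumed field types).
type LiveAPIState = {
  connected: boolean;
  muted: boolean;
  volume: number; // assumption: a numeric level
};

// Turns the current state into a short, human-readable status line.
function describeStatus({ connected, muted, volume }: LiveAPIState): string {
  if (!connected) return "Disconnected";
  if (muted) return "Connected (muted)";
  return `Connected (volume ${volume.toFixed(2)})`;
}

console.log(describeStatus({ connected: true, muted: false, volume: 0.5 }));
```

Inside a component, the same helper could be given the connected, muted, and volume values returned by useLiveAPIContext.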
Description: ControlTray is a UI component that provides audio controls, including mute/unmute and connection toggling, with visual indicators for audio volume.
This example demonstrates a simple handler that processes tool calls by switching on fc.name, handling a "create_item" function call, and returning appropriate responses. Feel free to modify and extend this example to fit your needs:
// Place this inside a component. client comes from useLiveAPIContext(),
// and useEffect is imported from 'react'.
useEffect(() => {
  const onToolCall = async (toolCall: ToolCall) => {
    const responses = await Promise.all(
      toolCall.functionCalls.map(async (fc) => {
        // Process function call based on its name.
        switch (fc.name) {
          case "create_item": {
            // Create an item using provided args.
            try {
              const response = await fetch("http://localhost:5000/items", {
                method: "POST",
                headers: { "Content-Type": "application/json" },
                body: JSON.stringify(fc.args),
              });
              const data = await response.json();
              return { id: fc.id, response: { output: data } };
            } catch (error) {
              // A caught value is not guaranteed to be an Error instance.
              const message = error instanceof Error ? error.message : String(error);
              return { id: fc.id, response: { output: { error: message } } };
            }
          }
          // Add additional cases as needed.
          default:
            return { id: fc.id, response: { output: { error: "Unknown function" } } };
        }
      })
    );
    // Send the collected responses back after a short delay.
    setTimeout(() => client.sendToolResponse({ functionResponses: responses }), 200);
  };
  client.on("toolcall", onToolCall);
  // Unregister the listener when the component unmounts or client changes.
  return () => client.off("toolcall", onToolCall);
}, [client]);
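As the number of cases grows, the switch statement above can get long. One possible refactor, sketched here under assumptions, is a lookup table of handlers keyed by function name. The FunctionCall and FunctionResponse types are local stand-ins inferred from the fields used in the example (id, name, args), and Handler, buildResponses, and handlers are hypothetical names, not part of the package.

```typescript
// Local stand-in types inferred from the example above (assumptions).
type FunctionCall = { id: string; name: string; args: Record<string, unknown> };
type FunctionResponse = { id: string; response: { output: unknown } };
type Handler = (args: Record<string, unknown>) => Promise<unknown>;

// Resolves every function call through the handler table. Unknown names and
// thrown errors are reported as error outputs instead of rejecting the batch.
async function buildResponses(
  calls: FunctionCall[],
  handlers: Record<string, Handler>
): Promise<FunctionResponse[]> {
  return Promise.all(
    calls.map(async (fc) => {
      const known = Object.prototype.hasOwnProperty.call(handlers, fc.name);
      const handler = known ? handlers[fc.name] : undefined;
      if (!handler) {
        return { id: fc.id, response: { output: { error: "Unknown function" } } };
      }
      try {
        return { id: fc.id, response: { output: await handler(fc.args) } };
      } catch (error) {
        const message = error instanceof Error ? error.message : String(error);
        return { id: fc.id, response: { output: { error: message } } };
      }
    })
  );
}

// Illustrative handler table. The create_item handler is a stand-in for the
// fetch call in the example above and simply echoes its arguments back.
const handlers: Record<string, Handler> = {
  create_item: async (args) => ({ created: true, item: args }),
};
```

With this in place, the body of onToolCall could reduce to awaiting buildResponses(toolCall.functionCalls, handlers) and passing the result to client.sendToolResponse as before. Keeping the dispatch logic in a plain function also makes it testable without a live connection.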
- Clone the repository and install dependencies:
git clone https://github.com/omanandswami2005/gemini-multimodal-live-voice-only.git
cd gemini-multimodal-live-voice-only
npm install
- To run TypeScript in watch mode:
npm run dev
This package uses a combination of tsc (for TypeScript) and sass (for SCSS), along with a post-build script to strip out individual SCSS imports. To build the package, run:
npm run build
Contributions are welcome! Please fork the repository, make your changes, and submit a pull request. For major changes, please open an issue first to discuss what you would like to change.
For any issues or questions, please open an issue in the GitHub repository or contact [email protected].
Happy coding! 🚀