Converse with large language models using speech.
- Open: Powered by state-of-the-art open-source speech processing models.
- Efficient: Light enough to run on consumer hardware, with low latency.
- Self-hosted: Entire pipeline runs offline, limited only by compute power.
- Modular: Switching LLM providers is as simple as changing an environment variable.
![Sage architecture](https://github.com/farshed/sage/raw/main/assets/architecture-dark.png?raw=true)
Run `bun docker-build` to build the image and then `bun docker-run` to spin up a container. The UI is exposed at http://localhost:3000.

Note: Using Docker results in significantly slower inference (5-8x slower than native).
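For quick reference, the Docker route boils down to the two commands described above:

```sh
# Build the Sage image
bun docker-build

# Start a container; the UI is then served at http://localhost:3000
bun docker-run
```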
- Run `setup-unix.sh` or `setup-win.bat` depending on your platform. This will download the required model weights and compile the binaries needed for Sage.
- For text generation, you can either self-host an LLM using Ollama, or opt for a third-party provider.
- If you're using Ollama, add the `OLLAMA_MODEL` variable to the `.env` file to specify the model you'd like to use. (Example: `OLLAMA_MODEL=deepseek-r1:7b`)
- Among the third-party providers, Sage supports the following out of the box:
  - Deepseek
  - OpenAI
  - Anthropic
  - Together.ai
- To use a provider, add a `<PROVIDER>_API_KEY` variable to the `.env` file. (Example: `OPENAI_API_KEY=xxxxxxxxxxxxxxxxxxxxxxx`) See the quick-start sketch after this list for a full example.
- To choose which model should be used for a given provider, use the `<PROVIDER>_MODEL` variable. (Example: `DEEPSEEK_MODEL=deepseek-chat`)
- Start the project with `bun start`. The first run on macOS is slow (~20 minutes on M1 Pro), since the ANE service compiles the Whisper CoreML model to a device-specific format. Subsequent runs are faster.
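Putting the native setup together, a minimal end-to-end run looks roughly like the sketch below. The `DEEPSEEK_API_KEY` line is an assumption based on the `<PROVIDER>_API_KEY` pattern described above, and its value is a placeholder; the other commands and example values come straight from the steps in this list.

```sh
# 1. Download model weights and compile the binaries Sage needs
#    (run setup-win.bat instead on Windows)
./setup-unix.sh

# 2. Point Sage at an LLM in .env: either a local Ollama model...
echo 'OLLAMA_MODEL=deepseek-r1:7b' >> .env

#    ...or a hosted provider, using the <PROVIDER>_API_KEY / <PROVIDER>_MODEL pattern
#    (placeholder key; uncomment to use instead of Ollama):
# echo 'DEEPSEEK_API_KEY=xxxxxxxxxxxxxxxxxxxxxxx' >> .env
# echo 'DEEPSEEK_MODEL=deepseek-chat' >> .env

# 3. Launch Sage; the first macOS run is slow while the Whisper CoreML model compiles
bun start
```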
Planned improvements:

- Make it easier to run. (Dockerize?)
- Optimize the pipeline
- Release as a library?