
Sage

Converse with large language models using speech.

  • Open: Powered by state-of-the-art open-source speech processing models.
  • Efficient: Light enough to run on consumer hardware, with low latency.
  • Self-hosted: Entire pipeline runs offline, limited only by compute power.
  • Modular: Switching LLM providers is as simple as changing an environment variable.

How it works


[Diagram: Sage architecture]

Docker

Run bun docker-build to build the image, then bun docker-run to spin up a container. The UI is exposed at http://localhost:3000.
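In other words, the whole Docker workflow is two commands:

    bun docker-build     # build the Sage image
    bun docker-run       # start a container; the UI is served at http://localhost:3000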

Note: Running in Docker results in significantly slower inference (5-8x slower than native).

Manual Setup (Without Docker)

Requirements

  • Bun
  • Rust
  • Ollama (Alternatively, you can use a third-party provider)

Run

  1. Run setup-unix.sh or setup-win.bat depending on your platform. This will download the required model weights and compile the binaries needed for Sage.

  2. For text generation, you can either self-host an LLM using Ollama, or opt for a third-party provider.

  • If you're using Ollama, add the OLLAMA_MODEL variable to the .env file to specify the model you'd like to use. (Example: OLLAMA_MODEL=deepseek-r1:7b)

  • Among the third-party providers, Sage supports the following out of the box:

    1. Deepseek
    2. OpenAI
    3. Anthropic
    4. Together.ai
  • To use a provider, add a <PROVIDER>_API_KEY variable to the .env file. (Example: OPENAI_API_KEY=xxxxxxxxxxxxxxxxxxxxxxx)

  • To choose which model a given provider should use, set the <PROVIDER>_MODEL variable. (Example: DEEPSEEK_MODEL=deepseek-chat) See the example .env after these steps.

  3. Start the project with bun start. The first run on macOS is slow (~20 minutes on an M1 Pro), since the ANE service compiles the Whisper CoreML model into a device-specific format. Subsequent runs are faster.
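For reference, a minimal .env might look like the sketch below. Use either the Ollama variable or a provider key/model pair, not both; the model names and key value are the placeholder examples from above, so substitute your own.

    # Option A: self-hosted via Ollama
    OLLAMA_MODEL=deepseek-r1:7b

    # Option B: a third-party provider (DeepSeek shown as an example)
    DEEPSEEK_API_KEY=xxxxxxxxxxxxxxxxxxxxxxx
    DEEPSEEK_MODEL=deepseek-chat

With the .env in place, start Sage with bun start as in step 3.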

Future work

  • Make it easier to run. (Dockerize?)
  • Optimize the pipeline
  • Release as a library?