Stars
Open TTS models, built for streaming on the edge
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
idiap / coqui-ai-TTS
Forked from coqui-ai/TTS. 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
MARS5 speech model (TTS) from CAMB.AI
Foundational model for human-like, expressive TTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
1 minute of voice data can also be used to train a good TTS model! (few-shot voice cloning)
Code to apply microprosodic effects to pitch contours used for articulatory speech synthesis.
Easily train a good VC model with 10 minutes or less of voice data!
📋 A list of open LLMs available for commercial use.
WavJourney: Compositional Audio Creation with LLMs
Forked from NVIDIA/tacotron2 and merged with Rayhane-mamah/Tacotron-2
Unicode to ASCII transliteration - C, Elixir, Go, Java, JS, Julia, PHP, Python, Ruby, Rust, Shell, .NET
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Qualtric or Qualtreat? Generate Qualtrics listening tests for Text-To-Speech evaluations.
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Official implementation of FCL-taco2: Fast, Controllable and Lightweight version of Tacotron2 @ ICASSP 2021
HuBERT content encoders for: A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
Phoneme tokenizer and grapheme-to-phoneme model for 8k languages
[INTERSPEECH'2022] Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning
Acoustic models for: A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion