Releases: echogarden-project/echogarden
v2.3.3
Fixes
- Kokoro (synthesis): apply English phoneme substitutions to more closely follow the Misaki phonemizer output. Words containing diphthongs like `ɔɪ`, such as "noise" and "annoy", used to be incorrectly pronounced by the US English voices with `ʌ`-like sounds, like "naise" and "annay". The reason was that the special diphthong token the model was trained on (`Y` in this case) wasn't mapped correctly (this problem also occurs in many JavaScript ports of Kokoro, including the port made by its original author). The issue should now be fixed, and pronunciation should be more accurate and consistent in general.
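
As a rough illustration of the kind of substitution described in this fix, the sketch below replaces an IPA diphthong with a single model token before the phonemes reach the model. The table and the function name `applyDiphthongTokens` are hypothetical and are not taken from the Echogarden source; only the `ɔɪ` to `Y` pair comes from the note above, and a real table would contain further entries.

```typescript
// Hypothetical substitution table: IPA diphthong -> single model token.
// Only the `ɔɪ` -> `Y` pair is taken from the release note; this is an
// illustrative assumption, not Echogarden's actual table.
const diphthongTokens: ReadonlyArray<[string, string]> = [
  ["ɔɪ", "Y"],
];

// Replace every listed diphthong in an IPA string with its token.
function applyDiphthongTokens(ipa: string): string {
  let result = ipa;
  for (const [diphthong, token] of diphthongTokens) {
    result = result.split(diphthong).join(token);
  }
  return result;
}

console.log(applyDiphthongTokens("nɔɪz")); // prints "nYz"
```

If a mapping like this is missing, the model receives the two-symbol sequence instead of the token it was trained on, which is consistent with the mispronunciation described above.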
Enhancements
- Synthesis: added many more words (incorrectly pronounced by eSpeak-NG) to the English correction lexicon
Full Changelog: v2.3.2...v2.3.3
v2.3.2
Fixes
- Work around eSpeak-NG marker issue when multiple square brackets are included in a fragment
Full Changelog: v2.3.1...v2.3.2
v2.3.1
Fixes
- Fix unreported issue of files with empty content failing to write to disk (how wasn't this issue reported?)
Full Changelog: v2.3.0...v2.3.1
v2.3.0
Features
- Add Deepgram cloud TTS engine
Enhancements
- Deepgram STT: convert to Opus codec (48 kbit/s) when sending audio to server. Add option for adding punctuation (enabled by default)
- Updated and improved ElevenLabs TTS engine. Add options for selecting model (documented here) and optional seed.
Fixes
- Fix ElevenLabs options casing to match the documentation
Full Changelog: v2.2.0...v2.3.0
v2.2.1
Fixes
- Synthesis: fix regression caused by converting `:` to `,`. eSpeak-NG gets crazy and skips markers when a fragment like `::` is converted to `,,`. Instead, convert to only a single `,` regardless of the `:` count
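
A minimal sketch of the behavior described in this fix, assuming the conversion is done with a regular expression over the text fragment. The function name `collapseColons` is hypothetical, and the actual Echogarden implementation may differ.

```typescript
// Collapse every run of one or more colons into a single comma,
// so that "::" never turns into ",,".
function collapseColons(fragment: string): string {
  return fragment.replace(/:+/g, ",");
}

console.log(collapseColons("a::b")); // prints "a,b"
```

Matching the whole run with `:+` rather than replacing each `:` individually is what keeps the output at one comma regardless of the colon count.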
Full Changelog: v2.2.0...v2.2.1
v2.2.0
Features
- Add support for Deepgram STT by @DoneMaster in #95
Enhancements
- Synthesis: updated pronunciation lexicons
- Synthesis: rewrite IPA to Kirshenbaum table
- eSpeak-NG synthesis and phonemization: prevent pronunciation of angle brackets and colons in parts
Fixes
- Fix references to `ElevenLabs.ts`
- Add missing format properties to FFmpeg codec parameters
New contributors
- @DoneMaster made their first contribution in #95
Full Changelog: v2.1.2...v2.2.0
v2.1.2
Enhancements
- Expanded heteronym and word lexicons
Fixes
- Fix an incorrect logical operator that caused phoneme timelines of words extracted from eSpeak event output to be incorrectly merged in some cases
Changes in the lexicon format
- Change naming of lexicon properties from `succeededBy` and `notSucceededBy` to `followedBy` and `notFollowedBy` (the deprecated property names are still read in code, as a fallback, to ensure backward compatibility)
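
The fallback described here could look roughly like the sketch below. The `LexiconConditions` interface, the `string[]` value type, and the `normalizeConditions` helper are all assumptions made for illustration; only the four property names come from the note above.

```typescript
// Hypothetical shape of the context conditions on a lexicon entry.
// The real Echogarden types may differ.
interface LexiconConditions {
  followedBy?: string[];
  notFollowedBy?: string[];
  // Deprecated names, still accepted as a fallback.
  succeededBy?: string[];
  notSucceededBy?: string[];
}

interface NormalizedConditions {
  followedBy: string[];
  notFollowedBy: string[];
}

// Prefer the new property names; fall back to the deprecated ones.
function normalizeConditions(c: LexiconConditions): NormalizedConditions {
  return {
    followedBy: c.followedBy ?? c.succeededBy ?? [],
    notFollowedBy: c.notFollowedBy ?? c.notSucceededBy ?? [],
  };
}

// An entry written with the old names still resolves.
console.log(normalizeConditions({ succeededBy: ["by"] }).followedBy);
```

Reading the new name first means a lexicon that happens to contain both names resolves to the current one.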
Documentation
- Several changes and fixes to the documentation
- Added a new guide for enabling the `cuda` ONNX execution provider in Linux and Windows Subsystem for Linux (WSL)
Full Changelog: v2.1.1...v2.1.2
v2.1.1
Enhancements
- Added a new default pronunciation lexicon for English (located at `data/lexicons/words.en.json`), containing corrections for words mispronounced or inaccurately pronounced by eSpeak-NG. For example, `vs.` will now be pronounced as "versus" rather than "vee ess". These subtle corrections become more important with the higher-quality Kokoro voices, since the Kokoro model generally follows the specified IPA more faithfully and can therefore provide better accuracy
- Some updates to the heteronym lexicon, including corrections to the disambiguation logic for the word "learned" (deciding between verb `l ˈɜː ɹ n d` and adjective `l ˈɜː ɹ n ɪ d`)
Full Changelog: v2.1.0...v2.1.1
v2.1.0
Features
- Added the Kokoro TTS engine: new and high-quality open-source local synthesis model based on StyleTTS 2. All currently available voices and languages are supported (English US and UK, Spanish, French, Hindi, Italian, Brazilian Portuguese and Chinese), except for Japanese (due to limitations of eSpeak-NG phonemization for Japanese)
- Added the Gnuspeech TTS engine (WebAssembly): legacy English-only speech synthesizer based on articulatory synthesis techniques (initially released in 2002)
Fixes
- Clarified log message for sentences to use "part" time instead of "segment" time
Enhancements
- Added newer OpenAI synthesis voices "Ash", "Coral" and "Sage"
- Some small additions to the English heteronym lexicon
Pull requests merged
- Synthesis.ts: fixed typo in symbol name by @Boorj in #91
- Development.md: fixed a typo in a link by @kbulygin in #88
- Options.md: Align, Whisper: added a note about `timestampAccuracy` by @kbulygin in #89
New Contributors
Full Changelog: v2.0.14...v2.1.0