Releases: echogarden-project/echogarden

v2.3.3

03 Mar 06:49

Fixes

  • Kokoro (synthesis): apply English phoneme substitutions to more closely follow the Misaki phonemizer output. Words containing diphthongs like ɔɪ, such as "noise" and "annoy", used to be mispronounced by the US English voices with ʌ-like sounds, like "naise" and "annay". The reason was that the special diphthong token the model was trained on (Y in this case) wasn't mapped correctly (this problem also occurs in many JavaScript ports of Kokoro, including the port made by its original author). The issue should now be fixed, and pronunciation should be more accurate and consistent in general.
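
As a rough illustration of the kind of substitution described above, here is a minimal sketch. It is not Echogarden's actual code: `diphthongTokens` and `substituteDiphthongs` are hypothetical names, and only the ɔɪ to Y entry comes from the note above. The remaining entries are assumptions based on the US English diphthong tokens documented by Misaki.

```typescript
// Illustrative table only. The "ɔɪ" -> "Y" entry is stated in the release note;
// the other entries are assumptions based on Misaki's documented US English tokens.
const diphthongTokens: [string, string][] = [
  ['ɔɪ', 'Y'],
  ['eɪ', 'A'],
  ['aɪ', 'I'],
  ['aʊ', 'W'],
  ['oʊ', 'O'],
]

// Replace each IPA diphthong with the single token the model was trained on.
function substituteDiphthongs(ipa: string): string {
  let result = ipa

  for (const [diphthong, token] of diphthongTokens) {
    result = result.split(diphthong).join(token)
  }

  return result
}

console.log(substituteDiphthongs('nɔɪz')) // prints "nYz"
```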

Enhancements

  • Synthesis: added many more words (incorrectly pronounced by eSpeak-NG) to the English correction lexicon

Full Changelog: v2.3.2...v2.3.3

v2.3.2

27 Feb 05:22

Fixes

  • Work around eSpeak-NG marker issue when multiple square brackets are included in a fragment

Full Changelog: v2.3.1...v2.3.2

v2.3.1

25 Feb 11:39

Fixes

  • Fix unreported issue of files with empty content failing to write to disk (how wasn't this issue reported?)

Full Changelog: v2.3.0...v2.3.1

v2.3.0

25 Feb 10:37

Features

  • Add Deepgram cloud TTS engine

Enhancements

  • Deepgram STT: convert to Opus codec (48 kbit/s) when sending audio to server. Add option for adding punctuation (enabled by default)
  • Updated and improved ElevenLabs TTS engine. Add options for selecting model (documented here) and optional seed.

Fixes

  • Fix ElevenLabs options casing to match the documentation

Full Changelog: v2.2.0...v2.3.0

v2.2.1

23 Feb 10:10

Fixes

  • Synthesis: fix regression caused by converting : to ,. eSpeak-NG gets crazy and skips markers when a fragment like :: is converted to ,,. Instead, convert to only a single , regardless of the : count
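
The behavior described above could be sketched roughly as follows. This is an assumption about the approach, not the project's actual implementation, and `collapseColonsToComma` is a hypothetical name.

```typescript
// Hypothetical helper, not Echogarden's actual code.
// Replaces every run of one or more colons with a single comma,
// so "::" becomes "," rather than ",,".
function collapseColonsToComma(fragment: string): string {
  return fragment.replace(/:+/g, ',')
}

console.log(collapseColonsToComma('a::b: c')) // prints "a,b, c"
```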

Full Changelog: v2.2.0...v2.2.1

v2.2.0

22 Feb 20:37

Enhancements

  • Synthesis: updated pronunciation lexicons
  • Synthesis: rewrite IPA to Kirshenbaum table
  • eSpeak-NG synthesis and phonemization: prevent pronunciation of angle brackets and colons in parts

Fixes

  • Fix references to ElevenLabs.ts
  • Add missing format properties to FFMpeg codec parameters

Full Changelog: v2.1.2...v2.2.0

v2.1.2

14 Feb 18:24

Enhancements

  • Expanded heteronym and word lexicons

Fixes

  • Fix incorrect logical operator that caused phoneme timelines of words extracted from eSpeak event output to be incorrectly merged in some cases

Changes in the lexicon format

  • Change naming of lexicon properties from succeededBy and notSucceededBy to followedBy and notFollowedBy (the deprecated property names are still read in code, as a fallback, to ensure backward compatibility)
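
A backward-compatible read of the renamed properties might look like the sketch below. The four property names come from the note above. The `LexiconEntryConditions` interface, the `string[]` value type, and `resolveFollowConditions` are assumptions made for illustration and may not match the real lexicon format.

```typescript
// Hypothetical shape of a lexicon entry's context conditions.
// The string[] value type is an assumption.
interface LexiconEntryConditions {
  followedBy?: string[]
  notFollowedBy?: string[]

  // Deprecated names, read only as a fallback
  succeededBy?: string[]
  notSucceededBy?: string[]
}

// Prefer the new property names, falling back to the deprecated ones.
function resolveFollowConditions(entry: LexiconEntryConditions) {
  return {
    followedBy: entry.followedBy ?? entry.succeededBy,
    notFollowedBy: entry.notFollowedBy ?? entry.notSucceededBy,
  }
}
```

With this approach, an entry that still uses succeededBy resolves to the same conditions as one using followedBy, and the new name wins when both are present.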

Documentation

  • Several changes and fixes to the documentation
  • Added a new guide for enabling the cuda ONNX execution provider in Linux and Windows Subsystem for Linux (WSL)

Full Changelog: v2.1.1...v2.1.2

v2.1.1

13 Feb 10:55

Enhancements

  • Added a new default pronunciation lexicon for English (located at data/lexicons/words.en.json), containing corrections for words mispronounced or inaccurately pronounced by eSpeak-NG. For example, vs. will now be pronounced as "versus" rather than "vee ess". Also, with the higher-quality Kokoro voices, these subtle corrections become more important, since the Kokoro model is generally more faithful to the exact IPA specified, so it's able to provide better accuracy in general
  • Some updates to the Heteronym lexicon, including corrections to the disambiguation logic for the word "learned" (deciding between verb l ˈɜː ɹ n d and adjective l ˈɜː ɹ n ɪ d)

Full Changelog: v2.1.0...v2.1.1

v2.1.0

12 Feb 16:30

Features

  • Added the Kokoro TTS engine: new and high-quality open-source local synthesis model based on StyleTTS 2. All currently available voices and languages are supported (English US and UK, Spanish, French, Hindi, Italian, Brazilian Portuguese and Chinese), except for Japanese (due to limitations of eSpeak-NG phonemization for Japanese)
  • Added the Gnuspeech TTS engine (WebAssembly): legacy English-only speech synthesizer based on articulatory synthesis techniques (initially released in 2002)

Fixes

  • Clarified log message for sentences to use "part" time instead of "segment" time

Enhancements

  • Added newer OpenAI synthesis voices "Ash", "Coral" and "Sage"
  • Some small additions to the English heteronym lexicon

Pull requests merged

  • Synthesis.ts: fixed typo in symbol name by @Boorj in #91
  • Development.md: fixed a typo in a link by @kbulygin in #88
  • Options.md: Align, Whisper: added a note about timestampAccuracy by @kbulygin in #89

Full Changelog: v2.0.14...v2.1.0

v2.0.14

24 Dec 16:46

Fixes

  • Whisper tokenizer: attempt to work around #85 by accepting a token that's one beyond the valid range (51865 for a multilingual model, 51864 for an English-only model).
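
The relaxed range check could be sketched as below. The two limits are taken from the note above. `isAcceptedTokenId` and its parameters are hypothetical, and the actual tokenizer code may handle this differently.

```typescript
// Hypothetical check, not the project's actual tokenizer code.
// Per the release note, 51865 (multilingual) and 51864 (English-only)
// are each one beyond the valid token range, and are now accepted.
function isAcceptedTokenId(tokenId: number, isMultilingual: boolean): boolean {
  const oneBeyondValidRange = isMultilingual ? 51865 : 51864

  return Number.isInteger(tokenId) && tokenId >= 0 && tokenId <= oneBeyondValidRange
}
```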

Full Changelog: v2.0.13...v2.0.14