Releases: SYSTRAN/faster-whisper

faster-whisper 1.0.3

01 Jul 10:05
c22db51

Upgrade Silero-Vad model to latest V5 version (#884)

Silero-vad V5 release: https://github.com/snakers4/silero-vad/releases/tag/v5.0

  • The window_size_samples parameter is now fixed at 512.
  • A single state variable replaces the previous h and c variables.
  • The internal logic changed slightly: some context (part of the previous chunk) is now passed along with the current chunk.
  • The dimension of the state variable grows from 64 to 128.
  • The bundled ONNX file is replaced with the V5 version.
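The V5 upgrade changes the bundled VAD internals, but from the user's side VAD is still enabled through transcribe. A minimal sketch, assuming the parameter names from faster-whisper's VadOptions (the model name and audio path are placeholders):

```python
# Sketch: enabling the bundled Silero VAD (now v5) during transcription.
# With v5 the analysis window is fixed at 512 samples, so it is no longer
# a tunable parameter; the remaining thresholds can still be adjusted.
vad_parameters = dict(
    threshold=0.5,                # speech probability threshold
    min_silence_duration_ms=500,  # silence length needed to split speech
)

# from faster_whisper import WhisperModel
# model = WhisperModel("large-v3")
# segments, info = model.transcribe(
#     "audio.wav", vad_filter=True, vad_parameters=vad_parameters
# )
```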

Other changes

  • Improve language detection when using clip_timestamps (#867)
  • Dockerfile improvements (#848)
  • Fix incorrect clip_timestamps being used in the model (#839, #842)

faster-whisper 1.0.2

06 May 02:08
2f6913e
  • Add support for distil-large-v3 (#755)
    The latest Distil-Whisper model, distil-large-v3, is intrinsically designed to work with OpenAI's sequential long-form decoding algorithm.

  • Benchmarks (#773)
    Introduces benchmarking functionality for memory usage, Word Error Rate (WER), and speed in faster-whisper.

  • Support initializing more whisper model args (#807)

  • Small bug fixes:

    • Fix a crash when the audio is empty (#768)
    • Disable VAD when clip_timestamps is in use (#769)
    • Make faster_whisper.assets a valid Python package for distribution (#774)
    • Loosen the tokenizers version constraint (#804)
    • Update the CUDA version and installation instructions (#785)
  • New features from the original OpenAI Whisper project:

    • Add hotwords support (#731)
    • Improve language detection (#732)
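The new hotwords option biases decoding toward domain-specific phrases, and it can be combined with automatic language detection. A minimal sketch, assuming the transcribe parameter names above (the model name and audio path are placeholders):

```python
# Sketch: passing hotwords to bias the decoder toward domain terms.
transcribe_kwargs = dict(
    hotwords="CTranslate2 faster-whisper",  # phrases to favor during decoding
    language=None,                          # None -> automatic language detection
)

# from faster_whisper import WhisperModel
# model = WhisperModel("distil-large-v3")
# segments, info = model.transcribe("audio.wav", **transcribe_kwargs)
```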

faster-whisper 1.0.1

01 Mar 10:46
a342b02
  • Bug fixes and performance improvements:
    • Update logic to get segment from features before encoding (#705)
    • Fix window end heuristic for hallucination_silence_threshold (#706)

faster-whisper 1.0.0

22 Feb 08:56
06d32bf
  • Support distil-whisper model (#557)
    Robust knowledge distillation of the Whisper model via large-scale pseudo-labelling.
    For more detail: https://github.com/huggingface/distil-whisper

  • Upgrade ctranslate2 version to 4.0 to support CUDA 12 (#694)

  • Upgrade the PyAV version to 11.* to support Python 3.12 (#679)

  • Small bug fixes

    • Fix the illogical "Avoid computing higher temperatures on no_speech" behavior (#652)
    • Fix broken prompt_reset_on_temperature (#604)
    • Word timing tweaks (#616)
  • New improvements from the original OpenAI Whisper project

    • Skip silence around hallucinations (#646)
    • Prevent infinite loop for out-of-bound timestamps in clip_timestamps (#697)

faster-whisper 0.10.1

22 Feb 12:08

Fix the broken tag v0.10.0

faster-whisper 0.10.0

22 Feb 11:55
  • Support "large-v3" model with
    • The ability to load feature_size/num_mels and other settings from preprocessor_config.json
    • A new language token for Cantonese (yue)
  • Update CTranslate2 requirement to include the latest version 3.22.0
  • Update tokenizers requirement to include the latest version 0.15
  • Fetch models from the Systran organization on the Hugging Face Hub
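Loading the new model by its size alias resolves to a converted model hosted under the Systran organization. A minimal sketch, assuming the WhisperModel constructor arguments (device and compute type are illustrative choices):

```python
# Sketch: "large-v3" resolves to a converted model hosted under the
# Systran organization on the Hugging Face Hub; feature_size/num_mels
# are read from the model's preprocessor_config.json automatically.
model_size = "large-v3"

# from faster_whisper import WhisperModel
# model = WhisperModel(model_size, device="cuda", compute_type="float16")
```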

faster-whisper 0.9.0

18 Sep 14:34
  • Add function faster_whisper.available_models() to list the available model sizes
  • Add model property supported_languages to list the languages accepted by the model
  • Improve error message for invalid task and language parameters
  • Update tokenizers requirement to include the latest version 0.14
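Both helpers return plain Python lists, which makes validating user input before loading a model straightforward. A minimal sketch; the hard-coded list is only an illustrative subset, and the commented call fetches the real one:

```python
# Illustrative subset of what faster_whisper.available_models() returns.
sizes = ["tiny", "base", "small", "medium", "large-v2"]
# from faster_whisper import available_models
# sizes = available_models()

requested = "base"
if requested not in sizes:
    raise ValueError(f"invalid model size: {requested!r}")

# The supported_languages property works the same way, e.g.:
# model = WhisperModel("base.en")
# assert model.supported_languages == ["en"]  # English-only model
```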

faster-whisper 0.8.0

04 Sep 10:01

Expose new transcription options

These generation parameters were available in the CTranslate2 API but not previously exposed in faster-whisper:

  • repetition_penalty to penalize the score of previously generated tokens (set > 1 to penalize)
  • no_repeat_ngram_size to prevent repetitions of ngrams with this size

These values were previously hardcoded in the transcription method and are now configurable:

  • prompt_reset_on_temperature to configure after which temperature fallback step the prompt with the previous text should be reset (default value is 0.5)
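Put together, the newly exposed options can be passed straight to transcribe. A minimal sketch, assuming the parameter names above (the model name and audio path are placeholders):

```python
# Sketch: the generation options exposed in 0.8.0.
transcribe_kwargs = dict(
    repetition_penalty=1.2,           # > 1 penalizes previously generated tokens
    no_repeat_ngram_size=3,           # forbid repeating any 3-gram
    prompt_reset_on_temperature=0.5,  # reset the prompt past this fallback step
)

# from faster_whisper import WhisperModel
# model = WhisperModel("large-v2")
# segments, info = model.transcribe("audio.wav", **transcribe_kwargs)
```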

Other changes

  • Fix a possible memory leak when decoding audio with PyAV by forcing the garbage collector to run
  • Add property duration_after_vad in the returned TranscriptionInfo object
  • Add "large" alias for the "large-v2" model
  • Log a warning when the model is English-only but the language parameter is set to something else

faster-whisper 0.7.1

24 Jul 09:20
  • Fix a bug related to no_speech_threshold: when the threshold was met for a segment, the next 30-second window reused the same encoder output and was also considered non-speech
  • Improve selection of the final result when all temperature fallbacks failed by returning the result with the best log probability

faster-whisper 0.7.0

18 Jul 13:30

Improve word-level timestamp heuristics

Some recent improvements from openai-whisper are ported to faster-whisper.

Support download of user converted models from the Hugging Face Hub

The WhisperModel constructor now accepts any repository ID as an argument, for example:

from faster_whisper import WhisperModel

model = WhisperModel("username/whisper-large-v2-ct2")

The utility function download_model has been updated similarly.

Other changes

  • Accept an iterable of token IDs for the initial_prompt argument (useful for including timestamp tokens in the prompt)
  • Avoid computing higher temperatures when no_speech_threshold is met (same as openai/whisper@e334ff1)
  • Fix truncated output when using a prefix without disabling timestamps
  • Update the minimum required CTranslate2 version to 3.17.0 to include the latest fixes
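The initial_prompt change above means transcribe accepts either a plain string or pre-tokenized IDs. A minimal sketch; the token IDs shown are placeholders, not real Whisper vocabulary entries:

```python
# Sketch: initial_prompt accepts either a plain string...
prompt_text = "Glossary: CTranslate2, Whisper, CUDA."
# ...or an iterable of token IDs, which allows including special tokens
# (such as timestamp tokens) that plain text cannot express.
prompt_tokens = [1, 2, 3]  # placeholder IDs only

# segments, info = model.transcribe("audio.wav", initial_prompt=prompt_text)
# segments, info = model.transcribe("audio.wav", initial_prompt=prompt_tokens)
```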