This repository contains code for fine-tuning the Whisper speech-to-text model. Key features include:
- Timestamp training
- Prompt training
- Stochastic depth implementation for improved model generalization
- Correct implementation of SpecAugment for robust audio data augmentation
- Checkpointing to save and resume training progress, crucial for long-running experiments and potential interruptions
- Integration with Weights & Biases (wandb) for experiment tracking and model versioning
1. Clone the repository:

        git clone https://github.com/i4ds/whisper-finetune.git
        cd whisper-finetune

2. Create and activate a virtual environment (strongly recommended) with Python 3.9.* and a Rust compiler available.

3. Install the package in editable mode:

        pip install -e .
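Creating and activating the virtual environment can be sketched as below; the interpreter name `python3.9` and the environment name `.venv` are assumptions, so adjust them to your setup.

```shell
# Create a virtual environment with Python 3.9
# (assumes a `python3.9` binary on PATH; substitute your interpreter if named differently)
python3.9 -m venv .venv

# Activate it (bash/zsh; on Windows use .venv\Scripts\activate)
source .venv/bin/activate

# If no Rust compiler is available, one can be installed via rustup (https://rustup.rs)
```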
For data preparation, please have a look at https://github.com/i4Ds/whisper-prep. The data is passed to the script as a 🤗 Dataset.
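As a rough sketch of the kind of per-utterance record such a dataset holds, the snippet below builds a toy example in plain Python. The column names (`audio`, `sentence`) and the 16 kHz sampling rate are assumptions for illustration, not whisper-prep's documented schema; inspect its output for the actual column layout.

```python
# Toy example of a speech-dataset record: raw waveform samples plus transcript.
# Column names ("audio", "sentence") are assumptions, not the documented schema.

def make_record(samples, text, sampling_rate=16000):
    """Build one utterance record from waveform samples and its transcript."""
    return {
        "audio": {"array": list(samples), "sampling_rate": sampling_rate},
        "sentence": text,
    }

record = make_record([0.0, 0.01, -0.02], "example transcript")
print(record["sentence"])
print(record["audio"]["sampling_rate"])
```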
1. Create a configuration file (see examples in configs/*.yaml).

2. Run the fine-tuning script:

        python src/whisper_finetune/scripts/finetune.py --config configs/large-cv-srg-sg-corpus.yaml
For inference, we suggest using faster-whisper. To convert your fine-tuned model, you can use the script located at src/whisper_finetune/scripts/convert_c2t.py. Transcription quality can be further improved by serving requests with whisperx.
Modify the YAML files in the configs/ directory to customize your fine-tuning process. Refer to the existing configuration files for examples of available options.
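As a purely hypothetical illustration of what such a YAML file might contain, consider the fragment below. The key names are invented for this sketch and are not the repository's actual schema; copy an existing file from configs/ as your starting point rather than this fragment.

```yaml
# Hypothetical sketch only -- key names are assumptions, not the repo's real schema.
model:
  init_name: large-v3        # assumed key: which Whisper checkpoint to start from
training:
  learning_rate: 1.0e-5      # assumed hyperparameter keys
  batch_size: 8
augmentation:
  spec_augment: true         # the features listed above suggest switches like this
```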
The starting point of this repository was the excellent repository by Jumon at https://github.com/jumon/whisper-finetuning.
We welcome contributions! Please feel free to submit a Pull Request.
If you encounter any problems, please file an issue along with a detailed description.
- Vincenzo Timmel ([email protected])
- Claudio Paonessa ([email protected])
This project is licensed under the MIT License - see the LICENSE file for details.