Speaker2Dubber

[ACM MM24] Official implementation of paper "From Speaker to Dubber: Movie Dubbing with Prosody and Duration Consistency Learning"


🗒 TODOs

  • Release Speaker2Dubber's demo here.

  • Release the generated test set at Google Drive or Baidu Cloud Drive (Password: mm24).

  • Release Speaker2Dubber's training and inference code (Tip: there may still be some bugs in the code; feel free to use it after the checkpoints are released).

  • Release Speaker2Dubber's model.

  • Update README.md (How to use).

  • Release the first-stage and second-stage pre-trained checkpoints.

🌼 Environment

Our Python version is 3.8.18 and our CUDA version is 11.5; other compatible versions may also work. Both training and inference are implemented with PyTorch on a GeForce RTX 4090 GPU.

conda create -n speaker2dubber python=3.8.18
conda activate speaker2dubber
pip install -r requirements.txt
pip install git+https://github.com/resemble-ai/monotonic_align.git
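After installation, you can optionally run a quick sanity check (not part of the official setup) to confirm that PyTorch is installed and can see the GPU:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"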

🔧 Training

Replace the paths in the preprocess config (see "config/MovieAnimation/preprocess.yaml") with the paths to your own preprocessed data, then run:

python train.py -p config/MovieAnimation/preprocess.yaml -m config/MovieAnimation/model.yaml -t config/MovieAnimation/train.yaml
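For reference, the path entries to edit typically look like the sketch below. This is a hypothetical illustration based on common FastSpeech2-style preprocess configs; the actual key names and values are defined in config/MovieAnimation/preprocess.yaml, so check that file.

path:
  corpus_path: "/path/to/MovieAnimation"                    # raw corpus location (hypothetical key)
  raw_path: "./raw_data/MovieAnimation"                     # extracted raw data (hypothetical key)
  preprocessed_path: "./preprocessed_data/MovieAnimation"   # your preprocessed data (hypothetical key)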

✍ Inference

There are three settings in the V2C task.

python Synthesis.py --restore_step 50000 -s 1 -n 'YOUR_EXP_NAME'
python Synthesis.py --restore_step 50000 -s 2 -n 'YOUR_EXP_NAME'
python Synthesis.py --restore_step 50000 -s 3 -n 'YOUR_EXP_NAME'

The -s flag denotes the inference setting: 1 for Setting 1, which uses the ground-truth audio as the reference audio; 2 for Setting 2, which uses another audio clip from the target speaker as the reference audio; and 3 for the zero-shot setting, which uses reference audio from an unseen dataset.

📊 Dataset

  • GRID (BaiduDrive (code: GRID) / GoogleDrive)
  • V2C-Animation dataset (chenqi-Denoise2)

🙏 Acknowledgments

We would like to thank the authors of previous related projects for generously sharing their code and insights: HPMDubbing, Monotonic Align, StyleSpeech, FastSpeech2, V2C, StyleDubber, PL-BERT, and HiFi-GAN.

🤝 Citation

If you find our work useful, please consider citing:

@inproceedings{zhang-etal-2024-speaker2dubber,
  author       = {Zhedong Zhang and
                  Liang Li and
                  Gaoxiang Cong and
                  Haibing Yin and
                  Yuhan Gao and
                  Chenggang Yan and
                  Anton van den Hengel and
                  Yuankai Qi},
  title        = {From Speaker to Dubber: Movie Dubbing with Prosody and Duration Consistency
                  Learning},
  booktitle    = {Proceedings of the 32nd {ACM} International Conference on Multimedia,
                  {MM} 2024, Melbourne, VIC, Australia, 28 October 2024 - 1 November
                  2024},
  pages        = {7523--7532},
  publisher    = {{ACM}},
  year         = {2024},
}
