This is modified research code for
Synthesizing Obama: Learning Lip Sync from Audio.
Supasorn Suwajanakorn, Steven M. Seitz, Ira Kemelmacher-Shlizerman
SIGGRAPH 2017
Please see the project website for an overview.
The code has been migrated to run on TensorFlow 1.4 and modified to add inference using a sample audio file.
To train the network, simply run
python2 run.py --save_dir SAVE_DIR
where SAVE_DIR is the directory under ./save/ in which the network will be saved.
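For example, to save the network under ./save/obama-net/ (obama-net is a placeholder name, not one from the repository):
python2 run.py --save_dir obama-net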
For inference, the following steps must be taken:
- Install FFmpeg. On Ubuntu:
sudo apt-get install ffmpeg
- Install ffmpeg-normalize. If you have pip, you can use
pip install ffmpeg-normalize
otherwise, refer to the source code page on GitHub.
- Put your recorded .wav file in the root of the source directory.
- Run preprocess.py using
python2 preprocess.py INPUTFILE OUTPUTFILE
where INPUTFILE is the input .wav file and OUTPUTFILE is the name you want for your output file. (Note: this preprocessing is different from the one used in run.py and util.py.)
- Copy the resulting OUTPUTFILE.npy to ./obama_data/audio/normalized-cep13/
- Run the inference network using
python2 run.py --save_dir SAVE_DIR --input2 OUTPUTFILE
- The result should appear in the ./results/ directory.
(The above steps can of course be automated in a script; a minimal sketch is given below.)
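A rough sketch of such a script in Python, using only the standard library. The .wav filename, output basename, and save-directory name below are placeholders rather than names from the repository:

    import shutil
    import subprocess

    WAV_FILE = "recording.wav"   # your recorded .wav in the source root (placeholder name)
    OUT_NAME = "recording_out"   # desired basename for the preprocessed features (placeholder name)
    SAVE_DIR = "obama-net"       # trained network directory under ./save/ (placeholder name)

    # Preprocess the .wav into the feature file OUT_NAME.npy.
    subprocess.check_call(["python2", "preprocess.py", WAV_FILE, OUT_NAME])

    # Copy the features to where run.py expects them.
    shutil.copy(OUT_NAME + ".npy", "./obama_data/audio/normalized-cep13/")

    # Run inference; the result should appear in ./results/.
    subprocess.check_call(["python2", "run.py", "--save_dir", SAVE_DIR, "--input2", OUT_NAME])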
This repository is mostly a fork of the main repository by the paper's authors.
The MFCC code used is taken from the Sphinx3 library.