Official implementation of "TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument", accepted to ICASSP 2025 (to be published).
TokenSynth is a token-based neural synthesizer that generates polyphonic single-instrument musical audio from MIDI and timbre embeddings, enabling instrument cloning, text-to-instrument synthesis, and timbre manipulation. It uses a decoder-only transformer trained on neural audio tokens with CLAP-based timbre conditioning, allowing for flexible sound design without fine-tuning.
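For intuition only, here is a minimal, self-contained sketch of what "decoder-only transformer over neural audio tokens with timbre conditioning" means in practice. The class name, dimensions, and layer choices below are illustrative assumptions and do not reflect the package's internal implementation.

import torch
import torch.nn as nn

class ToyTokenSynth(nn.Module):
    # Illustrative only: a causal transformer reads a prefix of
    # [timbre embedding] + [MIDI tokens] and predicts audio-codec tokens.
    def __init__(self, midi_vocab=512, audio_vocab=1024, clap_dim=512, d_model=256):
        super().__init__()
        self.timbre_proj = nn.Linear(clap_dim, d_model)   # timbre embedding -> model dim
        self.midi_emb = nn.Embedding(midi_vocab, d_model)
        self.audio_emb = nn.Embedding(audio_vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)  # "decoder-only" = encoder stack + causal mask
        self.head = nn.Linear(d_model, audio_vocab)

    def forward(self, timbre, midi_tokens, audio_tokens):
        # Concatenate the conditioning prefix with the audio tokens generated so far.
        x = torch.cat([
            self.timbre_proj(timbre).unsqueeze(1),
            self.midi_emb(midi_tokens),
            self.audio_emb(audio_tokens),
        ], dim=1)
        causal = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.backbone(x, mask=causal)
        return self.head(h)  # per-position logits over the next audio token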
To install TokenSynth, simply run:
pip install tokensynth
from tokensynth import TokenSynth, CLAP, DACDecoder
import audiofile
import torch
# Set file paths
ref_audio = "media/reference_audio.wav"
midi = "media/input_midi.mid"
# Initialize models
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
synth = TokenSynth.from_pretrained(aug=True, device=device)
clap = CLAP(device=device)
decoder = DACDecoder(device=device)
with torch.no_grad():
    # Extract timbre embeddings from audio and text
    timbre_audio = clap.encode_audio(ref_audio)
    timbre_text = clap.encode_text("warm smooth electronic bass")
    timbre_audio_text = 0.5 * timbre_audio + 0.5 * timbre_text
    # Generate audio tokens
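    # (top_k / top_p restrict sampling to the most probable next tokens;
    #  a guidance_scale above 1 makes generation follow the timbre condition more closely)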
    tokens_audio = synth.synthesize(timbre_audio, midi, top_k=10)
    tokens_text = synth.synthesize(timbre_text, midi, top_p=0.6, guidance_scale=1.6)
    tokens_audio_text = synth.synthesize(timbre_audio_text, midi, top_p=0.6, guidance_scale=1.6)
    # Decode tokens into audio waveforms
    audio_audio = decoder.decode(tokens_audio)
    audio_text = decoder.decode(tokens_text)
    audio_audio_text = decoder.decode(tokens_audio_text)
    # Save audio files
    audiofile.write("media/output_audio.wav", audio_audio.cpu().numpy(), 16000)
    audiofile.write("media/output_text.wav", audio_text.cpu().numpy(), 16000)
    audiofile.write("media/output_audio_text.wav", audio_audio_text.cpu().numpy(), 16000)
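Because the timbre condition is just an embedding, you are not limited to the fixed 50/50 blend above: sweeping the interpolation weight morphs the output between the reference recording and the text prompt. The snippet below is a small sketch that continues the quickstart (it reuses synth, decoder, midi, timbre_audio, and timbre_text from above); the weights and output paths are arbitrary examples.

# Sketch: morph between audio-derived and text-derived timbre (continues the quickstart above)
with torch.no_grad():
    for alpha in (0.25, 0.5, 0.75):
        timbre_mix = alpha * timbre_audio + (1.0 - alpha) * timbre_text
        tokens_mix = synth.synthesize(timbre_mix, midi, top_p=0.6, guidance_scale=1.6)
        audio_mix = decoder.decode(tokens_mix)
        audiofile.write(f"media/output_mix_{alpha:.2f}.wav", audio_mix.cpu().numpy(), 16000)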
You can also run python quickstart.py from the project root directory.
TokenSynth automatically downloads pretrained weights when initialized.
If you want to manually download the weights, you can find them here:
🔗 TokenSynth Pretrained Weights
@misc{kim2025tokensynthtokenbasedneuralsynthesizer,
      title={TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument},
      author={Kyungsu Kim and Junghyun Koo and Sungho Lee and Haesun Joung and Kyogu Lee},
      year={2025},
      eprint={2502.08939},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2502.08939},
}
This project is released under the MIT License.
This work utilizes the codebases and pretrained weights of DAC and CLAP.