Add MP3 support to AudioInterface and update tests #1222

Open: wants to merge 4 commits into main


bendichter (Contributor)

fix #1219

@h-mayorquin (Collaborator)

I hadn’t seen this before opening the other ones; my mistake.

I’ve been thinking about this, and the main challenge is that many of these libraries rely on FFmpeg to read audio files. However, packaging FFmpeg so that it works reliably across the three major operating systems as part of a pip-installable framework is difficult.

To test this, I’ve created a small set of stub files for the most common archival formats (WAV, FLAC, AIFF), along with MP3 and OGG, which are also widely used. We can then put them on GIN. I believe this is necessary to establish a testing framework in which these formats can be read without FFmpeg or any other dependency that isn’t pip-installable. It would also let us run CI in an environment similar to what our users experience.
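As a rough illustration of what a stub file could look like, the sketch below writes a very short mono 16-bit sine-tone WAV using only the standard library `wave` module, so no FFmpeg is involved. The helper name `write_stub_wav`, the 8 kHz sample rate, the 440 Hz tone, and the 0.1 s duration are hypothetical choices, not part of this PR. FLAC, AIFF, MP3, and OGG stubs need an encoder and are not covered here.

```python
import math
import struct
import wave
from pathlib import Path


def write_stub_wav(path, duration_s=0.1, sample_rate=8000, frequency_hz=440.0):
    """Write a tiny mono 16-bit PCM WAV containing a sine tone (hypothetical helper)."""
    n_frames = int(round(duration_s * sample_rate))
    amplitude = 0.5 * 32767  # half of int16 full scale, so no clipping
    samples = [
        int(amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate))
        for i in range(n_frames)
    ]
    # "<" = little-endian as WAV requires; "h" = signed 16-bit integer per sample
    frames = struct.pack(f"<{n_frames}h", *samples)
    with wave.open(str(path), "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
    return Path(path)
```

Generating WAV stubs on the fly like this keeps that part of the test suite free of non-pip dependencies; the compressed formats would still rely on pre-made files hosted on GIN.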

A preliminary review suggests that torchaudio could be a straightforward solution, even though it’s quite heavy: PyTorch itself is very large (5 GiB as a dependency, because it vendors its libraries inside the package). Once these test files are available on GIN, we can explore lighter alternatives if we want.

That said, I might be overlooking a better approach. What do you think, Ben?

@bendichter (Contributor, Author)

I see. I'm not crazy about a 5 GB dependency just to read MP3s. If we use ffmpeg, the downsides are:

  1. Users will need to install ffmpeg if they have not already.
  2. We would have to either leave this capability out of GUIDE or put some work into bundling ffmpeg with it, which is tricky because ffmpeg builds are OS-dependent.
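For downside 1, one way to soften the user experience is to check for the ffmpeg executable up front and fail with an actionable message instead of an obscure decoder error. The sketch below uses `shutil.which` from the standard library; the helper name `require_ffmpeg` and the error text are hypothetical, and the check only confirms that an `ffmpeg` binary is on `PATH`, not which codecs it supports.

```python
import shutil


def require_ffmpeg(executable="ffmpeg"):
    """Return the path to ffmpeg, or raise with installation hints (hypothetical helper)."""
    path = shutil.which(executable)  # None if not found or not executable
    if path is None:
        raise RuntimeError(
            f"Reading MP3 files requires '{executable}', which was not found on PATH. "
            "Install it with a package manager (e.g. `conda install -c conda-forge ffmpeg`, "
            "`brew install ffmpeg`, or `apt install ffmpeg`) and try again."
        )
    return path
```

In the test suite, the same `shutil.which("ffmpeg") is None` condition could feed `pytest.mark.skipif` so that MP3 tests are skipped, rather than failing, on machines without ffmpeg.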

This may be why we previously stopped at WAV files. Even given those downsides, I think I would prefer going with librosa, though I agree it's not ideal.


Successfully merging this pull request may close these issues.

[New Format]: support for mp3 audio