
proposal: generative AI instrument sample generation #4322

Draft
wants to merge 2 commits into master

Conversation

haroon10725
Contributor

@walterbender @pikurasa Can you please review this?


@pikurasa pikurasa marked this pull request as draft January 30, 2025 16:05
@pikurasa
Collaborator

This is not quite ready as a "PR".

Some quick feedback:

  • Keyboard shortcuts should be disabled when this is open.
  • We should add a few pre-made prompts to help users understand how to use this.**
  • We'll need to implement an API to a backend that does the work.

**Perhaps we could use a random-word generator that strings together the following: instrument adjective + instrument noun + "with" + an additional instrument adjective. For example, it might generate the sentence "cedar top + acoustic guitar + with + a buzzy fret sound". This is somewhat inspired by the way Jitsi suggests room names to users. See https://meet.jit.si/
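A minimal sketch of that idea in Python (the word lists and the random_prompt helper are hypothetical placeholders, not part of this PR):

import random

ADJECTIVES = ["cedar top", "brassy", "mellow", "breathy"]
INSTRUMENTS = ["acoustic guitar", "clarinet", "trumpet", "upright bass"]
EXTRAS = ["a buzzy fret sound", "a warm vibrato", "a sharp attack"]

def random_prompt():
    # Builds a sentence such as "cedar top acoustic guitar with a buzzy fret sound".
    return f"{random.choice(ADJECTIVES)} {random.choice(INSTRUMENTS)} with {random.choice(EXTRAS)}"

print(random_prompt())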

Screenshot from 2025-01-30 11-18-01

@haroon10725
Contributor Author

haroon10725 commented Jan 30, 2025

@pikurasa Let's keep this as a draft PR. Thank you for your feedback.

@haroon10725 haroon10725 reopened this Jan 31, 2025
@haroon10725 haroon10725 changed the title feat: add user prompt in sampler widget feat: generative AI instrument sample generation Jan 31, 2025
@haroon10725 haroon10725 changed the title feat: generative AI instrument sample generation proposal: generative AI instrument sample generation Jan 31, 2025
@haroon10725
Contributor Author

@walterbender @pikurasa I think we should maintain consistency in the buttons and input fields used across widgets. There is an AI widget with a similar feature that takes user input and provides output. While the functionality is different, similar elements should have the same height, width, margin, etc. I think this will give users a better experience.
What are your opinions on it?

@walterbender
Member

I very much like the idea of this enhancement. But as Devin pointed out, we need to get the AI side working (and explore it some) before we settle on the UI/UX details.

@haroon10725
Contributor Author

haroon10725 commented Feb 1, 2025

[architecture diagram]

Before actually starting on the code, I think it is important to design the architecture (how it is going to work).

This is the design I came up with: I added an extra LLM layer between the user input and the Music LLM because the Music LLM requires a detailed prompt describing the sound font to generate high-quality and accurate results.

I believe students may struggle to write such a detailed prompt describing the sound font they have in mind. They might only provide a brief description, which may not accurately capture the sound font they envision.
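A rough sketch of that two-stage flow (the function names and the wording of the expansion instruction are placeholders for illustration, not a settled design):

# Placeholder stubs; the actual backends have not been chosen yet.
def call_text_llm(instruction):
    raise NotImplementedError("text LLM backend not selected")

def call_music_model(detailed_prompt):
    raise NotImplementedError("music model backend not selected")

def expand_prompt(short_description):
    # Stage 1: turn a brief student description into a detailed sound-font prompt.
    instruction = (
        "Rewrite this short instrument idea as a detailed description of the "
        "sound, covering timbre, attack, sustain, and character: " + short_description
    )
    return call_text_llm(instruction)

def generate_sample(short_description):
    # Stage 2: hand the expanded prompt to the music model.
    return call_music_model(expand_prompt(short_description))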

@walterbender @pikurasa What do you think about it?

@walterbender
Member

Probably this layered approach will be necessary.

@haroon10725
Contributor Author

haroon10725 commented Feb 3, 2025

I researched open-source models for generating sound fonts and came across https://audioldm.github.io/. I tried it, and the results were good. The model requires a prompt to generate the sound, and the better the prompt, the better the results.
The following sample was generated with it.

The prompt was "A smooth, warm clarinet with a clear, sharp attack, transitioning into a mellow sustain, offering a soothing, rich tone with natural woodiness and subtle vibrato"

techno.mp4

@walterbender @pikurasa What are your opinions on it?

@walterbender
Member

Seems like it has real promise.
It would be interesting to explore note duration as well.

@haroon10725
Contributor Author

Yes, I will be exploring it also.

@haroon10725
Contributor Author

haroon10725 commented Feb 4, 2025

@walterbender There is an audio_length_in_s argument; I think we can use it for note duration:
audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=1.0).audios[0]

In the layered approach, we can extract both the note duration and the description of the sound font from the user's input. The note duration can be converted into seconds and passed as audio_length_in_s, while the description can be used as the prompt for the Music LLM.
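For reference, a minimal end-to-end sketch with the diffusers AudioLDM pipeline (the checkpoint name and the beats-to-seconds conversion are assumptions for illustration):

import scipy.io.wavfile
from diffusers import AudioLDMPipeline

# Assumed checkpoint; any AudioLDM checkpoint supported by diffusers should work.
pipe = AudioLDMPipeline.from_pretrained("cvssp/audioldm-s-full-v2")

# Hypothetical note-to-seconds conversion: a quarter note at 60 bpm lasts 60 / 60 = 1.0 s.
note_duration_s = 60 / 60

prompt = "A smooth, warm clarinet with a clear, sharp attack and subtle vibrato"
audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=note_duration_s).audios[0]

# AudioLDM generates 16 kHz audio.
scipy.io.wavfile.write("sample.wav", rate=16000, data=audio)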

techno.mp4

@haroon10725
Contributor Author

@pikurasa What are your opinions on it?

@pikurasa
Collaborator

pikurasa commented Feb 5, 2025

Yes, this is going in a good direction. Thanks for the research @haroon10725

@haroon10725
Contributor Author

haroon10725 commented Feb 9, 2025

@walterbender @pikurasa Should I try to find some more open-source models, or is this fine?

@therealharshit
Member

@haroon10725 Can you please explain how you tested this model? I was also looking for an open-source model for the sample generator.

@pikurasa
Collaborator

@walterbender @pikurasa Should I try to find some more open-source models, or is this fine?

This model is probably fine, but it's nice to know what other models are available (if any).

@haroon10725
Contributor Author

@haroon10725 Can you please explain how you tested this model? I was also looking for an open-source model for the sample generator.

@therealharshit I tested this model on my computer.

@haroon10725
Contributor Author

@walterbender @pikurasa Should I try to find some more open-source models, or is this fine?

This model is probably fine, but it's nice to know what other models are available (if any).

@pikurasa Thank you for your feedback. I have found some other open-source models and will share the results soon.

@haroon10725
Contributor Author

I researched some more open-source models. I tried them, and the results were better than the previous one. The pro of this model is that it generates a good sound font without a detailed prompt. The con is that it is a heavy model and takes some time to generate the sound (since the model will be deployed on a server, I don't think that will be an issue). The results are as follows.

The prompt was "something between a clarinet and a human singing 'ah'"
https://github.com/user-attachments/assets/e32202e5-3bec-4a04-84ed-5a40c7d1426c

The prompt was "something between a heavy metal guitar and a lion roar"
https://github.com/user-attachments/assets/f14ea4b8-6953-4f85-ad74-5739c923a5be

(Note: The audio converter added extra seconds while converting from .wav to .mp4. Please listen to the first 5 seconds only)

The good part is that we have an option.
@walterbender @pikurasa What do you think about it?

@pikurasa
Collaborator

pikurasa commented Feb 11, 2025

The good part is that we have an option.

Yes, that's great.

@walterbender @pikurasa What do you think about it?

It's interesting.

Certainly, it's good that we are also working on how to process a sample for sound fonts (i.e., virtual instruments) over the summer, as it seems that all these generated sounds may need some extra processing before they can be useful for our needs.

@haroon10725
Contributor Author

haroon10725 commented Feb 21, 2025


https://huggingface.co/spaces/facebook/MusicGen
@pikurasa @walterbender You can try this open-source model; it is hosted (no need to download anything). It has all the functionality we discussed in the last meeting. Do give it a try and let me know your opinions.

@haroon10725
Contributor Author

haroon10725 commented Feb 23, 2025

@pikurasa @walterbender I think the server is busy. You can share some prompts or audio files and I can try them on my computer. I will also keep an eye on whether the server is up so that you can try it as well.

@haroon10725
Contributor Author

haroon10725 commented Feb 24, 2025

I found this interesting website: MusicGen by Facebook. It has a description of the model and some sound samples generated by it.

The good part is that it generates high-quality samples and is open source. I was thinking that we could use this model for sample generation. So far this model looks better to me than the previous one.
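For reference, a minimal sketch of calling MusicGen through the Hugging Face transformers API (the musicgen-small checkpoint, the prompt, and the token count are just example values):

import scipy.io.wavfile
from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

inputs = processor(
    text=["a warm clarinet playing a single sustained note"],
    padding=True,
    return_tensors="pt",
)

# max_new_tokens controls the output length; 256 tokens is roughly 5 seconds of audio.
audio_values = model.generate(**inputs, max_new_tokens=256)

sampling_rate = model.config.audio_encoder.sampling_rate
scipy.io.wavfile.write("musicgen_sample.wav", rate=sampling_rate, data=audio_values[0, 0].numpy())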

@walterbender @pikurasa What do you think about it?

@walterbender
Member

Worth exploring. MIT License, which is good.
