Primed audio samples from notebook #40
Comments
I've created a modified notebook that adds support for primed audio by supplying an audio file in your Google Drive. Feel free to make a copy and try it out for yourself. I can make a pull request to add this to the main repository as well, if there is interest. |
@SMarioMan thanks! |
@SMarioMan Thanks a million for this. Really came in handy. Do you know possibly what modifications would need to be made to train a model on multiple audio samples to see what the output would be based on that? ie if I wanted to train it on a few songs by the same artist? Apologies if this is a rather simple question. I've worked with Stylegan in the past to create images and that was just setting the training to a directory and it would iterate through all images within. |
You should be able to provide a comma separated list of audio files instead of just one. I haven't tried that configuration myself, so you might need to put some effort in if it doesn't just work. You can also provide different lyrics, genre, and artist information for each sample by modifying the metas array to contain multiple dicts instead of repeating the same one for all samples. |
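For anyone trying this, here is a rough sketch of what that could look like, assuming the notebook's existing `hps` and `metas` variables; the Drive paths and metadata values are placeholders, and (as noted above) the multi-file configuration is untested:

```python
# Hypothetical example: prime three samples from three different files.
# Paths are placeholders; the multi-file setup is untested, as noted above.
audio_file = ('/content/gdrive/My Drive/song_a.wav,'
              '/content/gdrive/My Drive/song_b.wav,'
              '/content/gdrive/My Drive/song_c.wav')

# One metadata dict per sample instead of repeating the same dict n_samples times.
metas = [
    dict(artist='Artist A', genre='Rock',
         total_length=hps.sample_length, offset=0, lyrics='lyrics for sample 1...'),
    dict(artist='Artist A', genre='Rock',
         total_length=hps.sample_length, offset=0, lyrics='lyrics for sample 2...'),
    dict(artist='Artist A', genre='Pop',
         total_length=hps.sample_length, offset=0, lyrics='lyrics for sample 3...'),
]
assert len(metas) == hps.n_samples  # one dict per generated sample
```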
@SMarioMan might you know how to solve #64 vis-a-vis your Colab notebook (referenced above)? I've been wondering the same question about how to reload a previously interrupted "level 2" or "level 1" session that now needs further upsampling and I've been using your (most excellent) notebook to experiment with jukebox. Many thanks for your contributions here! |
@kcrosley-leisurelabs I have been working on integrating checkpoints for upsampling support within the notebook using code from #42. I haven't fully tested it yet, but if I get it working, I'll share it. |
@SMarioMan, that would be terrific (and I'd find it very instructive). |
@SMarioMan that is super dope, thanks. I'm using Colab and building a new Ubuntu box to make an environment as similar to the Colab runtime as I can. I'm on the paid plan, but I keep getting session drops anyway :( |
@diffractometer yeah, I experience similar issues at times (using Colab Pro), hence the ask. Just from an experimentation workflow perspective, it'd be a lot easier to generate Level 2 pieces and then choose only the most interesting to fully render at some other time. |
@kcrosley-leisurelabs would that be like the example but skip the first level 1 phase? Let it render for a long time then iterate? I think I understand... |
@diffractometer, I just mean that you can tell at Level 2 whether a track is worth even bothering to render fully-upsampled, but given how diverse the output is from jukebox, I'd rather spend my time crate digging at Level 2 and then going back and deciding which ones are worth upsampling later (as longer tracks take so durn long to upsample). |
@SMarioMan Thank you so much for making this. You solved a problem I was having :) I'm not sure if this is possible, but could you show an ETA for inference? I'm currently on the last level (level 0) and it's at |
Level 1 -> level 0 is exactly 4x the tokens and takes almost exactly 4x as long, in my somewhat limited experience with a GPU with almost 50% headroom. So just multiply the final token count from level 1 by 4 to see where you are in the final level. |
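If you want a number instead of eyeballing it, the arithmetic is simple; here is a small self-contained sketch of the idea (the input values are made up and would come from your own run's logs):

```python
# Rough ETA sketch based on the "level 0 has 4x the tokens of level 1" rule of thumb.
# All inputs are things you read off your own run; the values below are illustrative.
level1_total_tokens = 8192     # final token count printed at the end of level 1
level1_runtime_hours = 2.5     # how long level 1 took on your GPU
level0_tokens_done = 10000     # the token counter currently printed for level 0

level0_total_tokens = 4 * level1_total_tokens
level0_runtime_hours = 4 * level1_runtime_hours   # assumes the same tokens/sec rate

fraction_done = level0_tokens_done / level0_total_tokens
hours_remaining = level0_runtime_hours * (1 - fraction_done)
print(f"Level 0 is ~{fraction_done:.0%} done, roughly {hours_remaining:.1f} h left")
```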
I have created PR #72 with my latest changes, now including checkpoints. Hopefully it gets merged, but if it doesn't you can use the modified notebook at https://colab.research.google.com/github/SMarioMan/jukebox/blob/master/jukebox/Interacting_with_Jukebox.ipynb |
Awesome. Can’t wait to check that out, @SMarioMan! Thanks! |
Howdy, @SMarioMan! Hey, whenever I try to continue from a checkpoint (this is using primed mode), I always get:
This happens in the cell below "This next cell will take a while (approximately 10 minutes per 20 seconds of music sample)". Am I running a previous cell that I should not? (In case it's not clear: I have, for example, a previously computed Level 2 that was made using a prompt. Now, sometime later, I am trying to upsample from that checkpoint. But when I reconnect and run thru the notebook again [by executing cells including the "# Identify the lowest level generated and continue from there." cell], I get the above error. Note that my total length, hps folder, and priming file/length settings are all the same as before.) Feel like I'm missing something, but don't know what it might be! Thanks for any help you can provide. |
@kcrosley-leisurelabs Thanks for identifying this issue. It's nothing wrong on your end. load_codes() wasn't designed to handle a separate top prior, and I missed that. I'll push a fix soon. |
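For readers following along, this is roughly the idea behind the "identify the lowest level generated and continue from there" cell; it is not the notebook's actual load_codes(), just a hedged sketch assuming jukebox's level_N/data.pth.tar folder layout:

```python
import os
import torch as t

# Hedged sketch: scan the sample folder for each level's saved codes and resume
# from the deepest level found (level 0 is the most upsampled). The folder layout
# follows jukebox's {name}/level_{level}/data.pth.tar convention; everything else
# here is illustrative, not the notebook's real load_codes().
def find_resume_point(sample_dir):
    for level in (0, 1, 2):
        path = os.path.join(sample_dir, f'level_{level}', 'data.pth.tar')
        if os.path.exists(path):
            data = t.load(path, map_location='cpu')
            return level, data['zs']   # saved codes for every level up to this one
    return None, None                  # nothing saved yet: start from scratch

level, zs = find_resume_point('/content/gdrive/My Drive/samples')  # placeholder path
```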
Thx, @SMarioMan! Appreciate it! |
BTW, your efforts are enabling high art like this: https://www.facebook.com/LeisureAddicts/videos/3196359047050228/ 🔥📼 thanks for your service! |
@kcrosley-leisurelabs The issue should be fixed at this stage. |
@kcrosley-leisurelabs 😮 omg |
@kcrosley-leisurelabs did you render on the 1b or 5b if you don't mind my asking. |
Hey @diffractometer: That's the 5b model. (Honestly, I've not gotten anything useful out of the 1b, though I've not experimented with prompting it.) The finished track there (as I think you can tell) has been significantly remastered from the raw output. I've come up with a pretty useful starting place for cleaning up artifacts and re-balancing the mixes. It's kind of an expensive signal chain, but it generally goes like this: On the track itself: Zynaptiq Unveil (remove smeary reverb-like artifacts and mud) > Zynaptiq Unmix:Drums (generally to bring volume of percussive elements UP) > sometimes an Exciter type plugin to put some high freq content back in An alternative here is some pretty aggressive EQ (reduce lower mids to clean out mud), accentuate upper mid vocal range, boost high end. (But this is nowhere near the magic of the Zynaptic stuff.) In the example I linked above, I restored removed reverb by an automated send to a period-appropriate reverb (UAD's version of Lexicon 224). There's also a reverberated ping-pong delay being fed at appropriate dramatic moments. Then, on the Master bus: Cosmos (stereo widener / exciter / bass booster) > UAD tape simulator > UAD Precision Multiband (dynamics - mostly accentuate vocals, sub bass, and control sibilance caused by the high-freq boost previously) > UAD Precision Maximizer (limiting/volume maximization) And after all that, you kinda sorta get back to a good-sounding record! ;) |
In my experience so far, the best way to interact with 1b is via co-composition. It seems that the 1b model can generate a lot of good ideas mixed in with a whole lot of bad ones, and it's not so great at picking out the good ideas by itself. |
@kcrosley-leisurelabs this is so dope. I keep getting timeouts on colab, I even upgraded to the paid plan but no bueno. Any suggestions on how to keep it running? I'm going to render something then run it thru my 8 track tape machine once I get it to work. |
@SMarioMan Compared to running the example in the readme, there's less influence of the primed input with Colab. |
@camjac251 I believe if you set |
And, indeed, @SMarioMan, it is! Successfully continuing upsampling right now from a Level 1 checkpoint. This is great. Thanks for your help! |
Hey @diffractometer: It's the tendency for disconnection that has me psyched about @SMarioMan's latest update (which makes upsampling from a previous checkpoint work properly now 🎉). Because it takes a very long time for (for example) 90 seconds of audio to generate all the way from start, thru level 2, through upsampling to Level 1 and then Level 0, going all the way in one pass is pretty rare. (I have quite a few interesting things that got stuck at Level 2 or Level 1 that I can now continue on with.) Some notes and things that are helpful:
If it is, you'll see it like this and you can double click a running session to reconnect to it:
Hope perhaps some of the above helps? BTW, now that I'm experimenting with the continue-from-checkpoint (upsample) mode, I see that the state is preserved -- e.g., if you continue upsampling a thing that crapped out during Level 0, you will continue right from where you left off, rather than from the last completed level. This is pretty exciting. -K- Edit: Here's another re-mastered Level 0 render. Here, I'm priming from the "Be quiet..." break from 10cc's "I'm Not in Love", rendering as artist Sarah McLachlan and genre "ambient". You can compare the raw Level 0 wav output from jukebox with the remastered version:
Given the genre here, you could easily prefer the original to the remaster; it really comes down to personal preference, and the vocal here isn't as buried as it is in busier pop-type mixes. Similar to GPT-2, I think priming the engine is the most interesting way to use jukebox. |
@kcrosley-leisurelabs super, I really appreciate the update and tips, this is awesome. Listened to the remaster, bananas! I'll try it and get back at ya soon, thanks again. |
@SMarioMan do checkpoints also work with Co-Composer upsampling? I've given your modified notebook a try with one of my zs-top-level-final.t files, but unfortunately after timing out I see no checkpoint files or anything. |
@Flesco Currently, the only change I have made to co-composing has been to save outputs in Google Drive. I haven't experimented with the co-composer features myself, so I don't know whether it provides or uses checkpoints. |
@anlexmatos doooope. Thanks! Checking it out now. |
re: keeping Colab from terminating your session by clicking Save: Chrome reserves the right to stealthily terminate a tab's JavaScript execution if you haven't used it for a while, but this doesn't happen if the tab is in front. At least on Chromium/Ubuntu, a nice workaround is to keep Colab as a dedicated window with no other tabs, and minimize or ignore it. |
has anyone managed to train new data through the colab notebook? |
First of all, I love you. Second, I've been playing with your colab and I'm really interested in generating variations from a song of mine, but what I get is like 30 seconds of my song and then it turns into a different song. I would expect something like a continuous generation (like this video https://www.youtube.com/watch?v=iJgNpm8cTE8 ). Am I doing something wrong? Thanks mate!! |
@svntv, for continuations, simply follow the instructions in @SMarioMan's notebook. To do a continuation, upload a small sample truncated at an appropriate point (use an audio editor to make note of the time at which a certain event happens, such as a downbeat or bar break, so you don't end up with shitty/random examples like the ones from OpenAI, who seem to have zero knowledge of music in any meaningful sense -- kinda like their transformer... but I digress).

Here's an example notebook that generates continuations from a certain classic Thomas Dolby track: https://colab.research.google.com/drive/1ssiZw58aU2km3cWN183v4IC9KYmPTlMS?usp=sharing

There's a cell where we set the priming; we run this cell instead of the one above it to put jukebox into primed mode, and you change the location of your priming file there. Next we set our total sample length. In the example, I've selected 90 seconds, which will give us a clip with a total length of 90 seconds (12 seconds from the primer and 78 seconds of "new" material that continues from the cutoff point). Below that, we set the artist and genre IDs. These can be any of the artist or genre IDs found in the "V2" versions of the artist and genre files (if you are using the 5B model, as assumed in this example). (See the settings sketch after this comment for how these pieces fit together.)

IMPORTANT: Your choice of genre and artist is extremely important to the final output. If you provide choices that are unknown, malformed, or misunderstood by the model, they will default to "unknown" and/or "various artists" and you'll get some pretty random results. If -- as we've done here -- you provide an artist that is unknown ("thomas dolby" is NOT in the V2 artist list), you'll see a (non-fatal) error and our artist is "unknown". This has a tendency to produce (in most cases) output that does not sound at all like your target artist, particularly in vocal style but also in musical style. I say "in most cases" because sometimes (as in this example) you'll stumble upon tracks/artists that are pretty obviously represented in the training corpus even though they may not have their own artist IDs. Listen to the final output example I've provided for this one and I think you'll agree that there's no possible way that jukebox hasn't seen "She Blinded Me with Science" and other Thomas Dolby tracks. (The result I've cherry-picked here is practically a pastiche of Dolby-isms, particularly in the vocal processing. In just this one example, I hear echoes of a bunch of Dolby songs, not the least of which is Hyperactive. Also interesting: at the very end of the track is an almost perfectly isolated LINNdrum sample.)

HOWEVER, jukebox has not been trained on you (unless you've done that... and if you haven't and want to know how to do it, I'm not the guy to ask). Also, we don't know what genre you're operating in and, further, how close your particular primer track might "fit" the genre you've declared.

_(Aside: We also don't really know anything about the V2 genres, which are horribly implemented. Rather than being really specific (as the V3 genres are -- those go with the 1B model, not the 5B model), they are very broad (e.g., in V3 we have "bossa nova", but where are the bossa nova tracks in terms of V2? Jazz? mpb? Who knows?). And, further, some of them have to be constructed from parts that we are not sure go together. For example, what's the correct nomenclature for the genre "R&B" (rhythm and blues): is it "r n b"? "rnb"?
I've not tried that one, but look at the genres list -- there's an "r" genre, a "b" genre, and an "n" genre. It's a serious bucket of what the ever loving fuck without any documentation. I can tell you from experimentation and from the jukebox samples site that "rock n roll" is an accepted genre (presumably early rock, not to be confused with "rock", which is also a genre). Also, "new wave" works. It also seems that "nu metal" is an acceptable genre, but is "nu jazz"? (I suspect not, but maybe.) Anyway, that's all pretty stupid. I wish the V2 genres were as cleanly defined as the V3 ones.)_

If (like me) you make your own "serving suggestions" as a starter, note that things can diverge quite quickly from your original composition -- especially if the chosen artist's oeuvre isn't much like the genre you've picked. Here, for example, is a batch of 3 tracks created with this seed (about 22 seconds of a quickly "remixed" version of the Beach Boys classic, "God Only Knows") with the artist set to
ABOUT THAT YOUTUBE VIDEO: What's being shown there is simply a bunch of the "Never Gonna Give You Up" continuations published at https://jukebox.openai.com/, played one after another. It's not one very, very long output from a single very, very long jukebox session; it's just a bunch of those 70-ish second examples placed end to end. Don't misunderstand that.

As for why this continuation works so well with the Rick Astley song (much like it works so well in my Thomas Dolby example): First, it's obvious that jukebox has seen this song and knows it (duh). Further, the 11-second-ish primer they use has enough lyrical content, which starts from the very beginning of the song and matches up perfectly, because the lyrics they trained on are from LyricsWiki and that's also what's used here in their "lyrics" parameter. Also, the genre isn't a mismatch: while OpenAI has not told us what the training corpus contains or how they categorized any specific tracks, it's pretty clear they lumped Rick here into "pop".

Anyway, I hope some of the previous info helps you! Best Regards,

APPENDIX:
|
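To make the above concrete, here is a hedged sketch of the relevant settings, using the variable names from the Interacting_with_Jukebox notebook (`hps` and `top_prior` are assumed to be defined by earlier cells; the Drive path, artist, and genre below are placeholders standing in for the example values):

```python
# Primed ("continuation") mode settings; values follow the 90 s / 12 s example above.
mode = 'primed'
codes_file = None
audio_file = '/content/gdrive/My Drive/my_primer.wav'  # your truncated sample (placeholder path)
prompt_length_in_seconds = 12       # how much of the primer is fed to jukebox

sample_length_in_seconds = 90       # 12 s primer + 78 s of newly generated material
hps.sample_length = (int(sample_length_in_seconds * hps.sr)
                     // top_prior.raw_to_tokens) * top_prior.raw_to_tokens

# Artist and genre should come from the V2 lists when using the 5B model;
# unrecognized values fall back to "unknown" / "various artists".
metas = [dict(artist='unknown',     # "thomas dolby" is not in the V2 artist list
              genre='new wave',     # placeholder; pick something from the V2 genre list
              total_length=hps.sample_length,
              offset=0,
              lyrics='...'),
        ] * hps.n_samples
```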
It's also possible to do continuations in co-composition mode, which is what my post that @svntv quoted was referencing. My changes have now been merged into SMarioMan's repository, so it's no longer necessary to use my version of the notebook in order to get this functionality. Anyways, good info above from kcrosley and it's pretty much all just as applicable when doing continuations via co-composition. Especially the advice about the importance of genre, artist and the audio prompt that is used. If you're using co-composition, don't be shy about running several batches before choosing a snippet of output to build upon. I've found that jukebox's creativity can be surprisingly diverse, and you'll often find several distinctly different directions it might think about taking things at any given point in the song that work well. Usually the majority of its ideas are not so great, in my experience, but just as much I always seem to find some interesting gem mixed in among all the questionable continuations. Based on that apparent signal to noise ratio, I'm reluctant to invest much time outside of co-composition and would rather explore the breadth of jukebox's thinking in order to get more "optimal" output every step of the way, four seconds at a time. |
Wow!! First of all thanks for your extended and detailed guide. I'm really really thankful for your dedication.
Thanks. I think it can be a good composition tool for inspiration. @kcrosley-leisurelabs @anlexmatos What would you recommend if I'm working mainly with instrumental tracks? Is there a parameter that can be set, or should I maybe provide empty lyrics or something? Thanks again! |
Hey @kcrosley-leisurelabs, could you clarify this? Does the state preservation only apply if you're able to reconnect to your active session? From my own experience, if the session itself gets terminated, you have no choice but to start over from the data.pth.tar of the last completed level. |
@Beatfox That's my understanding as well. It resumes from the data.pth.tar. There's no reason you couldn't modify the code to checkpoint more frequently. This would allow you to have partial levels, but note that my current Colab notebook assumes the levels are complete if they exist. |
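If someone wants to experiment with that, here is a hedged sketch of the idea. It mirrors, from memory, the windowed loop in jukebox/sample.py's sample_level(); the import paths and signatures are assumptions, and the partial_data.pth.tar filename is deliberately different from data.pth.tar so the notebook's "complete level" assumption isn't broken:

```python
import torch as t
from jukebox.sample import sample_single_window, get_starts  # assumed import paths

# Hedged sketch: save a partial-level checkpoint every few windows while upsampling,
# using a dict similar to the one jukebox writes to level_*/data.pth.tar when a level
# finishes. checkpoint_every and the filename are illustrative choices.
def sample_level_with_checkpoints(zs, labels, sampling_kwargs, level, prior,
                                  total_length, hop_length, hps, checkpoint_every=5):
    for i, start in enumerate(get_starts(total_length, prior.n_ctx, hop_length)):
        zs = sample_single_window(zs, labels, sampling_kwargs, level, prior, start, hps)
        if (i + 1) % checkpoint_every == 0:
            t.save(dict(zs=zs, labels=labels, sampling_kwargs=sampling_kwargs),
                   f"{hps.name}/level_{level}/partial_data.pth.tar")
    return zs
```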
(Disclaimer: I'm a real beginner/layman at this type of stuff.) So I was upsampling and it got past level 1 and was actually a couple hours into level 0. However, my internet went down before it could finish, and I ended up losing all progress. How can I upsample from the "co_composer\level_1" files I'd downloaded so I don't lose progress? Is it possible? |
There are checkpoints, but they fail a lot. They tend to work a lot better if you give Google $10 a month ;)
|
But how exactly do I upload my level_1 file to be upsampled? Do I replace the level_1 folder then run the upsampling cell? Do I have to generate a whole other thing and then replace the folder? Do I just replace some other file? Etc |
From the notebook, how would you change the mode from an ancestral_sample starting point to a primed audio file list? I tried adding these to the Hyperparams() hps instance, but it wasn't effective.
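For reference, in the versions of the notebook I've seen, the primed-mode settings don't live on the main hps Hyperparams instance; they get collected into a separate sample_hps that the sampling cell reads. A hedged sketch (the Drive paths are placeholders):

```python
from jukebox.hparams import Hyperparams

# Assumed layout from Interacting_with_Jukebox.ipynb: the sampling cells read the
# mode and primer settings from sample_hps, not from the main hps instance.
mode = 'primed'                 # instead of 'ancestral'
codes_file = None               # only used when resuming from previously saved codes
audio_file = ('/content/gdrive/My Drive/primer_1.wav,'
              '/content/gdrive/My Drive/primer_2.wav')   # placeholder comma-separated list
prompt_length_in_seconds = 12

sample_hps = Hyperparams(dict(mode=mode,
                              codes_file=codes_file,
                              audio_file=audio_file,
                              prompt_length_in_seconds=prompt_length_in_seconds))
```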