Fine Tuning #257
Is this code only meant to be trained on a single GPU? If I had access to multiple GPUs, what would I have to change within the code, if anything, to make it work? I ask because I referenced this code to make an LLM from scratch and got a checkpoint! However, when I now try to fine-tune the checkpoint on multiple GPUs with a relatively larger dataset, I get a tensor mismatch. The mismatch doesn't occur when fine-tuning on less data, so I'm not sure what's going on.

Error:
RuntimeError: The expanded size of the tensor (1035) must match the existing size (1024) at non-singleton dimension 3. Target sizes: [1, 12, 1035, 1035]. Tensor sizes: [1024, 1024]
Replies: 1 comment

You bring up a good point. It's currently designed to run on a single GPU only, because multi-GPU support requires code changes that then wouldn't work on a single GPU or CPU. So the goal was to have code that works for the largest number of readers, if that makes sense 😅. I had a prototype for multi-GPU pretraining somewhere, but I wanted to make it a bit nicer and extend it to the finetuning code. I will probably do that in the next few weeks because I want to make this nice and user-friendly. In the meantime, Appendix A (the last sections) describes on a general level how to modify the code for DDP; there's a rough sketch of that below.

Regarding the error you are getting, I think it may be related to the context size: maybe your data loader is producing inputs longer than the 1024-token context size (just speculating, though). The second sketch below shows one way to check for and fix that.
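Very roughly, the kind of change Appendix A describes looks like the following untested sketch. It assumes a plain PyTorch training loop; `train_dataset`, the model object, and the loss computation are placeholders for whatever your own script uses.

```python
# Rough DDP sketch (assumes launching with torchrun, which sets RANK,
# LOCAL_RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT for each process).
import os

import torch
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler


def ddp_setup():
    torch.distributed.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    return local_rank


def train(model, train_dataset, num_epochs=1, batch_size=8):
    local_rank = ddp_setup()
    device = torch.device("cuda", local_rank)

    # Each process gets a different shard of the data
    sampler = DistributedSampler(train_dataset)
    train_loader = DataLoader(
        train_dataset, batch_size=batch_size, sampler=sampler, drop_last=True
    )

    model = model.to(device)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)

    for epoch in range(num_epochs):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for input_batch, target_batch in train_loader:
            input_batch = input_batch.to(device)
            target_batch = target_batch.to(device)
            logits = model(input_batch)
            loss = torch.nn.functional.cross_entropy(
                logits.flatten(0, 1), target_batch.flatten()
            )
            optimizer.zero_grad()
            loss.backward()  # DDP averages gradients across GPUs here
            optimizer.step()

    torch.distributed.destroy_process_group()


# Launch with: torchrun --nproc_per_node=NUM_GPUS your_script.py
```

The main moving parts are the process-group setup, the `DistributedSampler` so each GPU sees different batches, and wrapping the model in `DDP`; the rest of the loop stays the same as in the single-GPU version.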
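On the error itself: the `[1, 12, 1035, 1035]` vs. `[1024, 1024]` shapes look like a 1035-token batch meeting a causal mask (and positional embeddings) sized for a 1024-token context, but again, that's speculation. Here is a small sketch for checking and truncating; `train_loader`, the `(input, target)` batch format, and the 1024 value are assumptions based on the GPT-2 setup.

```python
import torch
from torch.utils.data import DataLoader

CONTEXT_LENGTH = 1024  # GPT-2's maximum context size


def longest_sequence(loader: DataLoader, context_length=CONTEXT_LENGTH):
    """Report the longest sequence the loader produces; anything longer than
    context_length would overrun the mask/positional-embedding buffers."""
    max_len = 0
    for input_batch, _ in loader:
        max_len = max(max_len, input_batch.shape[1])
    if max_len > context_length:
        print(f"Found sequences of length {max_len} > {context_length}; "
              "truncate them in the dataset or collate function.")
    return max_len


def truncate_to_context(token_ids, context_length=CONTEXT_LENGTH):
    """Possible fix inside a Dataset's __getitem__: keep at most
    context_length tokens for the inputs and the shifted targets."""
    token_ids = token_ids[: context_length + 1]
    inputs = torch.tensor(token_ids[:-1])
    targets = torch.tensor(token_ids[1:])
    return inputs, targets
```

If `longest_sequence(train_loader)` reports something like 1035, truncating (or chunking) the tokenized examples to the model's context length when building the dataset should make the mismatch go away.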