Fine Tuning #257
Is this code only meant to be trained on a single GPU? If I had access to multiple GPUs, what would I have to change within the code, if anything, to make it work? I ask because I referenced this code to make an LLM from scratch and got a checkpoint! However, when I now try to fine-tune the checkpoint on multiple GPUs with a relatively larger dataset, I get a tensor mismatch. The mismatch doesn't occur when fine-tuning on less data, so I'm not sure what's going on.

Error:
RuntimeError: The expanded size of the tensor (1035) must match the existing size (1024) at non-singleton dimension 3. Target sizes: [1, 12, 1035, 1035]. Tensor sizes: [1024, 1024]
Replies: 1 comment

You bring up a good point. It's currently designed to run on a single GPU only, because multi-GPU support requires code changes that then wouldn't work on a single GPU or CPU. So the goal was to have code that works for the largest number of readers, if that makes sense 😅. I had a prototype for multi-GPU pretraining somewhere, but I wanted to make it a bit nicer and extend it to the finetuning code. I will probably do that in the next few weeks because I want to make this nice and user-friendly. In the meantime, Appendix A (the last sections) describes on a general level how to modify the code for DDP; there's a rough sketch of that below.

Regarding the error you are getting, I think it may be related to the context size: maybe your data loader is producing inputs longer than the 1024-token context size (just speculating, though). The second sketch below shows one way to check for and fix that.
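Very roughly, the kind of change Appendix A describes looks like the following untested sketch. It assumes a plain PyTorch training loop; `train_dataset`, the model object, and the loss computation are placeholders for whatever your own script uses.

```python
# Rough DDP sketch (assumes launching with torchrun, which sets RANK,
# LOCAL_RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT for each process).
import os

import torch
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler


def ddp_setup():
    torch.distributed.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    return local_rank


def train(model, train_dataset, num_epochs=1, batch_size=8):
    local_rank = ddp_setup()
    device = torch.device("cuda", local_rank)

    # Each process gets a different shard of the data
    sampler = DistributedSampler(train_dataset)
    train_loader = DataLoader(
        train_dataset, batch_size=batch_size, sampler=sampler, drop_last=True
    )

    model = model.to(device)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)

    for epoch in range(num_epochs):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for input_batch, target_batch in train_loader:
            input_batch = input_batch.to(device)
            target_batch = target_batch.to(device)
            logits = model(input_batch)
            loss = torch.nn.functional.cross_entropy(
                logits.flatten(0, 1), target_batch.flatten()
            )
            optimizer.zero_grad()
            loss.backward()  # DDP averages gradients across GPUs here
            optimizer.step()

    torch.distributed.destroy_process_group()


# Launch with: torchrun --nproc_per_node=NUM_GPUS your_script.py
```

The main moving parts are the process-group setup, the `DistributedSampler` so each GPU sees different batches, and wrapping the model in `DDP`; the rest of the loop stays the same as in the single-GPU version.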
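On the error itself: the `[1, 12, 1035, 1035]` vs. `[1024, 1024]` shapes look like a 1035-token batch meeting a causal mask (and positional embeddings) sized for a 1024-token context, but again, that's speculation. Here is a small sketch for checking and truncating; `train_loader`, the `(input, target)` batch format, and the 1024 value are assumptions based on the GPT-2 setup.

```python
import torch
from torch.utils.data import DataLoader

CONTEXT_LENGTH = 1024  # GPT-2's maximum context size


def longest_sequence(loader: DataLoader, context_length=CONTEXT_LENGTH):
    """Report the longest sequence the loader produces; anything longer than
    context_length would overrun the mask/positional-embedding buffers."""
    max_len = 0
    for input_batch, _ in loader:
        max_len = max(max_len, input_batch.shape[1])
    if max_len > context_length:
        print(f"Found sequences of length {max_len} > {context_length}; "
              "truncate them in the dataset or collate function.")
    return max_len


def truncate_to_context(token_ids, context_length=CONTEXT_LENGTH):
    """Possible fix inside a Dataset's __getitem__: keep at most
    context_length tokens for the inputs and the shifted targets."""
    token_ids = token_ids[: context_length + 1]
    inputs = torch.tensor(token_ids[:-1])
    targets = torch.tensor(token_ids[1:])
    return inputs, targets
```

If `longest_sequence(train_loader)` reports something like 1035, truncating (or chunking) the tokenized examples to the model's context length when building the dataset should make the mismatch go away.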