In this repo you can find the Tiny Shakespeare dataset (input.txt) and 17 files, each representing a step in training a GPT-2 model from scratch. Files train_get2-1 through train_get2-8 set up the training code. Files train_get2-9-speedup-1 through speedup-9 implement different techniques to speed up the training process.
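As a rough, hypothetical illustration of the kind of techniques such speedup steps typically involve (the repo's actual speedup files may use different ones), here is a minimal PyTorch sketch enabling TF32 matmuls, bfloat16 mixed precision, and torch.compile:

```python
# Hypothetical speedup sketch; not the repo's exact code.
import torch
import torch.nn as nn

torch.set_float32_matmul_precision("high")  # allow TF32 matmuls on Ampere+ GPUs

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in transformer block; the repo trains a full GPT-2 instead.
model = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True).to(device)
model = torch.compile(model)  # PyTorch 2.x graph compilation to cut Python overhead

x = torch.randn(8, 1024, 768, device=device)
with torch.autocast(device_type=device, dtype=torch.bfloat16):  # mixed precision
    y = model(x)
print(y.shape)
```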
Target - loss less than 0.099
- Epochs - 6000
- Batch size - 8
- Tokens per sequence - 1024
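To make these settings concrete, below is a minimal, self-contained training-loop sketch using the hyperparameters above; the TinyLM stand-in model and the random get_batch() are placeholders for illustration only, not the GPT-2 code in the train_get2-* files.

```python
# Illustrative training loop with the hyperparameters listed above (placeholder model/data).
import torch
import torch.nn as nn
import torch.nn.functional as F

epochs = 6000       # training steps, as listed above
batch_size = 8      # sequences per step
block_size = 1024   # tokens per sequence
vocab_size = 50257  # GPT-2 BPE vocabulary size

class TinyLM(nn.Module):
    """Tiny stand-in for the GPT-2 model built in the train_get2-* files."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, 128)
        self.head = nn.Linear(128, vocab_size)

    def forward(self, idx):
        return self.head(self.emb(idx))  # (B, T, vocab)

def get_batch():
    # Placeholder random batch; the repo tokenizes input.txt instead.
    x = torch.randint(0, vocab_size, (batch_size, block_size))
    y = torch.roll(x, shifts=-1, dims=1)  # next-token targets (placeholder)
    return x, y

model = TinyLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(epochs):
    x, y = get_batch()
    logits = model(x)
    loss = F.cross_entropy(logits.view(-1, vocab_size), y.view(-1))
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
```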
As seen in the image below, after running for 6000 epochs with a batch size of 8 and 1024 tokens per sequence, the loss drops to 0.06.

Link to app - https://huggingface.co/spaces/HimankJ/GPT2_CustomTrained
