A rough roadmap for implementing new features. Everything is subject to change at a moment's notice.
- Training using pytorch-lightning, with support for fp16 and Colab TPUs (see the Lightning sketch after this list)
- Training a GPT-2 model from scratch with parameterized context window sizes and model parameters (a rough sketch follows this list)
- PyTorch support for training/generating
- Generation from Transformers' native generate() function (sketched below)
- Actual documentation
- Examples
  - Training on a CPU
  - Training on a GPU
  - Training on multiple GPUs (4x T4)
  - Training on a TPU
  - Cross-training on multiple datasets
  - Generating on a CPU
  - Generating on a GPU
- API docs for all classes
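None of this is implemented yet, but as a rough illustration of the from-scratch training bullet, a GPT-2 with a parameterized context window can be built directly from Hugging Face Transformers' `GPT2Config`. The specific sizes below are arbitrary placeholders, not decided defaults.

```python
# Hypothetical sketch: a from-scratch GPT-2 whose context window and
# model size are plain constructor parameters.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=10000,   # size of the (custom) tokenizer's vocabulary
    n_positions=256,    # the context window size, freely parameterized
    n_embd=256,         # embedding width
    n_layer=6,          # number of Transformer blocks
    n_head=8,           # attention heads per block
)
model = GPT2LMHeadModel(config)  # randomly initialized, ready to train
print(f"{model.num_parameters():,} parameters")
```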
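For the pytorch-lightning bullet, here is a minimal sketch of what the training wrapper could look like, assuming the same toy config as above. `precision=16` turns on fp16 on a GPU; the exact `Trainer` arguments (including those for TPU cores) vary between pytorch-lightning versions, so treat this as a shape rather than a final implementation.

```python
# Minimal sketch: wrap the model in a LightningModule so fp16/TPU
# handling comes from the Trainer instead of a hand-written loop.
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset
from transformers import GPT2Config, GPT2LMHeadModel

class LitGPT2(pl.LightningModule):
    def __init__(self, config):
        super().__init__()
        self.model = GPT2LMHeadModel(config)

    def training_step(self, batch, batch_idx):
        input_ids = batch[0]
        # Causal LM training: labels are the inputs (shifted internally).
        return self.model(input_ids, labels=input_ids)[0]

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-4)

# Random token IDs stand in for a real encoded dataset.
loader = DataLoader(
    TensorDataset(torch.randint(0, 10000, (64, 256))), batch_size=8
)

config = GPT2Config(vocab_size=10000, n_positions=256, n_embd=256,
                    n_layer=6, n_head=8)
# precision=16 enables fp16; argument names differ across PL versions.
trainer = pl.Trainer(max_epochs=1, precision=16)
trainer.fit(LitGPT2(config), loader)
```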
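Generation would go through Transformers' built-in `generate()`. The sketch below uses the pretrained `gpt2` checkpoint and illustrative sampling parameters rather than anything this library has settled on.

```python
# Sketch: text generation via Transformers' native generate() method.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
# model = model.to("cuda")  # move to a GPU for faster generation

input_ids = tokenizer.encode("The roadmap for this project",
                             return_tensors="pt")
output = model.generate(
    input_ids,
    max_length=60,
    do_sample=True,        # sample instead of greedy decoding
    temperature=0.7,       # illustrative values, not library defaults
    top_k=40,
    pad_token_id=tokenizer.eos_token_id,  # silences the padding warning
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```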