About | Models | Helpful Notebooks | Requirements | License | Author
My goal is to work my way through key LLM architectures, starting from the original Transformer, to understand how each model works and builds on its predecessors. Once I have a set of models I am interested in, the focus will shift to fine-tuning and optimizing them to run on the cheapest hardware possible.
✅ Transformer
✅ GPT
✅ LLaMA
◻️ LLM Inference Optimization
◻️ In-flight Batching
◻️ Speculative Inference
◻️ Key-Value Caching (see the sketch after this list)
◻️ PagedAttention
◻️ Pipeline Parallelism
◻️ Tensor Parallelism
◻️ Sequence Parallelism
◻️ Flash Attention
◻️ Quantization
◻️ Sparsity
◻️ Distillation
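The inference-optimization topics above are planned rather than written up yet. To give a flavor of what they cover, below is a minimal sketch of key-value caching in plain NumPy; the names (`KVCache`, `attend`, `d_model`) and the random projection matrices are illustrative assumptions, not code from this repo:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8  # toy embedding width (real models use 768+)

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

class KVCache:
    """Keys/values of already-processed tokens, so each decoding step
    projects only the newest token instead of re-projecting the prefix."""
    def __init__(self):
        self.keys = np.empty((0, d_model))
        self.values = np.empty((0, d_model))

    def append(self, k, v):
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])

def attend(q, cache):
    # Scaled dot-product attention of the new token's query against
    # the full cached prefix.
    scores = cache.keys @ q / np.sqrt(d_model)  # shape: (seq_len,)
    return softmax(scores) @ cache.values       # shape: (d_model,)

# Random projections standing in for a trained attention head (assumption).
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))

cache = KVCache()
for step in range(5):
    x = rng.standard_normal(d_model)   # embedding of the newest token
    cache.append(x @ W_k, x @ W_v)     # project once, reuse on every later step
    out = attend(x @ W_q, cache)       # without the cache, every step would
                                       # re-project the whole prefix through W_k/W_v
    print(f"step {step}: |context| = {np.linalg.norm(out):.3f}")
```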
✅ Transformer Arithmetic (see the worked example below)
✅ Transformer Scaling [WIP]
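For a taste of the kind of arithmetic the first notebook covers, here is a back-of-the-envelope parameter count for a GPT-2-small-sized decoder. This is a rough sketch that ignores biases and LayerNorm parameters, and the variable names are my own:

```python
# Approximate parameter count for a GPT-style decoder
# (GPT-2 small dimensions; assumes tied input/output embeddings).
n_layers, d_model, vocab, ctx = 12, 768, 50257, 1024

attn  = 4 * d_model * d_model        # Q, K, V, and output projections
mlp   = 2 * d_model * (4 * d_model)  # up- and down-projection (4x expansion)
block = attn + mlp                   # ~12 * d_model^2 per layer

embed = vocab * d_model + ctx * d_model  # token + position embeddings
total = n_layers * block + embed
print(f"{total / 1e6:.0f}M parameters")  # ~124M, matching GPT-2 small
```

Each block works out to roughly 12·d_model² parameters, which is why parameter counts grow quadratically with model width.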
Requirements for all the models are stored in a single `requirements.txt` file.
This project is licensed under the MIT License. For more details, see the LICENSE file.
Made with ❤️ by Mukesh Mithrakumar