A script to plot effective warmup periods as a function of 𝛽₂, and warmup schedules over time.
## Usage
The [Documentation](https://tony-y.github.io/pytorch_warmup/) provides more detailed information on this library, beyond what is covered below.
### Sample Codes
The scheduled learning rate is dampened by multiplying it by the warmup factor:
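
In our own notation (these symbols are not taken from the library's documentation): if `lr_t` is the learning rate produced by the LR schedule at step `t` and `w_t` is the warmup factor, with `0 <= w_t <= 1`, then the optimizer actually uses `w_t * lr_t` at that step.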
#### Approach 1
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Tony-Y/colab-notebooks/blob/master/PyTorch_Warmup_Approach1_chaining.ipynb)

When the learning rate schedule uses the global iteration number, the untuned linear warmup can be used
together with `Adam` or its variant (`AdamW`, `NAdam`, etc.) as follows:

```python
import torch
import pytorch_warmup as warmup

# Assumed setup: `params`, `dataloader`, and `num_epochs` are defined elsewhere,
# and the LR schedule below is only an illustrative choice.
optimizer = torch.optim.AdamW(params, lr=0.001, weight_decay=0.01)
lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)
warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
# The warmup schedule initialization dampens the initial LR of the optimizer.
for epoch in range(1, num_epochs+1):
    for batch in dataloader:
        optimizer.zero_grad()
        loss = compute_loss(batch)  # placeholder: forward pass and loss
        loss.backward()
        optimizer.step()
        with warmup_scheduler.dampening():
            lr_scheduler.step()
```

Note that the warmup schedule must not be initialized before the learning rate schedule is initialized.

If you want to use the learning rate schedule *chaining*, which is supported for PyTorch 1.4 or above, you may simply write the learning rate schedulers as a suite of the `with` statement:
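
For instance, a minimal sketch continuing the setup above (the two schedulers chosen here are illustrative, not prescribed by the library):

```python
# Chain two LR schedulers: both step() calls go inside the dampening() suite,
# so the combined schedule is dampened by the warmup factor.
lr_scheduler1 = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)
lr_scheduler2 = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10000, gamma=0.5)
warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
for epoch in range(1, num_epochs+1):
    for batch in dataloader:
        optimizer.zero_grad()
        loss = compute_loss(batch)  # placeholder: forward pass and loss
        loss.backward()
        optimizer.step()
        with warmup_scheduler.dampening():
            lr_scheduler1.step()
            lr_scheduler2.step()
```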

The warmup factor depends on Adam's `beta2` parameter for `RAdamWarmup`. For details please refer to the
[Documentation](https://tony-y.github.io/pytorch_warmup/radam_warmup.html) or
"[On the Variance of the Adaptive Learning Rate and Beyond](https://arxiv.org/abs/1908.03265)."

```python
warmup_scheduler = warmup.RAdamWarmup(optimizer)
```
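
For instance, a sketch of where `beta2` comes from, assuming it is read from the optimizer's `betas` setting (the values shown are just Adam's defaults, used for illustration):

```python
# The beta2 value is taken from the optimizer's `betas` setting.
optimizer = torch.optim.AdamW(params, lr=0.001, betas=(0.9, 0.999))
warmup_scheduler = warmup.RAdamWarmup(optimizer)
```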
### Apex's Adam

The Apex library provides an Adam optimizer tuned for CUDA devices, [FusedAdam](https://nvidia.github.io/apex/optimizers.html#apex.optimizers.FusedAdam). The FusedAdam optimizer can be used together with any one of the warmup schedules above. For example:
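
A minimal sketch under the same assumptions as the earlier snippets (`params` is defined elsewhere; the LR schedule and the untuned linear warmup are illustrative choices):

```python
import apex
import torch
import pytorch_warmup as warmup

# FusedAdam stands in for torch.optim.AdamW; the warmup schedule attaches to it
# in the same way, and the training loop is unchanged from the earlier examples.
optimizer = apex.optimizers.FusedAdam(params, lr=0.001, weight_decay=0.01)
lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)
warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
```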
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Tony-Y/colab-notebooks/blob/master/PyTorch_Warmup_FusedAdam.ipynb)