A script to plot effective warmup periods as a function of 𝛽₂, and warmup schedules over time.
## Usage
The [Documentation](https://tony-y.github.io/pytorch_warmup/) provides more detailed information on this library, beyond what is covered below.
### Sample Codes
The scheduled learning rate is dampened by multiplying it by the warmup factor:
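
In our own notation (these symbols are not taken from the library's documentation): if `lr_t` is the learning rate produced by the LR schedule at step `t` and `w_t` is the warmup factor, with `0 <= w_t <= 1`, then the optimizer actually uses `w_t * lr_t` at that step.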
#### Approach 1
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Tony-Y/colab-notebooks/blob/master/PyTorch_Warmup_Approach1_chaining.ipynb)

When the learning rate schedule uses the global iteration number, the untuned linear warmup can be used
together with `Adam` or its variant (`AdamW`, `NAdam`, etc.) as follows:

```python
import torch
import pytorch_warmup as warmup

# Assumed setup: `params`, `dataloader`, and `num_epochs` are defined elsewhere,
# and the LR schedule below is only an illustrative choice.
optimizer = torch.optim.AdamW(params, lr=0.001, weight_decay=0.01)
lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)
warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
# The warmup schedule initialization dampens the initial LR of the optimizer.
for epoch in range(1, num_epochs+1):
    for batch in dataloader:
        optimizer.zero_grad()
        loss = compute_loss(batch)  # placeholder: forward pass and loss
        loss.backward()
        optimizer.step()
        with warmup_scheduler.dampening():
            lr_scheduler.step()
```

Note that the warmup schedule must not be initialized before the learning rate schedule is initialized.

If you want to use the learning rate schedule *chaining*, which is supported for PyTorch 1.4 or above, you may simply write the learning rate schedulers as a suite of the `with` statement:
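
For instance, a minimal sketch continuing the setup above (the two schedulers chosen here are illustrative, not prescribed by the library):

```python
# Chain two LR schedulers: both step() calls go inside the dampening() suite,
# so the combined schedule is dampened by the warmup factor.
lr_scheduler1 = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)
lr_scheduler2 = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10000, gamma=0.5)
warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
for epoch in range(1, num_epochs+1):
    for batch in dataloader:
        optimizer.zero_grad()
        loss = compute_loss(batch)  # placeholder: forward pass and loss
        loss.backward()
        optimizer.step()
        with warmup_scheduler.dampening():
            lr_scheduler1.step()
            lr_scheduler2.step()
```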

The warmup factor depends on Adam's `beta2` parameter for `RAdamWarmup`. For details please refer to the
[Documentation](https://tony-y.github.io/pytorch_warmup/radam_warmup.html) or
"[On the Variance of the Adaptive Learning Rate and Beyond](https://arxiv.org/abs/1908.03265)."

```python
warmup_scheduler = warmup.RAdamWarmup(optimizer)
```
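
For instance, a sketch of where `beta2` comes from, assuming it is read from the optimizer's `betas` setting (the values shown are just Adam's defaults, used for illustration):

```python
# The beta2 value is taken from the optimizer's `betas` setting.
optimizer = torch.optim.AdamW(params, lr=0.001, betas=(0.9, 0.999))
warmup_scheduler = warmup.RAdamWarmup(optimizer)
```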
### Apex's Adam

The Apex library provides an Adam optimizer tuned for CUDA devices, [FusedAdam](https://nvidia.github.io/apex/optimizers.html#apex.optimizers.FusedAdam). The FusedAdam optimizer can be used together with any one of the warmup schedules above. For example:
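
A minimal sketch under the same assumptions as the earlier snippets (`params` is defined elsewhere; the LR schedule and the untuned linear warmup are illustrative choices):

```python
import apex
import torch
import pytorch_warmup as warmup

# FusedAdam stands in for torch.optim.AdamW; the warmup schedule attaches to it
# in the same way, and the training loop is unchanged from the earlier examples.
optimizer = apex.optimizers.FusedAdam(params, lr=0.001, weight_decay=0.01)
lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)
warmup_scheduler = warmup.UntunedLinearWarmup(optimizer)
```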
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Tony-Y/colab-notebooks/blob/master/PyTorch_Warmup_FusedAdam.ipynb)