When the --cache_latents flag is enabled, training benefits from a large speed boost (presumably because the latents are computed once and then reused). However, with this flag active, the checkpoint saving mechanism (e.g. via --save_every_n_steps 10) is completely suppressed: no intermediate checkpoints are saved, and no related log messages appear.
When --cache_latents is disabled, checkpoints are saved normally, but training speed drops drastically. In my case, on an RTX 3090 Ti, I have to reduce the effective values to --train_batch_size 16 and --gradient_accumulation_steps 1, even though I would expect to be able to use much higher values (for example, a batch size of 64 together with --cache_latents). This makes training impractically slow.
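For reference, here is a minimal sketch of the fast configuration I am running; the script name and the omitted model/dataset/output arguments are placeholders, not the exact command, and only the flags discussed above are shown:

python train_script.py \
  --cache_latents \
  --train_batch_size 64 \
  --gradient_accumulation_steps 1 \
  --save_every_n_steps 10 \
  [model, dataset, and output arguments omitted]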
Expected Behavior:
Training should both benefit from the speed improvements of latent caching (allowing high batch sizes and fast processing) and save checkpoints at regular intervals (e.g., every 10 steps; this value is deliberately low just to test whether saving works at all).
Actual Behavior:
With --cache_latents:
Fast training due to cached latents.
No checkpoints are saved (neither intermediate nor at the end).
The CPU also appears to be under noticeable load.
Without --cache_latents:
Checkpoints are saved, but training is extremely slow; I have to reduce the effective parameters to --train_batch_size 16 and --gradient_accumulation_steps 1 (see the command sketch below).
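The slow-but-saving configuration is essentially the same command with --cache_latents removed and the batch size reduced (again, the script name and remaining arguments are placeholders):

python train_script.py \
  --train_batch_size 16 \
  --gradient_accumulation_steps 1 \
  --save_every_n_steps 10 \
  [model, dataset, and output arguments omitted]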
System Information:
GPU: NVIDIA RTX 3090 Ti
Any help or fixes would be greatly appreciated!