When the --cache_latents flag is enabled, training benefits from a large speed boost (presumably because the latents are computed once and then reused). However, with this flag active, the checkpoint saving mechanism (e.g. via --save_every_n_steps 10) is completely suppressed: no intermediate checkpoints are saved, and no related log messages appear.
When --cache_latents is disabled, checkpoints are saved normally, but training speed drops drastically. In my case, on an RTX 3090 Ti, I have to reduce the effective values to --train_batch_size 16 and --gradient_accumulation_steps 1, even though I would expect to be able to use much higher values (for example, a batch size of 64 together with --cache_latents). This makes training impractically slow.
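For reference, here is a minimal sketch of the fast configuration I am running; the script name and the omitted model/dataset/output arguments are placeholders, not the exact command, and only the flags discussed above are shown:

python train_script.py \
  --cache_latents \
  --train_batch_size 64 \
  --gradient_accumulation_steps 1 \
  --save_every_n_steps 10 \
  [model, dataset, and output arguments omitted]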
Expected Behavior:
Training should both benefit from the speed improvements of latent caching (allowing high batch sizes and fast processing) and save checkpoints at regular intervals (e.g., every 10 steps; this value is deliberately low just to test whether saving works at all).
Actual Behavior:
With --cache_latents:
Fast training due to cached latents.
No checkpoints are saved (neither intermediate nor at the end).
The CPU also appears to be under noticeable load.
Without --cache_latents:
Checkpoints are saved, but training is extremely slow; I have to reduce the effective parameters to --train_batch_size 16 and --gradient_accumulation_steps 1 (see the command sketch below).
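The slow-but-saving configuration is essentially the same command with --cache_latents removed and the batch size reduced (again, the script name and remaining arguments are placeholders):

python train_script.py \
  --train_batch_size 16 \
  --gradient_accumulation_steps 1 \
  --save_every_n_steps 10 \
  [model, dataset, and output arguments omitted]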
System Information:
GPU: NVIDIA RTX 3090 Ti
Any help or fixes would be greatly appreciated!