
Flux: big jumps in key size seem to be due to min_snr_gamma not being hooked up #1980

Open
araleza opened this issue Mar 11, 2025 · 3 comments

Comments


araleza commented Mar 11, 2025

While measuring sd-scripts' LoRA key lengths with TensorBoard, I noticed that from time to time there were big jumps.

The jumps seem to correspond to the noise timestep value for that step being >900. In the example below, the timestep value is 932. (I'm not talking about the training step number, which coincidentally also happens to be around 900 here.) While looking into reducing the loss for high timestep values, I found that this discussion has already taken place for Stable Diffusion and SDXL, in a March 2023 pull request by @AI-Casanova for a feature called --min_snr_gamma:

#308 (comment)

(The implementation was later fixed by @drhead)

Flux does not seem to have this min-snr-gamma feature enabled, as the function that should call it is stubbed out (in flux_train_network.py):

    def post_process_loss(self, loss, args, timesteps, noise_scheduler):
        # Overridden to a no-op for Flux: the loss post-processing from the base
        # trainer (including --min_snr_gamma weighting) is silently skipped here.
        return loss

It would be nice to get an implementation of this working for Flux to stop the overly large jumps in key sizes occurring when the timestep value is high. Flux does currently accept the --min_snr_gamma parameter without complaint, but silently makes no use of the value that is set.
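For reference, here's a minimal sketch of what hooking this up might look like. To be clear, this is not sd-scripts code: the helper name apply_min_snr_weight is made up, and the conversion from a flow-matching sigma to an SNR value is my own assumption, not something taken from the repo.

    import torch

    def apply_min_snr_weight(loss, sigmas, gamma=5.0):
        # loss:   per-sample loss, shape (batch,)
        # sigmas: flow-matching noise level in (0, 1] for each sampled timestep
        # Assumed relation (my assumption, not sd-scripts'):
        #   x_t = (1 - sigma) * x_0 + sigma * noise  =>  snr = ((1 - sigma) / sigma) ** 2
        snr = ((1.0 - sigmas) / sigmas) ** 2
        # Min-SNR-gamma (Hang et al., 2023), v-prediction form: cap the per-timestep
        # weight so neither the near-clean nor the near-pure-noise end dominates.
        weight = torch.clamp(snr, max=gamma) / (snr + 1.0)
        return loss * weight

Something along these lines could be called from the post_process_loss stub above. At the very noisy end (sigma near 1, i.e. the >900 timesteps), snr goes to zero and so does the weight, which is the damping I'm after. Whether that SNR definition is even meaningful for Flux's flow-matching objective is a separate question.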

[Screenshot: TensorBoard plot of LoRA key lengths, showing the jump at timestep 932]


drhead commented Mar 11, 2025

I don't know the exact details, but I'm fairly certain you shouldn't be using min-snr-gamma on Flux anyway. Flux has its own system of weighted timestep sampling, which should be used instead.


araleza commented Mar 11, 2025

@drhead, are you maybe talking about --timestep_sampling flux_shift? I only found out about that a few weeks ago, and I have it in place on my command line. But I still see the big loss values for the high timesteps. Unless these big jumps are intentional and expected for Flux?


drhead commented Mar 11, 2025

I have not trained anything on Flux, but larger loss values for some timesteps sound normal for Flux's timestep sampling.

Min-snr-gamma is a timestep weighting strategy: it's simply a multiplier on the loss based on the timestep's noise level, which proportionally decreases the impact of the gradients coming from that image (i.e. how much influence it has on the model). It does this with the objective of making the influence of each timestep more equal. As a side note, you shouldn't take the lowered loss you get from min-snr-gamma too seriously: you could multiply the loss values for all timesteps by 0.01, and that wouldn't mean your model is any better than before.

Flux uses a timestep sampling strategy instead. Rather than multiplying the loss values, it picks the timesteps that are expected to produce higher loss values less often, with the same intention of equalizing the overall contributions. But that also means that when those timesteps do get picked, they produce more impactful gradients, which might make things a bit less stable if you're using a low batch size.
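To make the contrast concrete, here's a toy sketch (illustrative only; the function names and the logit-normal distribution are stand-ins, not sd-scripts' actual flux_shift code):

    import torch

    # (a) Loss weighting (min-snr-gamma style): timesteps are drawn uniformly and
    #     each sample's loss is scaled afterwards by a timestep-dependent weight.
    def weighted_loss(per_sample_loss, timesteps, weight_fn):
        return per_sample_loss * weight_fn(timesteps)

    # (b) Timestep sampling (Flux style): the loss is left untouched, but the
    #     timesteps themselves are drawn from a non-uniform distribution so the
    #     extreme noise levels come up less often.
    def sample_timesteps(batch_size, mu=0.0, sigma=1.0):
        # logit-normal sampling concentrates t around the middle of (0, 1)
        return torch.sigmoid(mu + sigma * torch.randn(batch_size))

Either way the expected contribution of each timestep evens out; the difference is that with (b), when a high-noise timestep does get drawn, it still delivers its full, unscaled gradient.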

If you're worried about training instability, you should use gradient accumulation to get a larger effective batch size. If your batch size is something like 1 or 2, then it's honestly not surprising that this would happen. But having a high-loss timestep's gradient diluted among the gradients of 16 or 32 other steps shouldn't create such a huge shock to the model.
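For completeness, this is the generic gradient accumulation pattern in plain PyTorch, as a toy example with a dummy linear model (sd-scripts exposes the same idea through its own gradient accumulation option):

    import torch

    model = torch.nn.Linear(8, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    accumulation_steps = 16   # effective batch size = micro-batch size * 16

    optimizer.zero_grad()
    for step in range(64):
        x, y = torch.randn(2, 8), torch.randn(2, 1)         # micro-batch of 2
        loss = torch.nn.functional.mse_loss(model(x), y)
        (loss / accumulation_steps).backward()               # gradients accumulate in .grad
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()                                 # one update per 16 micro-batches
            optimizer.zero_grad()

One optimizer step then averages over 32 samples, so a single >900-timestep draw only contributes a small fraction of the update.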
