
'timestep_sampling=flux_shift' can offer quality improvements, but barely mentioned in docs #1958

Open
araleza opened this issue Feb 27, 2025 · 2 comments


araleza commented Feb 27, 2025

So I just discovered this Flux (SD3 branch) parameter:

--timestep_sampling flux_shift

Previously I'd been using

--timestep_sampling shift

due to:

  1. The README.md having --timestep_sampling shift in the example training command
  2. The extensive diagrams showing the effect of changing the discrete_flow_shift value and the effect of different sigmoid values - none of which apply to the --timestep_sampling flux_shift mode. The space given to these modes in the README.md kind of implies (in my opinion) that these are the modes to use.
  3. Almost zero casual references to flux_shift on this sd-scripts repository. Searching for 'flux_shift' in the GitHub search bar returns very few results.

But now that I've stumbled across this mode and tried it, my sample images are immediately more realistic.
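
In case it helps anyone else, here's my rough sketch of what I understand the two modes to do. This is not the sd-scripts source - the constants and the token-count formula are assumptions based on the dynamic shifting Flux uses at inference time, so check the sd3 branch source for the real implementation:

import math
import torch

# Rough sketch only: how each mode maps a sigmoid-sampled t in (0, 1)
# before it is scaled into a training timestep. Constants are assumed.

def time_shift(shift: float, t: torch.Tensor) -> torch.Tensor:
    # rectified-flow shift: pushes sampled timesteps toward the high-noise end
    return (t * shift) / (1.0 + (shift - 1.0) * t)

def sample_t_shift(batch_size: int, discrete_flow_shift: float = 3.1582) -> torch.Tensor:
    # 'shift': one fixed shift for every sample, taken from --discrete_flow_shift
    t = torch.sigmoid(torch.randn(batch_size))
    return time_shift(discrete_flow_shift, t)

def sample_t_flux_shift(batch_size: int, height: int, width: int) -> torch.Tensor:
    # 'flux_shift' (my understanding): the shift depends on the image resolution,
    # interpolating mu from 0.5 at 256 tokens to 1.15 at 4096 tokens - the same
    # dynamic shifting Flux applies at inference - so --discrete_flow_shift is
    # ignored and larger images get a stronger shift.
    seq_len = (height // 16) * (width // 16)  # assumed token count per image
    mu = 0.5 + (1.15 - 0.5) * (seq_len - 256) / (4096 - 256)
    t = torch.sigmoid(torch.randn(batch_size))
    return time_shift(math.exp(mu), t)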

I don't think enough people are aware of this mode and how good it can be, so I'm mostly filing this issue to raise its profile. @kohya-ss, should the README.md be updated to make flux_shift the default in the example training command for Flux?

@recris, we've been talking about how to avoid the vertical lines artifact, and I came across this thread where someone has a similar-looking horizontal line artifact, and claims he 'fixed' it by using flux_shift:

#1948

Worth a try if you have some line artifacts and you currently aren't using this timestep sampling mode?


recris commented Feb 28, 2025

I've been using flux_shift for a long time; it does not fully solve the vertical lines issue.

The only thing that seems to fully avoid the problem is to train with larger batch sizes and/or low learning rates.


recris commented Mar 7, 2025

I have a (working) hypothesis on the cause for those annoying artifacts.

Basically, those vertical (or horizontal) stripes are the result of "fried" latents. During training the network learns to generate extreme noise predictions (high variance), which causes the image latent to deviate from the expected value distribution; when decoded, this produces those white bands across the image. This learned behavior seems more prone to happen with high learning rates, although with low rates it still happens given enough training time.

Inspired by the research in #294, I changed the loss function and added a regularization term that penalizes the network for predicting noise whose per-channel statistics deviate too much from the standard normal mean and variance.

In train_network.py:

# keep this low - increasing it too much causes image sharpness and contrast to explode
dist_loss_weight = 0.03

(...)

loss = train_util.conditional_loss(...)

# regularization term; stays None when the extra loss is disabled
dist_loss = None

if dist_loss_weight > 0.0:
    # penalise high-noise timesteps more than low-noise ones
    ts = timesteps / 1000.0
    w = ts * dist_loss_weight

    # per-channel mean and (biased) variance of the predicted noise
    n_var, n_mean = torch.var_mean(noise_pred.float(), dim=(2, 3), correction=0)
    n_logvar = torch.log(n_var)

    # KL divergence to a standard gaussian, summed over channels
    kl_div_loss = -0.5 * torch.sum(1 + n_logvar - n_mean.pow(2) - n_logvar.exp(), dim=1)
    dist_loss = w * kl_div_loss

# reduce the main loss to one value per sample before adding the penalty
loss = loss.mean([1, 2, 3])

if dist_loss is not None:
    loss = loss + dist_loss
This approach seems to improve some of my test cases; I was able to use more "aggressive" Huber loss settings with fewer side effects.

There are a bunch of assumptions being made here:

  • The ideal noise distribution is Gaussian with mean=0 and variance=1 - I currently have no strong evidence for this
  • dist_loss_weight follows a linear schedule - I've tried a few others; linear at least seems to work better than a constant schedule? (see the sketch below)
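
To make the schedule point concrete, "linear" just means the per-sample weight scales with the normalized timestep. Roughly what I mean (only the linear variant is the one in the snippet above; the others are illustrative, and this assumes torch and the training loop's timesteps tensor are in scope):

def dist_loss_schedule(timesteps, dist_loss_weight=0.03, kind="linear"):
    # timesteps are in [0, 1000); larger timestep = noisier model input
    ts = timesteps / 1000.0
    if kind == "constant":
        # same penalty weight at every timestep
        return dist_loss_weight * torch.ones_like(ts)
    if kind == "linear":
        # the variant used above: penalize high-noise timesteps more
        return dist_loss_weight * ts
    if kind == "quadratic":
        # illustrative alternative, concentrates the penalty even more at high noise
        return dist_loss_weight * ts ** 2
    raise ValueError(f"unknown schedule: {kind}")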
