
'timestep_sampling=flux_shift' can offer quality improvements, but barely mentioned in docs #1958

Open
araleza opened this issue Feb 27, 2025 · 2 comments


araleza commented Feb 27, 2025

So I just discovered this Flux (SD3 branch) parameter:

--timestep_sampling flux_shift

Previously I'd been using

--timestep_sampling shift

due to:

  1. The README.md having --timestep_sampling shift in the example training command
  2. The extensive diagrams showing the effect of changing the discrete_flow_shift value and the effect of different sigmoid values - none of which apply to the --timestep_sampling flux_shift mode. The space given to these modes in the README.md kind of implies (in my opinion) that these are the modes to use.
  3. Almost zero casual references to flux_shift on this sd-scripts repository. Searching for 'flux_shift' in the GitHub search bar returns very few results.

But now that I've stumbled across this mode and tried it, my sample images are immediately more realistic.
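
In case it helps anyone else, here's my rough sketch of what I understand the two modes to do. This is not the sd-scripts source - the constants and the token-count formula are assumptions based on the dynamic shifting Flux uses at inference time, so check the sd3 branch source for the real implementation:

import math
import torch

# Rough sketch only: how each mode maps a sigmoid-sampled t in (0, 1)
# before it is scaled into a training timestep. Constants are assumed.

def time_shift(shift: float, t: torch.Tensor) -> torch.Tensor:
    # rectified-flow shift: pushes sampled timesteps toward the high-noise end
    return (t * shift) / (1.0 + (shift - 1.0) * t)

def sample_t_shift(batch_size: int, discrete_flow_shift: float = 3.1582) -> torch.Tensor:
    # 'shift': one fixed shift for every sample, taken from --discrete_flow_shift
    t = torch.sigmoid(torch.randn(batch_size))
    return time_shift(discrete_flow_shift, t)

def sample_t_flux_shift(batch_size: int, height: int, width: int) -> torch.Tensor:
    # 'flux_shift' (my understanding): the shift depends on the image resolution,
    # interpolating mu from 0.5 at 256 tokens to 1.15 at 4096 tokens - the same
    # dynamic shifting Flux applies at inference - so --discrete_flow_shift is
    # ignored and larger images get a stronger shift.
    seq_len = (height // 16) * (width // 16)  # assumed token count per image
    mu = 0.5 + (1.15 - 0.5) * (seq_len - 256) / (4096 - 256)
    t = torch.sigmoid(torch.randn(batch_size))
    return time_shift(math.exp(mu), t)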

I don't think enough people are aware of this mode and how good it can be, so I'm mostly filing this issue to raise its profile. @kohya-ss, should the README.md be updated to make flux_shift the default in the example training command for Flux?

@recris, we've been talking about how to avoid the vertical lines artifact, and I came across this thread where someone has a similar-looking horizontal line artifact, and claims he 'fixed' it by using flux_shift:

#1948

Worth a try if you have some line artifacts and you currently aren't using this timestep sampling mode?


recris commented Feb 28, 2025

I've been using flux_shift for a long time; it does not fully solve the vertical lines issue.

The only thing that seems to fully avoid the problem is to train with larger batch sizes and/or low learning rates.


recris commented Mar 7, 2025

I have a (working) hypothesis on the cause for those annoying artifacts.

Basically, those vertical (or horizontal) stripes are the result of "fried" latents. During training the network learns to generate extreme noise predictions (high variance), which causes the image latent to deviate from the expected value distribution; when decoded, this produces those white bands across the image. This learned behavior seems more prone to happen with high learning rates, although with low rates it still happens given enough training time.

Inspired by the research in #294, I changed the loss function and added a regularization term that penalizes the network for predicting noise whose per-channel statistics deviate too much from the standard normal mean and variance.

In train_network.py:

# keep this low - increasing it too much causes image sharpness and contrast to explode
dist_loss_weight = 0.03

(...)

loss = train_util.conditional_loss(...)

# regularization term; stays None when the extra loss is disabled
dist_loss = None

if dist_loss_weight > 0.0:
    # penalise high-noise timesteps more than low-noise ones
    ts = timesteps / 1000.0
    w = ts * dist_loss_weight

    # per-channel mean and (biased) variance of the predicted noise
    n_var, n_mean = torch.var_mean(noise_pred.float(), dim=(2, 3), correction=0)
    n_logvar = torch.log(n_var)

    # KL divergence to a standard gaussian, summed over channels
    kl_div_loss = -0.5 * torch.sum(1 + n_logvar - n_mean.pow(2) - n_logvar.exp(), dim=1)
    dist_loss = w * kl_div_loss

# reduce the main loss to one value per sample before adding the penalty
loss = loss.mean([1, 2, 3])

if dist_loss is not None:
    loss = loss + dist_loss
This approach seems to improve some of my test cases; I was able to use more "aggressive" Huber loss settings with fewer side effects.

There are a bunch of assumptions being made here:

  • The ideal noise distribution is Gaussian with mean=0 and variance=1 - I currently have no strong evidence for this
  • dist_loss_weight follows a linear schedule - I've tried a few others; linear at least seems to work better than a constant schedule? (see the sketch below)
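
To make the schedule point concrete, "linear" just means the per-sample weight scales with the normalized timestep. Roughly what I mean (only the linear variant is the one in the snippet above; the others are illustrative, and this assumes torch and the training loop's timesteps tensor are in scope):

def dist_loss_schedule(timesteps, dist_loss_weight=0.03, kind="linear"):
    # timesteps are in [0, 1000); larger timestep = noisier model input
    ts = timesteps / 1000.0
    if kind == "constant":
        # same penalty weight at every timestep
        return dist_loss_weight * torch.ones_like(ts)
    if kind == "linear":
        # the variant used above: penalize high-noise timesteps more
        return dist_loss_weight * ts
    if kind == "quadratic":
        # illustrative alternative, concentrates the penalty even more at high noise
        return dist_loss_weight * ts ** 2
    raise ValueError(f"unknown schedule: {kind}")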
