So I just discovered this Flux (SD3 branch) parameter:
--timestep_sampling flux_shift
Previously I'd been using
--timestep_sampling shift
due to:
- The README.md has --timestep_sampling shift in the example training command.
- The extensive diagrams show the effect of changing the discrete_flow_shift value and the effect of different sigmoid values - none of which apply to the --timestep_sampling flux_shift mode. The space given to these modes in the README.md kind of implies (in my opinion) that these are the modes to use.
- There are almost zero casual references to flux_shift in this sd-scripts repository; searching for 'flux_shift' in the GitHub search bar returns very few results.
But now that I've stumbled across this mode and tried it, my sample images are immediately more realistic.
I don't think enough people are aware of this mode and how good it can be, so I'm mostly filing this issue to raise its profile. @kohya-ss, should the README.md be updated to make flux_shift the default in the example training command for Flux?
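For anyone who wants to try it, the switch is a single flag. Here is a minimal sketch of what the tail of a flux_train_network.py LoRA command looks like with it - every flag other than --timestep_sampling is elided or just a placeholder from my own setup, so adapt it to yours:

accelerate launch flux_train_network.py \
  --pretrained_model_name_or_path flux1-dev.safetensors \
  ... \
  --timestep_sampling flux_shift \
  --model_prediction_type raw \
  --guidance_scale 1.0

(As far as I understand, flux_shift derives the shift from the image resolution, so --discrete_flow_shift isn't used with it - consistent with the point above that the discrete_flow_shift diagrams don't apply to this mode.)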
@recris, we've been talking about how to avoid the vertical lines artifact, and I came across this thread where someone has a similar-looking horizontal line artifact and claims he 'fixed' it by using flux_shift: #1948. Worth a try if you have some line artifacts and you currently aren't using this timestep sampling mode?
I have a (working) hypothesis on the cause of those annoying artifacts.
Basically, those vertical (or horizontal) stripes are the result of "fried" latents: during training the network learned to produce extreme noise predictions (high variance), which push the image latent away from the expected value distribution, and when decoded this shows up as those white bands across the image. This learned behavior seems more prone to happen with high learning rates, although with low rates it still happens given enough training time.
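If you want to check this on your own runs, a quick diagnostic (purely a hypothetical sketch, not something any of the scripts do for you) is to dump per-channel statistics of the latent batch right before VAE decoding - "fried" latents tend to have one or more channels whose mean/variance drift far away from the rest:

import torch

def latent_stats(latents: torch.Tensor) -> None:
    # latents: (B, C, H, W), grabbed just before the VAE decode step
    # healthy latents stay roughly zero-mean with similar, moderate variance per channel
    var, mean = torch.var_mean(latents.float(), dim=(0, 2, 3))
    for c, (m, v) in enumerate(zip(mean.tolist(), var.tolist())):
        print(f"channel {c:2d}: mean={m:+.3f} var={v:.3f}")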
Inspired by the research in #294, I changed the loss function and added a regularization term that penalizes the network for learning to predict noise that deviates too much from the normal distribution's mean and variance.
In train_network.py:

# keep this low - increasing it too much causes image sharpness and contrast to explode
dist_loss_weight = 0.03

(...)

loss = train_util.conditional_loss(...)

dist_loss = None  # so the check below is safe when dist_loss_weight == 0
if dist_loss_weight > 0.0:
    # penalise high-noise timesteps more than low-noise ones
    ts = timesteps / 1000.0
    w = ts * dist_loss_weight
    # per-channel mean and variance of the predicted noise (over the spatial dims)
    n_var, n_mean = torch.var_mean(noise_pred.float(), dim=(2, 3), correction=0)
    n_logvar = torch.log(n_var)
    # KL divergence to a standard gaussian, summed over channels
    kl_div_loss = -0.5 * torch.sum(1 + n_logvar - n_mean.pow(2) - n_logvar.exp(), dim=1)
    dist_loss = w * kl_div_loss

loss = loss.mean([1, 2, 3])  # reduce to a per-sample loss
if dist_loss is not None:
    loss = loss + dist_loss
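For reference, the kl_div_loss line is just the closed-form KL divergence between a diagonal Gaussian built from the predicted noise's per-channel mean and variance and a standard normal, summed over channels:

$$D_{KL}\!\left(\mathcal{N}(\mu_c, \sigma_c^2)\,\Vert\,\mathcal{N}(0, 1)\right) = \frac{1}{2}\sum_c \left(\mu_c^2 + \sigma_c^2 - \log\sigma_c^2 - 1\right)$$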
This approach seems to improve some of my test cases; I was able to use more "aggressive" Huber loss settings with fewer side effects.
There are a bunch of assumptions being made here:
- The ideal noise prediction distribution is Gaussian with mean = 0 and variance = 1 - I currently have no strong evidence for this.
- dist_loss_weight follows a linear schedule over the timesteps - I've tried a few others; it at least seems to work better than a constant schedule? (A couple of alternatives are sketched below.)
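If anyone wants to experiment, the schedule is just the single line that computes w; a couple of hypothetical variants (untested beyond the linear one):

ts = timesteps / 1000.0                      # normalized timestep in [0, 1]
w = dist_loss_weight * ts                    # linear: heavier penalty at high-noise timesteps (what the snippet above uses)
# w = dist_loss_weight * ts.pow(2)           # quadratic: concentrates the penalty even more on high-noise timesteps
# w = torch.full_like(ts, dist_loss_weight)  # constant: same penalty at every timestep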