native_guidance_scale parameter for LCMs in StableDiffusionXLPipeline #6993

Open · wants to merge 4 commits into base: main
Conversation

@ivanprado (Contributor) commented Feb 16, 2024

What does this PR do?

Distilled LCMs don't perform regular classifier-free guidance. Instead, they pass the guidance_scale as conditioning to the U-Net. This is nice because it roughly halves the compute, since the negative prediction is not required.

But in practice, we have seen that being able to also perform regular classifier-free guidance, in addition to the conditioned guidance_scale, can be useful:

  • It allows using negative prompts again.
  • It provides better quality/prompt adherence in some cases.

This PR introduces a new parameter, native_guidance_scale, that can be used with distilled LCM models to perform regular classifier-free guidance.
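For intuition, here is a minimal, self-contained sketch (illustrative only, not this PR's actual diff; the function name is hypothetical) of the classifier-free guidance blend that native_guidance_scale applies on top of the LCM's embedded guidance:

import torch

def apply_native_cfg(noise_pred_uncond, noise_pred_text, native_guidance_scale):
    # Standard classifier-free guidance blend. The LCM's own guidance_scale is
    # embedded and fed to the U-Net as conditioning, so it needs no extra
    # forward pass; native_guidance_scale works on top of it.
    return noise_pred_uncond + native_guidance_scale * (noise_pred_text - noise_pred_uncond)

# Toy example with dummy noise predictions:
uncond = torch.zeros(1, 4, 128, 128)
text = torch.randn(1, 4, 128, 128)
blended = apply_native_cfg(uncond, text, native_guidance_scale=1.5)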

An example

Code to test the change in Text2Image pipeline:

from diffusers import DiffusionPipeline, UNet2DConditionModel, LCMScheduler
import torch

unet = UNet2DConditionModel.from_pretrained(
    "latent-consistency/lcm-sdxl",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", unet=unet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

prompt = "High altitude snowy mountains"

generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt, num_inference_steps=6, generator=generator, guidance_scale=8.0
).images[0]

generator = torch.manual_seed(0)
image_native_cfg = pipe(
    prompt=prompt, num_inference_steps=6, generator=generator,
    guidance_scale=8.0, native_guidance_scale=1.5,
).images[0]

Resultant images:

[Screenshot: result with guidance_scale=8.0 only]
[Screenshot: result with guidance_scale=8.0 and native_guidance_scale=1.5]

Code to test the change in img2img pipeline:

import torch
from diffusers import AutoPipelineForImage2Image, UNet2DConditionModel, LCMScheduler
from diffusers.utils import make_image_grid, load_image

unet = UNet2DConditionModel.from_pretrained(
    "latent-consistency/lcm-sdxl",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipeline = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True, unet=unet
)
pipeline.enable_model_cpu_offload()
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)

# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-sdxl-init.png"
init_image = load_image(url)

prompt = "A painting of an astronaut in a jungle, cold color palette."

# pass prompt and image to pipeline
generator = torch.manual_seed(33)
image = pipeline(prompt, image=init_image, strength=0.5, num_inference_steps=6, generator=generator).images[0]
generator = torch.manual_seed(33)
image_new = pipeline(prompt, image=init_image, strength=0.5, num_inference_steps=6, generator=generator, native_guidance_scale=2.5).images[0]
make_image_grid([init_image, image, image_new], rows=3, cols=1)

Resultant images:

[Image grid: init image, default result, result with native_guidance_scale=2.5]

cc @patrickvonplaten and @sayakpaul

Commit: … models

This parameter can be used with distilled LCM models to also perform regular classifier-free guidance as usual. This was not previously possible, and it can improve generation quality in some cases.
@sayakpaul (Member) commented:

Cc: @patil-suraj

Commit: …for LCM models

This parameter can be used with distilled LCM models to also perform regular classifier-free guidance as usual. This was not previously possible, and it can improve generation quality in some cases.
@yiyixuxu (Collaborator) left a review comment:

I'm not a fan of introducing this parameter into SDXL - it is only applicable to LCMs. Is this something we can implement with callback_on_step_end?

@ivanprado (Contributor, Author) commented:

@yiyixuxu Unfortunately, it is not possible to implement it using callback_on_step_end. The reasons are:

  • The property do_classifier_free_guidance returns False when a distilled LCM model is used, but we need it to return True for this feature. We could add a new parameter to force this (e.g. force_classifier_free_guidance); it would not be LCM-specific.
  • Even after introducing that new parameter, a solution using callback_on_step_end would not allow modifying the guidance_scale for the first step, because the callback is executed at the end of each step. So the guidance scale of the first step cannot be changed, and the first step is very important (see the sketch below).
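For illustration, a hedged sketch of what a callback_on_step_end-based attempt would look like, assuming the guidance check were already bypassed; _guidance_scale is the pipeline-internal attribute, and because the callback fires after each step the new value only takes effect from the second step onward:

def tweak_guidance(pipe, step_index, timestep, callback_kwargs):
    # Runs at the END of each step, so this value is only picked up by the
    # next step's forward pass -- the first step keeps the original scale.
    pipe._guidance_scale = 1.5
    return callback_kwargs

image = pipe(
    prompt=prompt,
    num_inference_steps=6,
    guidance_scale=8.0,
    callback_on_step_end=tweak_guidance,
).images[0]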

I understand why you don't like adding a new parameter that only works for LCMs, but I don't see a good alternative, and the proposed one is harmless. What do you think?

@yiyixuxu (Collaborator) commented:

@ivanprado
We are happy to extend our callback functionalities to make this work!

@yiyixuxu (Collaborator) commented:

cc @vladmandic @asomoza @DN6 here as well

Should we introduce a callback_on_step_begin?

@yiyixuxu (Collaborator) commented Feb 27, 2024

I understand why you don't like adding a new parameter that only works for LCMs, but I don't see a good alternative, and the proposed one is harmless. What do you think?

Agreed it is harmless, but if we follow such a principle and add parameters for every small use case, we will quickly get overwhelmed - we introduced the callback mechanism for this exact reason. Note that some parameters, such as guidance_rescale, were introduced before the callback parameter and would not have had to be added otherwise.

It will also be much easier for users to tweak our pipelines, without having to submit PRs.

@vladmandic (Contributor) commented:

callback_on_step_begin

No issues with that on my side - having more callbacks just gives more flexibility without too much complexity for the normal user, as they don't have to be used.
I'd say the bigger issue is on your side - callbacks should be as uniform as possible between different pipelines, so while introducing a new one is fine, it's less than ideal if it's present in just one pipeline, and updating all pipelines is probably not something you're looking forward to.

@asomoza (Member) commented Feb 27, 2024

Maybe I'm missing something, but in this case wouldn't it be better to just remove the check self.unet.config.time_cond_proj_dim is None so people can choose whether they want to use it? Isn't this the same for the turbo and lightning models? People know that they have to keep the CFG at 1.0 to get the speed, but they can still choose a value above 1.0 if they want better quality or control.

I think this can be resolved with just documentation; in my code I don't have that check for the same reason.
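For reference, the check being discussed lives in the pipeline's do_classifier_free_guidance property, which looks roughly like this (simplified excerpt; the exact code in diffusers may differ slightly):

class StableDiffusionXLPipeline:  # simplified excerpt for illustration
    @property
    def do_classifier_free_guidance(self):
        # CFG is currently disabled whenever the U-Net has an embedded guidance
        # projection (time_cond_proj_dim), i.e. for distilled LCM models.
        return self._guidance_scale > 1 and self.unet.config.time_cond_proj_dim is None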

@yiyixuxu (Collaborator) commented Feb 27, 2024

Maybe I'm missing something but in this case wouldn't it be better to just remove the check self.unet.config.time_cond_proj_dim is None so people can choose if they want to use it?

It would change the expected behavior and be backwards-breaking then.

@asomoza (Member) commented Feb 28, 2024

It would change the expected behavior and be backwards-breaking then.

In that case I don't see any alternative other than adding the callback. I agree with @vladmandic that more callbacks add more flexibility, but personally I don't see a real use case for callback_on_step_begin yet.

@a-r-r-o-w did an experiment with this in #7038 (comment) though.

@ivanprado (Contributor, Author) commented:

Something to keep in mind: even if callback_on_step_begin is introduced, two other changes are still required for the feature targeted by this PR to be possible:

  • Add a new parameter force_classifier_free_guidance to bypass the check self.unet.config.time_cond_proj_dim is None.
  • Fix the bug when CFG is applied to models with time_cond_proj_dim.

@ivanprado (Contributor, Author) commented:

I'm working on changing the code. In particular, I propose adding the following parameters:

            callback_on_step_end_also_before_start (`bool`, *optional*, defaults to False):
                If `True`, the `callback_on_step_end` function will also be called before the start of the inference.
                The callback will receive -1 as step to identify this particular case, in which some tensors
                might not be available.
            force_classifier_free_guidance (`bool`, *optional*, defaults to False):
                Forces the execution of classifier-free guidance, even if the guidance scale is below 1 or the model
                is an LCM model.

Early feedback is welcome.
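A hedged sketch of how these proposed parameters could be combined to reproduce the native_guidance_scale behaviour (both parameters come from this PR and are not in released diffusers; _guidance_scale is the pipeline-internal attribute):

def set_cfg(pipe, step_index, timestep, callback_kwargs):
    # step_index == -1 is the extra call made before the first step,
    # so the classifier-free guidance scale is in place from step 0 onward.
    pipe._guidance_scale = 1.5
    return callback_kwargs

image = pipe(
    prompt=prompt,
    num_inference_steps=6,
    guidance_scale=8.0,                            # embedded LCM guidance
    force_classifier_free_guidance=True,           # proposed in this PR
    callback_on_step_end=set_cfg,
    callback_on_step_end_also_before_start=True,   # proposed in this PR
).images[0]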

Commit: …at pipeline_stable_diffusion_xl.py and pipeline_stable_diffusion_xl_img2img.py

callback_on_step_end_also_at_init (`bool`, *optional*, defaults to False):
    If `True`, the `callback_on_step_end` function will also be called before the start of the inference.
    The callback will receive -1 as step to identify this particular case, in which some tensors
    might not be available.
force_classifier_free_guidance (`bool`, *optional*, defaults to False):
    Forces the execution of classifier-free guidance, even if the guidance scale is below 1 or the model
    is an LCM model.
@ivanprado (Contributor, Author) commented:

@yiyixuxu this is ready for a re-review. I've removed the old parameters and introduced the following ones, which are also backward compatible:

  callback_on_step_end_also_at_init (`bool`, *optional*, defaults to False):
      If `True`, the `callback_on_step_end` function will also be called before the start of the inference.
      The callback will receive -1 as step to identify this particular case, in which some tensors
      might not be available.
  force_classifier_free_guidance (`bool`, *optional*, defaults to False):
      Forces the execution of classifier-free guidance, even if the guidance scale is below 1 or the model
      is an LCM model.

The test cases have been modified accordingly.

@a-r-r-o-w (Member) commented Mar 12, 2024

@ivanprado Great work with this! Just curious: couldn't the functionality of callback_on_step_end_also_at_init be achieved with callback_on_step_begin? You could do the pre-inference setup with some conditional logic when i == 0. I mention it because if we push for a begin callback, we can integrate things like differential diffusion quite easily across all inpaint pipelines, which also require some setup before the inference loop starts. WDYT?

@ivanprado (Contributor, Author) commented:

@a-r-r-o-w note that you can obtain almost the same effect as callback_on_step_begin by using a callback_on_step_end together with callback_on_step_end_also_at_init, and ignoring the last invocation of the callback. For example:

steps = 10

def callback_on_step_begin(pipe, step_index, timestep, callback_kwargs):
    # Your implementation here
    ...
    return callback_kwargs

def callback_on_step_end(pipe, step_index, timestep, callback_kwargs):
    # Shift by one: the call at the end of step i (or at init, step_index == -1)
    # acts as the "begin" of step i + 1. Skip the shift after the final step.
    step_index += 1
    if step_index != steps:
        callback_kwargs = callback_on_step_begin(pipe, step_index, timestep, callback_kwargs)
    return callback_kwargs

result = pipe(
    prompt=prompt, num_inference_steps=steps,
    callback_on_step_end=callback_on_step_end,
    callback_on_step_end_also_at_init=True,
).images[0]

The only problem I see is with the timestep, which will only be right for the first call; the rest will get the timestep of the previous step.

But if we had access to the timesteps array in the callback, this wouldn't be a problem. Is there something else you would miss?
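One possible workaround sketch (assuming the scheduler's timesteps have been set and are reachable through the pipeline object passed to the callback):

def callback_on_step_end(pipe, step_index, timestep, callback_kwargs):
    # Look up the timestep of the NEXT step from the scheduler instead of
    # reusing the current one, so the shifted "begin" call gets the right value.
    next_step = step_index + 1
    if next_step < len(pipe.scheduler.timesteps):
        next_timestep = pipe.scheduler.timesteps[next_step]
        callback_kwargs = callback_on_step_begin(pipe, next_step, next_timestep, callback_kwargs)
    return callback_kwargs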

@ivanprado ivanprado requested a review from yiyixuxu March 13, 2024 08:00
@a-r-r-o-w a-r-r-o-w mentioned this pull request Mar 20, 2024
@ivanprado (Contributor, Author) commented:

Hi @yiyixuxu, I've already implemented the suggested changes. It would be nice if you could have a look.

@bghira (Contributor) commented Apr 25, 2024

good candidate for #7761

@github-actions (bot) commented:
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions (bot) added the "stale" label (Issues that haven't received updates) on Sep 14, 2024