
Differential Diffusion: Giving Each Pixel Its Strength #7038

Closed · 1 task done
exx8 opened this issue Feb 20, 2024 · 21 comments · Fixed by #7550

Comments

@exx8

exx8 commented Feb 20, 2024

Model/Pipeline/Scheduler description

Hello,
I would like to suggest adding support for my paper: Differential Diffusion: Giving Each Pixel Its Strength.
The method lets a user edit a picture with a change map that describes how much each region should change.
The editing process is typically guided by textual instructions, although it can also be applied without guidance.
We support both continuous and discrete editing.
Our framework is training- and fine-tuning-free, and it adds a negligible inference-time penalty.
Our implementation is diffusers-based.
We have already tested it on 4 different diffusion models (Kandinsky, DeepFloyd IF, SD, SDXL).
We are confident that the framework can also be ported to other diffusion models, such as SD Turbo, Stable Cascade, and aMUSEd.
I notice that you usually stick to the white == change convention, which is the opposite of the convention we used in the paper.
The paper can be thought of as a generalization of some of the existing techniques:
an all-black map ("0" everywhere) is just regular txt2img,
a map of a single non-black color can be thought of as img2img,
and a map of two colors, one of which is white, can be thought of as inpainting.
And the rest? It's completely new!
In the paper, we suggest some further applications such as soft inpainting and strength visualization.
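To make the generalization above concrete, here is a small illustrative sketch (an editor's example, not code from the paper's repo) that builds the three special-case change maps under the paper's convention, where 0 means full change; the resolution, region coordinates, and the strength-to-gray mapping are assumptions:

import numpy as np
from PIL import Image

W, H = 1024, 1024  # arbitrary resolution

# txt2img: every pixel changes completely, i.e. an all-black map ("0" everywhere).
txt2img_map = Image.fromarray(np.zeros((H, W), dtype=np.uint8))

# img2img at strength ~0.6: one uniform non-black value for the whole image
# (assuming gray level == (1 - strength) * 255 under the paper's convention).
img2img_map = Image.fromarray(np.full((H, W), int((1 - 0.6) * 255), dtype=np.uint8))

# inpaint: white (keep) everywhere except a black region that is fully regenerated.
inpaint_np = np.full((H, W), 255, dtype=np.uint8)
inpaint_np[256:768, 256:768] = 0  # arbitrary region to repaint
inpaint_map = Image.fromarray(inpaint_np)

# Everything in between (gradients, depth maps, ...) is the new territory the paper opens up.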

Open source status

  • The model implementation is available.

Provide useful links for the implementation

Site:
https://differential-diffusion.github.io/
Paper:
https://differential-diffusion.github.io/paper.pdf
Repo:
https://github.com/exx8/differential-diffusion

@asomoza
Member

asomoza commented Feb 20, 2024

I was waiting for the code, thank you for your work. I will try this with a diffusers-only solution and see how good the results are; they really look promising.

@yiyixuxu
Collaborator

Thanks for the message,
we're keeping an eye on this!

@vladmandic
Contributor

If you want to test it out, I've integrated it with the diffusers pipeline in SD.Next and am really liking the results so far.

@asomoza
Member

asomoza commented Feb 22, 2024

I've been testing this and it is really impressive. These are some tests I did; I'll compare it to the default diffusers inpainting (with mask blur) and with Fooocus, since people say it is the SOTA at the moment with SDXL.

[Images: input image, diffusers mask, differential (inverted) mask]

prompt: a crow on top of a branch

With diffusers, I had to lower the strength to 0.8:

Diffusers
[3 result images]

Fooocus
[3 result images]

Differential Diffusion
[3 result images]

I picked the best results for all three of them. With differential I used a strength of 1.0, and you can see it, since the crow is really black, like in Fooocus. The middle result from differential would be really hard (at least for my eyes) to identify as an inpainting, except for the missing talons.

Also, the demo is really cool: with a depth map you can gradually transform a scene like this:

[Images: input, default result, result with a Marigold depth map]

My vote is that this should be added as a core option for the inpainting pipeline. These results are really good, at least for me, and it would make diffusers as good as other solutions.

@vladmandic
Contributor

One more quick note: I like that there are two tunable params to play with here, denoising strength as usual, but also a relative mask brightness (-1..1); it gives so much freedom over how strongly the mask is applied.
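For illustration, a minimal sketch of what such a relative-brightness adjustment could look like (the function name and the exact mapping are assumptions, not SD.Next's actual implementation):

import numpy as np
from PIL import Image

def adjust_mask_brightness(mask: Image.Image, brightness: float) -> Image.Image:
    """Shift a grayscale change map by a relative brightness in [-1, 1].

    brightness = 0 leaves the map unchanged, +1 pushes everything towards
    full change, -1 towards no change (assuming white == change)."""
    m = np.asarray(mask.convert("L"), dtype=np.float32) / 255.0
    m = np.clip(m + brightness, 0.0, 1.0)
    return Image.fromarray((m * 255).astype(np.uint8))

# e.g. a stronger overall effect:
# boosted = adjust_mask_brightness(Image.open("depth_map.png"), 0.3)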

@a-r-r-o-w
Member

a-r-r-o-w commented Feb 23, 2024

I've been playing around with this as well and the results are very impressive. Very cool comparison @asomoza! I'd love for this to be a feature available across inpaint pipelines.

Btw, the changes for this can be fully implemented with callback_on_step_begin (I've been looking for good use cases of #6256 and this looked like one, so I gave it a shot here).
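For illustration only, here is a rough sketch of the per-step re-injection idea using the callback hook that already exists in diffusers pipelines (callback_on_step_end, which fires at the end of each step rather than the beginning); this is not the linked branch's code, and original_latents, change_map_latent, and noise are assumed to be precomputed by the caller:

import torch

# Assumed to be prepared beforehand, all at latent resolution and on the right device:
#   original_latents:  VAE-encoded input image, shape (1, 4, h, w)
#   change_map_latent: change map in [0, 1], shape (1, 1, h, w), 1 == full change
#   noise:             a fixed gaussian noise tensor, same shape as original_latents

def make_diff_diff_callback(original_latents, change_map_latent, noise, num_train_timesteps=1000):
    def callback(pipe, step_index, timestep, callback_kwargs):
        latents = callback_kwargs["latents"]
        # A pixel whose change value is below the current progress is pinned back to a
        # re-noised copy of the original; it is "released" only late enough in the
        # schedule for its allotted strength.
        threshold = float(timestep) / num_train_timesteps  # assumes a 1000-step training schedule
        keep_original = change_map_latent < threshold
        t = timestep if torch.is_tensor(timestep) else torch.tensor(timestep)
        t = t.reshape(1).to(original_latents.device)
        renoised = pipe.scheduler.add_noise(original_latents, noise, t)
        callback_kwargs["latents"] = torch.where(keep_original, renoised, latents)
        return callback_kwargs
    return callback

# Hypothetical wiring:
# out = pipe(prompt, image=init_image,
#            callback_on_step_end=make_diff_diff_callback(original_latents, change_map_latent, noise),
#            callback_on_step_end_tensor_inputs=["latents"])

The idea is simply that each pixel is overwritten with a re-noised copy of the original until its change value exceeds the remaining-timestep fraction, which is the per-pixel-strength mechanism the paper describes.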

@vladmandic
Contributor

vladmandic commented Feb 23, 2024

> I've been playing around with this as well and the results are very impressive. Very cool comparison @asomoza! I'd love for this to be a feature available across inpaint pipelines.
>
> Btw, the changes for this can be fully implemented with callback_on_step_begin (I've been looking for good use cases of #6256 and this looked like one, so I gave it a shot here).

Totally off-topic here, but I've been doing color grading on latents (tint shifts, dynamic range correction, re-sharpening, etc.) in that callback. It works really nicely if you bind operations to specific timestep ranges.
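As a toy example of binding an operation to a timestep range (an editor's sketch, not SD.Next's code): a callback_on_step_end hook that applies a per-channel latent shift only inside a chosen timestep window, where the window bounds and shift are illustrative:

import torch

def make_grading_callback(channel_shift, t_min=200, t_max=800):
    """channel_shift: tensor of shape (1, 4, 1, 1) added to the latents
    only while t_min <= timestep <= t_max (values are illustrative)."""
    def callback(pipe, step_index, timestep, callback_kwargs):
        if t_min <= float(timestep) <= t_max:
            callback_kwargs["latents"] = callback_kwargs["latents"] + channel_shift
        return callback_kwargs
    return callback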

@asomoza
Member

asomoza commented Feb 23, 2024

> One more quick note: I like that there are two tunable params to play with here, denoising strength as usual, but also a relative mask brightness (-1..1); it gives so much freedom over how strongly the mask is applied.

Prompted by your comment, I did a quick test; this was made by just changing the brightness of the depth map to see the effect:

[comparison image]

prompt = "anime still"
negative_prompt = "realistic, photo"

Had to crop it because it seems that "anime" also means "open shirt" to the Juggernaut model

[Images: original, inverted depth map used as the mask]

Also this is what you can do with a gradient instead of a depth map:

[2 result images]
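For reference, a tiny sketch (an editor's example, not asomoza's code) of generating a simple horizontal-gradient change map like that:

import numpy as np
from PIL import Image

def horizontal_gradient_mask(width, height, start=0.0, end=1.0):
    """Grayscale map ramping linearly from `start` to `end`, left to right."""
    ramp = np.linspace(start, end, width, dtype=np.float32)
    mask = np.tile(ramp, (height, 1))
    return Image.fromarray((mask * 255).astype(np.uint8))

# gradient_map = horizontal_gradient_mask(1024, 1024)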

@asomoza
Member

asomoza commented Feb 23, 2024

> Btw, the changes for this can be fully implemented with callback_on_step_begin (I've been looking for good use cases of #6256 and this looked like one, so I gave it a shot here).

Yeah, even if the diffusers team doesn't add it to the core functionality, they can still mention it as an alternative.

> Totally off-topic here, but I've been doing color grading on latents (tint shifts, dynamic range correction, re-sharpening, etc.) in that callback. It works really nicely if you bind operations to specific timestep ranges.

I intend to do that too, but there is so much going on that I don't have the time to do it all. Are you going to add it to SD.Next? I would really like to test it with a UI instead of code.

@vladmandic
Contributor

vladmandic commented Feb 23, 2024

It's already added to SD.Next. It's in the "corrections" accordion, and you can play with the XYZ grid as well.

Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot added the "stale" label (Issues that haven't received updates) on Mar 21, 2024
@Scorpinaus

Not stale?

github-actions bot removed the "stale" label (Issues that haven't received updates) on Mar 22, 2024
@exx8
Author

exx8 commented Mar 22, 2024

Hi,
Do you need any help adapting it to diffusers?

@a-r-r-o-w
Member

> Hi,
> Do you need any help adapting it to diffusers?

Thanks for the awesome work. I will open a PR adding support shortly from my linked branch above. I've been looking at various discussions, issues, and PRs about how best to support callbacks so that future additions of this kind are easier, and perhaps that's what is causing the delay.

@zacheryvaughn

zacheryvaughn commented Mar 21, 2025

As you know, Differential Diffusion is not exactly "soft inpainting", and I don't think anyone expects it to be. But I have found that automatically sequencing the InPainting pipeline and the Differential Diffusion pipeline produces very nice results.

The InPainting pipeline runs for just 10 steps to generate a sort of rough draft of the desired content. Then, using the same mask but with blur applied, Differential Diffusion does a fantastic job refining the desired content and blending it with the original image.

This process ALSO works fantastically for OutPainting. Traditionally an extension blur is used to provide color/context for Differential Diffusion in OutPainting, but replacing this blur with a brief InPainting operation produces significantly better results. The InPainting pass adds new content to the extended area (with the full image context), then Differential Diffusion refines and blends it.

Here is an InPainting example.

[inpainting example image]

@asomoza
Member

asomoza commented Mar 21, 2025

Hi @zacheryvaughn, would you be interested in doing a community pipeline with this? It seems a really interesting approach.

Also, if you want and have the time, can you test the same image with this Space? It uses a custom pipeline with just 8 steps, and your example seems hard to inpaint, so this would be a good benchmark for both techniques.

@zacheryvaughn

zacheryvaughn commented Mar 31, 2025

Cascading InPaint and DiffDiff

I used 30 steps for each test. For the InPaint + DiffDiff test, to be fair, I ran 10 InPaint steps and 20 DiffDiff steps. The input mask is split into two versions: the InPaint phase binarizes the mask at 0.8 (instead of 0.5) to ensure that the DiffDiff phase overlaps the inpainted area more, and DiffDiff uses an inverted mask. As for the custom pipeline, I'm working on it, but I'm not super advanced, so it's also a learning process for me. lol The partitioned timesteps are difficult to deal with because of how the precomputed masks and noisy latents are synced with timesteps in DiffDiff.
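A minimal PIL/NumPy sketch of that mask split (the helper name and the blur radius are assumptions; the 0.8 threshold and the inversion follow the description above):

import numpy as np
from PIL import Image, ImageFilter, ImageOps

def split_mask(mask: Image.Image, threshold=0.8, blur_radius=24):
    """Return (inpaint_mask, diffdiff_map) from one input mask.

    inpaint_mask: binarized at `threshold` so the later DiffDiff phase
                  overlaps the inpainted area.
    diffdiff_map: blurred and inverted copy of the original mask."""
    gray = mask.convert("L")
    m = np.asarray(gray, dtype=np.float32) / 255.0
    inpaint_mask = Image.fromarray(((m >= threshold) * 255).astype(np.uint8))
    blurred = gray.filter(ImageFilter.GaussianBlur(blur_radius))
    diffdiff_map = ImageOps.invert(blurred)
    return inpaint_mask, diffdiff_map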

I ran 10 tests for each method and chose the best output. InPaint actually did pretty poorly but got lucky with one really good result. The final 3 methods obviously performed the best, and only one of them doesn't require extra manual work (InPaint + DiffDiff).

InPaint: does a good job adding new content to empty spaces, but it is poorly blended with the image.
DiffDiff: does a great job modifying existing content, but really struggles with adding new content to empty spaces.
InPaint + DiffDiff: InPaint gives the DiffDiff phase something it can work with for a decent result. 1/3 of the steps were used for inpainting.
Draw + DiffDiff: works very well with DiffDiff but requires manually painting first; results can depend on how good you are at painting.
Paste + DiffDiff: works amazingly with DiffDiff but requires pasting an image first. Funny how the dragon got reversed. DiffDiff really just needs ANY content it can transform.

@asomoza
Member

asomoza commented Mar 31, 2025

Nice! I've been following your process. IMO the one that looks best is Draw + DiffDiff, but yeah, that requires you to manually draw a shape (which shouldn't be that hard though); InPaint + DiffDiff is also good.

Thanks for doing all this, really good work. Take all the time you need with this; I don't have the bandwidth right now to help, but if you get stuck somewhere, feel free to open a PR and I'll try to look at it, or maybe someone else from the community can also help.

@zacheryvaughn

zacheryvaughn commented Apr 1, 2025

Sorry for leaving a lot of comments in here, hope it's not annoying to anyone.

I have been testing a new method today, because I realized that DiffDiff just needs anything at all to work with... The difficulty comes when you are starting from nothing, an empty sky or any empty space.

For this method, I automatically generate some noise and paste it onto the input image itself. No extra work required, no extra inference steps required... Literally just noise added directly onto the image where the mask value is over 0.8.

And surprisingly, but maybe NOT surprisingly, the results are just as good as using the painted or pasted image! 🤣 I ran it over and over again at least 50 times and did not get even ONE bad result... Every single image was clearly a dragon.

So... InPaint + DiffDiff is essentially a useless method, and I'm abandoning the idea. Filling with random noise is definitely the way to go. I underestimated the extent to which it truly does not matter what you put in the masked area; it could be anything.

[6 result images]

@asomoza
Member

asomoza commented Apr 1, 2025

I suggest also testing this with other kinds of inpainting though. What you're doing looks good, but to my eyes it's an easy inpaint, so you should also test with a harder one, and with photorealistic or real images.

I can also suggest trying to fill the space with OpenCV's TELEA or NAVIER-STOKES inpainting, which are sometimes good for filling areas before inpainting, and seeing if that helps for harder inpaintings or outpaintings. I'm suggesting this instead of something like lama-cleaner because it's a lot faster, and as you can see, the pre-fill doesn't need to be that good.
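For reference, a minimal sketch of that kind of OpenCV pre-fill (the file names, the binarization threshold, and the inpaint radius are placeholders):

import cv2
import numpy as np

image = cv2.imread("original_image.png")                    # BGR, uint8
mask = cv2.imread("mask_image.png", cv2.IMREAD_GRAYSCALE)   # 255 == area to fill
mask = (mask > 200).astype(np.uint8) * 255                  # binarize for cv2.inpaint

# Third argument is the inpaint radius; use cv2.INPAINT_TELEA (fast marching)
# or cv2.INPAINT_NS (Navier-Stokes) as the flag.
prefilled = cv2.inpaint(image, mask, 3, cv2.INPAINT_TELEA)
cv2.imwrite("prefilled_image.png", prefilled)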

@zacheryvaughn

zacheryvaughn commented Apr 1, 2025

Those images are easy for inpainting, but they are difficult for DiffDiff; that's why I used them as examples. But you're right: if I'm trying to inpaint the image, I should be comparing against images that are difficult for inpainting.

The random noise that is automatically filled into the >0.8 area of the mask is not meant to fill empty space or create a blurred/average area; img2img will see it as the object that you want to create. It's like saying, "Hey, look at this blurry colorful blob in my masked area, turn it into a girl/dragon/etc." Otherwise what you're saying is, "Hey, look at this forest, put a girl within the masked area," but img2img just isn't strong enough to do that... It needs to see something there. So filling the area with TELEA or NS would somewhat defeat the purpose in this case.

Here are realism images that are difficult for inpainting.

Standard InPainting: 2/30 images were good. (strength of 0.9 worked best)
Standard DiffDiff: 9/30 images were good. (must use strength of >0.9)
Noised Image + DiffDiff: 24/30 images were good. (strength of 0.75 worked best)

[Result grid; columns: Auto Noise + DiffDiff, DiffDiff, InPaint]

The Limit! Very Tedious Prompting

[4 result images]

Edit: I found that a vibrant low-res noise pattern like this actually works much better and is even faster and simpler.

import numpy as np
from PIL import Image

def generate_chunky_noise(size, block_size=16):
    """Create a low-res RGB noise image and upscale it with nearest-neighbor
    so the noise appears as vibrant, chunky blocks."""
    width, height = size
    low_res_w, low_res_h = width // block_size, height // block_size
    noise_np = np.random.randint(0, 256, (low_res_h, low_res_w, 3), dtype=np.uint8)
    return Image.fromarray(noise_np).resize((width, height), Image.NEAREST)

def apply_noise_fill(image_path, mask_path, output_path, threshold=0.75, block_size=16):
    """Paste chunky noise onto the input image wherever the mask is above `threshold`."""
    image = Image.open(image_path).convert("RGB")
    mask = Image.open(mask_path).convert("L").resize(image.size, Image.NEAREST)
    noise_img = generate_chunky_noise(image.size, block_size=block_size)

    image_np = np.array(image)
    noise_np = np.array(noise_img)
    mask_np = np.array(mask, dtype=np.float32) / 255.0

    # Replace the strongly-masked pixels with the noise pattern.
    mask_indices = mask_np >= threshold
    image_np[mask_indices] = noise_np[mask_indices]

    Image.fromarray(image_np).save(output_path)
    print(f"Saved output to {output_path}")

apply_noise_fill("original_image.png", "mask_image.png", "prepared_image.png")
