Inpainting produces results that are uneven with input image #5808
Comments
Thanks for the clean issue here! @yiyixuxu can you have a look? |
hi @vladmandic: I'm trying to compare with auto1111 but I'm seeing the same issue - can you tell me if there is anything wrong with my settings? |
Played around with it a little bit more. I think the "mask blur" option helps with this issue; I will look into adding this in diffusers. It is still not perfect though. Let me know if there is anything else that I missed - I'm pretty new to auto1111, so it would help a lot if you can point me to the correct settings. |
I think mask blur is really good at "hiding" the issue with inpaint; it would be a welcome addition to diffusers. The underlying problem still exists, but I'm really unsure how else to address it. |
ok, I will add the mask blur! |
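For readers following along before a built-in option lands: a minimal sketch of blurring the mask yourself before passing it to the pipeline (the file names and blur radius are illustrative assumptions; the `mask_blurred` result is what the later pipeline call in this thread uses):

```python
from PIL import Image, ImageFilter

# Load the binary mask and soften its edge so the inpainted region
# blends into the surrounding image instead of ending in a hard seam.
mask = Image.open("mask.png").convert("L")
mask_blurred = mask.filter(ImageFilter.GaussianBlur(radius=8))
mask_blurred.save("mask_blurred.png")
```

A larger radius hides the seam better but also lets more of the original image bleed into the inpainted area.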
I'll dig into it more, you can focus on mask blur. If I find something else I'll update here. |
I have had the time to test this out further and it looks like it's indeed very similar between Diffusers and the original backend - but not the same. I also saw that the preview of the generation process hints at a difference: |
there is no such thing, all "magic" happens in preprocessing. |
If you're referring to "Inpaint area", I always use the "Whole picture" option. |
that's interesting - can you try in img2img advanced -> disable full quality - that basically forces usage of taesd for final decode as well. |
So it's not a VAE thing, thus must be diffusers postprocessing? |
I don't know for sure if the VAE is involved, but diffusers is definitely not doing it, since I just found the culprit in the UI: |
good point, i'll add it. |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
push |
@castortroy83 |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
Hope this is solved now? |
Are the models the same for your tests? If so, CCing @patil-suraj @yiyixuxu here. Cc: @vladmandic as well. |
Alright. Could you maybe provide your …? Also, could you expand a bit more on this? |
I tested it more and the difference was that ComfyUI by default uses the "only inpaint mask" option, so it only affects the area around the mask. With this code:

```python
image = pipe(
    prompt,
    image=base,
    mask_image=mask_blurred,
    guidance_scale=8,
    strength=0.99,
    num_inference_steps=20,
    generator=generator,
    padding_mask_crop=32,
).images[0]
```

the results are the same as ComfyUI: |
@23pennies does the comment above from @asomoza help? |
@asomoza Could you say which variable in that snippet is for the "only inpaint mask option"? Also, which model were you using? |
https://huggingface.co/docs/diffusers/using-diffusers/inpaint#padding-mask-crop and I tested it with the inpainting model which seems to "decolorize" the image more than the normal one. https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1 |
I'm using SD.Next, and it doesn't look like it implements it.
And the results are still discolored:
Could you point me to where you found this? My understanding of how ComfyUI works doesn't align with it. |
Looking at the code of that node:

```python
m = (1.0 - mask.round()).squeeze(1)
for i in range(3):
    pixels[:,:,:,i] -= 0.5
    pixels[:,:,:,i] *= m
    pixels[:,:,:,i] += 0.5
concat_latent = vae.encode(pixels)
orig_latent = vae.encode(orig_pixels)
```

I'm no expert in ComfyUI, and I think one of its weak points is that there's almost no documentation; at least I couldn't find anything related to that node. The normal node, and what's documented here: https://comfyanonymous.github.io/ComfyUI_examples/inpaint/, is the default in diffusers, and the result is the same too.

I really don't know if the hard coding would work in SD.Next; if it doesn't do anything on top of diffusers, the results should be exactly the same, and if the results aren't the same, then IMO that's something to discuss in the SD.Next repo.

Also, I found that the inpainting model discolors things instead of changing them, probably because it doesn't have enough information to do it. If I use the prompt "red shirt" it does the discoloration thing instead of painting the shirt red, and if I use the normal SDXL model with ComfyUI I get this:
You can clearly see that it is not inpainting the whole image but blending the masked section in; also the
Also, you'll need to take into account that the results are a lottery; it all depends on whether you get a good seed for what you're asking it to do.
It is something that all of the solutions need to do: unless you inpaint the whole image, you need to blend the new inpainted section with the old image. There are techniques that make this better, like InvokeAI, which uses patchmatch, but most people just do a second pass over the whole image. If you want better results I recommend you use the new differential diffusion. |
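As a side note on the ComfyUI snippet quoted above: the same preprocessing can be sketched in plain NumPy to make the comparison with diffusers easier (this is not code from either project, and the file names are placeholders). It rounds the mask to hard 0/1 values and replaces every masked pixel with neutral 0.5 grey before VAE encoding:

```python
import numpy as np
from PIL import Image

image = np.asarray(Image.open("input.png").convert("RGB"), dtype=np.float32) / 255.0
mask = np.asarray(Image.open("mask.png").convert("L"), dtype=np.float32) / 255.0

m = np.round(mask)            # hard 0/1 mask, so a blurred mask has little effect here
greyed = image.copy()
greyed[m > 0.5] = 0.5         # masked pixels become neutral grey before encoding

Image.fromarray((greyed * 255).astype(np.uint8)).save("greyed_input.png")
```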
Thanks for the reply. Sorry I respond so sporadically.

I'm not seeing an equivalent to padding_mask_crop here. What this piece of code seems to do is round each pixel in the mask up or down to white or black (so a blurred mask doesn't actually do much), then turn every pixel in the image that is covered by the mask to grey. I hacked together a node that returns the "pixels" value to confirm it.

As far as I can tell, padding_mask_crop instead crops the image here: diffusers/src/diffusers/image_processor.py, line 504 (commit b4226bd)
and then later stitches it into the original image: diffusers/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_inpaint.py, line 1789 (commit b4226bd)
(mimicking the functionality that's already in automatic and SD.Next - that's why SD.Next doesn't use it).

In ComfyUI, this wouldn't be possible; the KSampler node has no access to a VAE that could turn latents into images for cropping, and VAE decode has no stitching-together functionality. Further evidence for this: I manually added gaussian noise to the masked area (left picture). The sampler in ComfyUI (middle picture) correctly sees the entire image as context and paints a guy's chin, a blue shirt and a strange chain (from the prompt "necklace"). Diffusers with padding_mask_crop=32 has no clue what is going on because the context was cropped out (right picture):
Can't say I'm a fan of its documentation. The VAE Encode for Inpaint node is outdated and the InpaintModelConditioning is what should be used.
I just mentioned it to rule out the possibility of something else messing up, so it's just those parameters.
Yeah, I noticed that even in ComfyUI the discoloration starts happening to the whole image if the mask is big enough, but even so, it's not as harsh as in Diffusers. And inpaint models aren't perfect; in some specific cases like the one here (red shirt) they fail. In this case, "green shirt" works much better. But this seems to be on the model anyway - I get the same results with Diffusers.
Yeah, I double and triple check everything with a dozen different seeds.
If ComfyUI blends the results with the original image, it must do so in latent space. Maybe that's the key difference? Could we perhaps try that in Diffusers?
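Neither backend is confirmed to do exactly this, but the latent-space blend being suggested here can be sketched as follows, assuming you already have the original and inpainted latents plus a pixel-resolution mask:

```python
import torch
import torch.nn.functional as F

def blend_latents(original_latents, inpainted_latents, mask):
    """Blend inpainted latents into the original ones in latent space.

    original_latents, inpainted_latents: [B, 4, h, w] VAE latents.
    mask: [B, 1, H, W] in [0, 1], 1 where inpainting is wanted.
    """
    # Downscale the mask to latent resolution, then mix the two latents.
    mask_lat = F.interpolate(mask, size=original_latents.shape[-2:], mode="bilinear")
    return mask_lat * inpainted_latents + (1.0 - mask_lat) * original_latents
```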
Differential diffusion is amazing, but it has its limits. In my testing, it couldn't handle denoising above 0.8, so inpainting models are not dead, yet. |
No problem, I also want to find the differences between diffusers and comfyui, so I'm just glad that someone wants to also put some effort into this instead of just wanting answers.
yeah, I didn't go any further with this since I believed that the
So far, while discussing this with you, I found that ComfyUI does two things differently than diffusers:
But just with this we can be almost sure that the whole image is being passed to the inpainting, probably to grab the context, and then only the masked part is blended back in. I don't think this functionality is in diffusers though; we only have the options to pass the whole image and return the whole inpainted image, or to pass just the masked section of it and get the original image back with the blended masked section - maybe @yiyixuxu can corroborate this. So what we're missing is sending the whole image for inpainting and getting back the masked part blended with the original latents; maybe we can do this in a nice way like this code from automatic1111: I'll try to do a PoC of this when I have the time and see how it goes.
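For comparison while that PoC is pending, the post-decode version of "keep only the masked part" can be sketched with PIL. This is only in the spirit of A1111's overlay step, not its actual code, and the file names are placeholders:

```python
from PIL import Image, ImageFilter

original = Image.open("input.png").convert("RGB")
inpainted = Image.open("inpainted.png").convert("RGB").resize(original.size)
mask = Image.open("mask.png").convert("L").resize(original.size)

# Feather the mask edge, then keep the inpainted pixels only where the mask is white.
soft_mask = mask.filter(ImageFilter.GaussianBlur(radius=4))
result = Image.composite(inpainted, original, soft_mask)
result.save("blended.png")
```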
I did a test with 0.9 and it doesn't look that bad though (cherry-picked): I did the same with comfyui and it looks worse IMO (cherry-picked): |
I had to update my understanding of differential diffusion, so here's another attempt. Less structured this time, because it's not actually directly comparable to inpainting models. Also, the differential diffusion script in SD.Next was broken, so I did this with ComfyUI, but I haven't seen a qualitative difference between the two.

Differential diffusion did well in the above case with a fairly simple image, but breaks down with more complex scenes. Inpainting models do very well taking the entire image into consideration. I have this base image:

I want to replace the soldier with a cowboy. The inpainting model does great at 1.0 denoising. The planet in the background, for example, is restored almost perfectly where it was previously occluded by the soldier:

If I leave the gun barrels unmasked, the inpainting model adapts the entire masked area to accommodate this new detail, despite being the same seed:

Differential diffusion (at 0.9 denoising) breaks completely with a hard mask. It's actually almost identical to inpainting without differential:

Differential diffusion only starts working with a soft mask. The inpainted content blends in much better, and you might prefer these results because they're more interesting, but it still has problems, such as the barrels becoming floating black sticks, the horse fading out at the edges, and the planet generally not being restored as nicely:

The problems are more severe at 1.0 denoising:

And better at 0.8, but the image sticks closer to the original:

So my takeaway for now is: differential diffusion requires blurred masks, which are harder to control, and only works well at denoising levels that can't replace entire objects. I can see use cases for it, but so far it's not replacing inpainting models. |
Differential diffusion is not a replacement for standard inpainting, nor is it supposed to be. And yes, it works with grayscale masks, as it's designed to; it doesn't work well with hard masks. For your example above, try combining differential diffusion with a depth model to create a mask - it will follow the original input precisely while replacing your soldier with a cowboy. And if you want to fine-tune, then you take that mask and do further edits. |
IMO we need to draw a line here. I was just concentrating on replicating the same functionality as ComfyUI, not trying to achieve the SOTA of inpainting. I did post about differential as a suggestion because IMO it is a lot better than normal inpainting, and I disagree - in my tests and use cases it completely replaces normal inpainting - but each of us has our own opinion about this and that's all good.

Just to be fair with differential diffusion: using what I call a big bad mask, done in seconds with GIMP, and the prompt "cowboy on a horse":
with a better mask done also in seconds and a prompt "cowboy":
This result can be improved if you use a depth map and a more precise mask over it to keep more of the details of the original image, and also with some inpainting and a second pass over it. But good results can also be achieved with normal inpainting and similar techniques; if you want a full-blown inpainting with the best quality I can do it, but IMO that's not the issue we're discussing here.

Going back to the original issue, what we're trying to achieve is to come closer to a "one step inpainting with a binary mask" similar to ComfyUI. So far I got the gray masking (which doesn't improve the result) and also inserting the inpainted part back into the original, but I can now clearly see that the new inpainting always comes out discolored. Here's an example if I blend them in the latent space:
This is an example if I do it after the vae decode:
It doesn't matter how many times I try, it always returns a "washed out" image, so I'm also starting to think there's something wrong with diffusers. I did a quick debug in ComfyUI and I couldn't find any code that does something different; it just returns the image with the correct colors and saturation.

This is something I can fix by just matching the histogram of the inpainted part to the original image: but I don't think this is the correct method to fix this issue; the inpainting shouldn't return a washed-out image. Also, this only happens with the inpaint model and not with the normal one or with the

edit: I just found out that this is a 5-month-or-older issue, so probably it won't get fixed. Also, it is under the limitations:
But I found out that even at 0.7 it still returns a discolored image, so I guess my solution is as good as any; if anyone wants it I can clean it up and post it. |
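The histogram-matching fix mentioned above isn't posted in the thread; a minimal sketch of the idea with scikit-image, treating the original picture as the color reference, might look like this (file names are placeholders, and in practice you would probably restrict the matching to the masked region):

```python
import numpy as np
from PIL import Image
from skimage.exposure import match_histograms

original = np.asarray(Image.open("input.png").convert("RGB"))
inpainted = np.asarray(Image.open("inpainted.png").convert("RGB"))

# Pull the washed-out inpaint result back toward the original color distribution.
matched = match_histograms(inpainted, original, channel_axis=-1)
Image.fromarray(matched.astype(np.uint8)).save("inpainted_matched.png")
```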
Hello, if I recall correctly, inpaint checkpoints accept additional channels for the inpaint mask and the covered picture. This aligns perfectly with the algorithm - the key idea is that on every step, those components are updated according to the current threshold mask. I believe that integrating diff-diff with inpaint checkpoints might address the issues raised by @23pennies @asomoza. |
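For context on those extra channels: SD/SDXL inpainting UNets take a 9-channel input, the 4 noisy latent channels concatenated with a 1-channel mask and the 4-channel latents of the masked image. A rough, hypothetical sketch of how a per-step diff-diff threshold could refresh those channels (`encode_masked` is a stand-in for re-encoding the masked image, not a library function):

```python
import torch

def inpaint_unet_input(noisy_latents, change_map, image_latents, threshold, encode_masked):
    """Sketch: rebuild the inpainting UNet's extra channels for one diff-diff step.

    change_map: [B, 1, h, w] in [0, 1] at latent resolution, 1 = free to change.
    threshold: the differential-diffusion threshold for the current step.
    """
    mask = (change_map >= threshold).float()                    # binary mask for this step only
    masked_image_latents = encode_masked(image_latents, mask)   # hypothetical helper
    return torch.cat([noisy_latents, mask, masked_image_latents], dim=1)  # 9-channel input
```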
Hi @exx8, I will look into this then. I was happy with the results using normal models; if you're saying that with inpainting models it could get even better, it's worth investigating. |
It really depends on the context. Some edits may be better with the general checkpoint, while others might be better with the inpaint ones. It's worth noting that the strength values might have different impacts between different models. It's expected that inpaint models will be more modest with their changes for the same strength value. |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
Hi, |
Also, is there any conclusion on how, with the current inpainting, we can work around the discoloration and prevent it from happening? |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
Describe the bug
SD inpainting works fine only if the mask is absolutely perfect.
Otherwise, there are always visible seams at the edge of the mask and uneven colors between the inpainted and input image.
I've tried manually assembling latents and using them for `image` and `mask_image` instead of images, as well as manually assembling the entire `masked_image_latent` - results are the same, so I left the reproduction as simple as possible. The same behavior is visible in the SD and SD-XL pipelines, using the base model as well as dedicated inpainting models.
Non-diffusers inpainting implementations, such as the legacy A1111 implementation, do not have this issue.
I've attached a very simple reproduction code that
Reproduction
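The reproduction script attached to the original report is not preserved in this export. As a stand-in, a minimal sketch of the kind of call that shows the seams and discoloration could look like the following; the model ID, file names, and parameter values are illustrative assumptions, not the reporter's exact script:

```python
import torch
from diffusers import StableDiffusionXLInpaintPipeline
from PIL import Image

pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
).to("cuda")

# A rough, imperfect mask is enough to make the seam and color shift visible.
image = Image.open("input.png").convert("RGB").resize((1024, 1024))
mask = Image.open("mask.png").convert("L").resize((1024, 1024))

result = pipe(
    prompt="a red shirt",
    image=image,
    mask_image=mask,
    strength=0.99,
    guidance_scale=8,
    num_inference_steps=20,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
result.save("output.png")  # visible seam / washed-out colors at the mask boundary
```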
Logs
No response
System Info
diffusers==0.23.0
Who can help?
@patrickvonplaten @yiyixuxu @DN6 @sayakpaul
Examples
Note: this issue was originally reported at vladmandic/sdnext#2501, which you can check for additional examples.