
Comprehensive type checking for from_pretrained kwargs #10758

Open
guiyrt wants to merge 13 commits into main
Conversation

@guiyrt (Contributor) commented Feb 10, 2025

What does this PR do?

Changes

  • Moved type checking to just before pipeline instantiation, so all kwargs are checked
  • Full type checking for collections (lists, dicts, ...): every element is checked
  • More detailed warnings for unexpected argument types (e.g. List[ControlNetModel] instead of plain list)

To-do

  • Where should the new functions is_valid_type and get_detailed_type be placed?
  • Depending on where these functions end up, add simple tests for type checking.

These changes are proposed based on testing for #10747.
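For context, here is a rough sketch of what the two new helpers do. This is a simplified approximation for illustration, not the exact code in this PR:

from typing import Any, Dict, List, Tuple, Union, get_args, get_origin


def get_detailed_type(obj: Any) -> Any:
    """Return a detailed type, e.g. List[ControlNetModel] instead of plain list."""
    obj_type = type(obj)
    if isinstance(obj, (list, tuple)):
        # Parametrize the container with the union of its element types.
        element_types = tuple({get_detailed_type(element) for element in obj})
        collection = List if isinstance(obj, list) else Tuple
        return collection[Union[element_types]] if element_types else obj_type
    if isinstance(obj, dict):
        key_types = tuple({get_detailed_type(k) for k in obj})
        value_types = tuple({get_detailed_type(v) for v in obj.values()})
        if key_types and value_types:
            return Dict[Union[key_types], Union[value_types]]
    return obj_type


def is_valid_type(obj: Any, expected: Any) -> bool:
    """Check obj against an annotation, recursing into typing generics."""
    origin, args = get_origin(expected), get_args(expected)
    if origin is None:
        # Plain class: isinstance, so subclasses are accepted.
        return isinstance(obj, expected)
    if origin is Union:
        return any(is_valid_type(obj, arg) for arg in args)
    if not isinstance(obj, origin):
        return False
    if origin in (list, tuple) and args:
        return all(is_valid_type(element, Union[args]) for element in obj)
    if origin is dict and args:
        key_type, value_type = args
        return all(
            is_valid_type(k, key_type) and is_valid_type(v, value_type)
            for k, v in obj.items()
        )
    return True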

Example warning when providing controlnet as List[ControlNetUnionModel] for StableDiffusionXLControlNetPipeline, where List[ControlNetModel] is expected:

Expected types for controlnet: (<class 'diffusers.models.controlnets.controlnet.ControlNetModel'>,
typing.List[diffusers.models.controlnets.controlnet.ControlNetModel], 
typing.Tuple[diffusers.models.controlnets.controlnet.ControlNetModel],
<class 'diffusers.models.controlnets.multicontrolnet.MultiControlNetModel'>),
got typing.List[diffusers.models.controlnets.controlnet_union.ControlNetUnionModel].
Code for warning replication
import torch

from diffusers import StableDiffusionXLControlNetPipeline
from diffusers.models import ControlNetUnionModel, AutoencoderKL
from diffusers.utils import load_image


controlnet = ControlNetUnionModel.from_pretrained(
    "brad-twinkl/controlnet-union-sdxl-1.0-promax", torch_dtype=torch.float16
)

vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)

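# Passing a list of ControlNetUnionModel here is what triggers the type warning above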
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=[controlnet, controlnet],
    vae=vae,
    torch_dtype=torch.float16,
    variant="fp16",
)

room_seg_img = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/room_seg.png")
pose_img = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/pose.png")


pipe.enable_model_cpu_offload()

image = pipe(
    prompt="an astronaut in a space station",
    width=1024,
    height=1024,
    negative_prompt="lowres, low quality, worst quality",
    generator=torch.manual_seed(42),
    guidance_scale=5,
    num_inference_steps=50,
    image=[pose_img, room_seg_img],
).images[0]

image.save("result.jpg")


Who can review?

@hlky

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@hlky (Collaborator) left a comment

Thanks @guiyrt, nice work. Could you take a look through the pipeline test output, searching for Expected types for? (GitHub's built-in search works best.)

There are some easy cases that we could fix like

Expected types for feature_extractor: (<class 'transformers.models.clip.image_processing_clip.CLIPImageProcessor'>,), got <class 'transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor'>.

For that, we could do a global find+replace: feature_extractor: CLIPImageProcessor -> feature_extractor: CLIPFeatureExtractor.

and some that need investigating

Expected types for unet: (<class 'inspect._empty'>,), got <class 'diffusers.models.unets.unet_2d.UNet2DModel'>.

Type correctness is not strictly enforced, so some warnings are expected, but we should make a best effort to reduce the number of new warnings we're introducing. If we find that a particular component is a problem, we can skip it, like scheduler.
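A minimal sketch of how such a skip could look (the constant name, function name, and arguments here are illustrative assumptions, not the PR's actual code; it reuses the helpers sketched in the PR description):

import logging

logger = logging.getLogger(__name__)

# Components whose types are never checked.
SKIP_TYPE_CHECK = {"scheduler"}


def warn_on_unexpected_types(expected_types: dict, passed_kwargs: dict) -> None:
    for name, value in passed_kwargs.items():
        if name in SKIP_TYPE_CHECK or name not in expected_types:
            continue
        if not is_valid_type(value, expected_types[name]):
            logger.warning(
                f"Expected types for {name}: {expected_types[name]}, got {get_detailed_type(value)}."
            )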

@hlky (Collaborator) commented Feb 10, 2025

Failing tests appear unrelated, will re-run later.

@guiyrt (Contributor, Author) commented Feb 12, 2025

@hlky Here are my findings from looking through the test logs.

TL;DR: tokenizer is the one with the most warnings, for example when T5Tokenizer is annotated but T5TokenizerFast is used. Most of the remaining warnings are smaller things, and most are corrected/addressed in 5ca27aa. Doing a find+replace to annotate Union[BaseTokenizer, FastTokenizer] deals with the tokenizer problem, but it will change many files; is this ok?

1. Using XYZFast tokenizer when only XYZ is annotated (and vice-versa)

We can do a quick search-and-replace to update all tokenizer annotations to Union[XYZBase, XYZFast] (see the sketch after the samples below), but as this is a big change, let me know if you agree.

18 occurrences

Expected types for tokenizer: (<class 'transformers.models.xlm_roberta.tokenization_xlm_roberta.XLMRobertaTokenizer'>,), 
got <class 'transformers.models.xlm_roberta.tokenization_xlm_roberta_fast.XLMRobertaTokenizerFast'>.

68 occurrences

Expected types for tokenizer: (<class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>,),
got <class 'transformers.models.t5.tokenization_t5_fast.T5TokenizerFast'>.

9 occurrences

Expected types for tokenizer: (<class 'transformers.models.bert.tokenization_bert.BertTokenizer'>,),
got <class 'transformers.models.bert.tokenization_bert_fast.BertTokenizerFast'>.

4 occurrences

Expected types for tokenizer: (<class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>,), got <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>.
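Illustrated with T5, the annotation change would look roughly like this (a sketch, not the actual diff):

from typing import Union

from transformers import T5Tokenizer, T5TokenizerFast


class ExamplePipeline:  # illustrative pipeline, for the annotation only
    # Before: `tokenizer: T5Tokenizer` warned whenever the fast variant was loaded.
    def __init__(self, tokenizer: Union[T5Tokenizer, T5TokenizerFast]):
        self.tokenizer = tokenizer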

2. CLIPFeatureExtractor as feature_extractor

This comes from tests that use an hf-internal-testing repo with the legacy CLIPFeatureExtractor instead of CLIPImageProcessor. transformers also throws a warning: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.

15 occurrences

Expected types for feature_extractor: (<class 'transformers.models.clip.image_processing_clip.CLIPImageProcessor'>,),
got <class 'transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor'>.

def test_download_legacy_variants_with_sharded_ckpts_raises_warning(self):
    repo_id = "hf-internal-testing/tiny-stable-diffusion-pipe-variants-all-kinds"

3. PipelineFastTests::test_optional_components

This test purposefully sets requires_safety_checker to [True, True], safety_checker to a UNet, and feature_extractor to a function, so the warnings here are expected.

2+2+2 occurrences

Expected types for safety_checker: (<class 'diffusers.pipelines.stable_diffusion.safety_checker.StableDiffusionSafetyChecker'>,),
got <class 'diffusers.models.unets.unet_2d_condition.UNet2DConditionModel'>.
Expected types for requires_safety_checker: (<class 'bool'>,), got typing.List[bool].
Expected types for feature_extractor: (<class 'transformers.models.clip.image_processing_clip.CLIPImageProcessor'>,),
got <class 'function'>.

# Test that partially loading works
sd = StableDiffusionPipeline.from_pretrained(
    tmpdirname,
    feature_extractor=self.dummy_extractor,
    safety_checker=unet,
    requires_safety_checker=[True, True],
)

4. Missing type hinting

Added the intended types.

6+6 occurrences from the HunyuanDiT pipelines.

Expected types for text_encoder_2: (<class 'inspect._empty'>,),
got <class 'transformers.models.t5.modeling_t5.T5EncoderModel'>.
Expected types for tokenizer_2: (<class 'inspect._empty'>,),
got <class 'transformers.models.t5.tokenization_t5_fast.T5TokenizerFast'>.

text_encoder_2=T5EncoderModel,
tokenizer_2=MT5Tokenizer,

text_encoder_2=T5EncoderModel,
tokenizer_2=MT5Tokenizer,

7+7 occurrences from CustomPipeline tests. Only shown for unet, because scheduler is not checked.

Expected types for unet: (<class 'inspect._empty'>,),
got <class 'diffusers.models.unets.unet_2d.UNet2DModel'>.
Expected types for unet: (<class 'inspect._empty'>,),
got <class 'diffusers.models.unets.unet_1d.UNet1DModel'>.

def __init__(self, unet, scheduler):

def __init__(self, unet, scheduler):

def __init__(self, unet, scheduler):

5. CustomPipelineTests

Not sure what to make of this.

2+2+2 occurrences

Expected types for unet: (<class 'diffusers_modules.local.unet.my_unet_model.MyUNetModel'>,),
got <class 'diffusers_modules.local.my_unet_model.MyUNetModel'>.
Expected types for unet: (<class 'diffusers.models.unets.unet_2d_condition.UNet2DConditionModel'>,),
got <class 'diffusers_modules.local.my_unet_model.MyUNetModel'>.
Expected types for unet: (<class 'diffusers_modules.local.unet.my_unet_model.MyUNetModel'>,),
got <class 'diffusers_modules.local.my_unet_model.MyUNetModel'>.

6. Incomplete unet type hints in AnimateDiffVideoToVideoPipelines

Changed to be unet: Union[UNet2DConditionModel, UNetMotionModel], as in AnimateDiffSDXLPipeline.

8 occurrences

Expected types for unet: (<class 'diffusers.models.unets.unet_2d_condition.UNet2DConditionModel'>,),
got <class 'diffusers.models.unets.unet_motion_model.UNetMotionModel'>.

7. Subclasses not checked in is_valid_type

Subclasses should be considered valid, so I changed is_valid_type to use isinstance, which also allows subclasses. This way, you can annotate with a parent class (see the before/after sketch below).

4 occurrences

Expected types for bert: (<class 'transformers.modeling_utils.PreTrainedModel'>,),
got <class 'transformers.models.clip.modeling_clip.CLIPTextModel'>.
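Roughly, the check changed like this (illustrative before/after for the non-generic case; the function name suffixes are mine):

# Before: exact type matching, so a subclass of the annotated class was rejected.
def is_valid_type_old(obj, expected_type) -> bool:
    return type(obj) is expected_type


# After: isinstance accepts subclasses, so e.g. CLIPTextModel passes
# when the annotation is the parent class PreTrainedModel.
def is_valid_type_new(obj, expected_type) -> bool:
    return isinstance(obj, expected_type)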

8. Type hinting with AutoTokenizer and AutoModel in the Sana pipelines

The proper base classes are PreTrainedModel and PreTrainedTokenizerBase.

7 occurrences

Expected types for tokenizer: (<class 'transformers.models.auto.tokenization_auto.AutoTokenizer'>,),
got <class 'transformers.models.gemma.tokenization_gemma.GemmaTokenizer'>.

3 occurrences

Expected types for tokenizer: (<class 'transformers.models.auto.tokenization_auto.AutoTokenizer'>,),
got <class 'transformers.models.gemma.tokenization_gemma_fast.GemmaTokenizerFast'>.

3 occurrences

Expected types for text_encoder: (<class 'transformers.models.auto.modeling_auto.AutoModel'>,),
got <class 'transformers.models.gemma.modeling_gemma.GemmaForCausalLM'>.

3 occurrences

Expected types for text_encoder: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,),
got <class 'transformers.models.gemma2.modeling_gemma2.Gemma2ForCausalLM'>.

4 occurrences

Expected types for text_encoder: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>,),
got <class 'transformers.models.gemma2.modeling_gemma2.Gemma2Model'>.

tokenizer: AutoTokenizer,
text_encoder: AutoModelForCausalLM,

tokenizer: AutoTokenizer,
text_encoder: AutoModelForCausalLM,
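With those base classes, the signature would look roughly like this (an illustrative sketch, not the exact diff):

from transformers import PreTrainedModel, PreTrainedTokenizerBase


class SanaLikePipeline:  # illustrative, for the annotations only
    # The base classes cover GemmaTokenizer/GemmaTokenizerFast and the Gemma model
    # variants, since the isinstance-based check from point 7 accepts subclasses.
    def __init__(self, tokenizer: PreTrainedTokenizerBase, text_encoder: PreTrainedModel):
        self.tokenizer = tokenizer
        self.text_encoder = text_encoder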

9. Interchanged use of CLIPTextModel and CLIPTextModelWithProjection

Just swapped with the intended type.

4 occurrences

Expected types for text_encoder: (<class 'transformers.models.clip.modeling_clip.CLIPTextModelWithProjection'>,),
got <class 'transformers.models.clip.modeling_clip.CLIPTextModel'>.

text_encoder: CLIPTextModelWithProjection,

8 occurrences

Expected types for text_encoder: (<class 'transformers.models.clip.modeling_clip.CLIPTextModel'>,),
got <class 'transformers.models.clip.modeling_clip.CLIPTextModelWithProjection'>.

4 occurrences

Expected types for prior_text_encoder: (<class 'transformers.models.clip.modeling_clip.CLIPTextModel'>,),
got <class 'transformers.models.clip.modeling_clip.CLIPTextModelWithProjection'>.

@guiyrt (Contributor, Author) commented Feb 12, 2025

Found another warning related to custom pipelines, this time from "hf-internal-testing/diffusers-dummy-pipeline". The fix is to add the correct type hints there.

4 occurrences

Expected types for unet: (<class 'inspect._empty'>,),
got <class 'diffusers.models.unets.unet_2d.UNet2DModel'>

def test_run_custom_pipeline(self):
    pipeline = DiffusionPipeline.from_pretrained(
        "google/ddpm-cifar10-32", custom_pipeline="hf-internal-testing/diffusers-dummy-pipeline"
    )
    pipeline = pipeline.to(torch_device)
    images, output_str = pipeline(num_inference_steps=2, output_type="np")
    assert images[0].shape == (1, 32, 32, 3)
    # compare output to https://huggingface.co/hf-internal-testing/diffusers-dummy-pipeline/blob/main/pipeline.py#L102
    assert output_str == "This is a test"

@guiyrt (Contributor, Author) commented Feb 12, 2025

I opened PRs on the hf-internal-testing repos that produce warnings:
Replacing deprecated CLIPFeatureExtractor with CLIPImageProcessor (1)
Replacing deprecated CLIPFeatureExtractor with CLIPImageProcessor (2)
Added type annotations for pipeline init args

The last one addresses arguments with no annotations, which might be common in custom pipelines. To keep warnings relevant, it might be a good idea to skip type checking when an argument has no type annotation (see the sketch below).
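Such a skip could be as simple as checking for inspect.Parameter.empty (a sketch under that assumption; the function name is mine):

import inspect


def should_type_check(pipeline_cls, arg_name: str) -> bool:
    """Only type-check __init__ arguments that carry a type annotation."""
    params = inspect.signature(pipeline_cls.__init__).parameters
    param = params.get(arg_name)
    return param is not None and param.annotation is not inspect.Parameter.empty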

@hlky (Collaborator) commented Feb 12, 2025

  1. Maybe we just don't check tokenizer
  2. If it's an internal testing checkpoint the warning is not important
  3. If it's expected it's ok
  4. Thanks
  5. Custom pipeline so not important
  6. Thanks
  7. Annotations of parent class PreTrainedModel are not that useful; changing to the correct type can be left as a todo
  8. As above
  9. Thanks

@guiyrt (Contributor, Author) commented Feb 13, 2025

  1. Maybe we just don't check tokenizer

tokenizer is now skipped.

  7. Annotations of parent class PreTrainedModel are not that useful; changing to the correct type can be left as a todo

This to-do includes:

  • LDMTextToImagePipeline: Supposedly the types in the docs are LDMBertModel and BertTokenizer, but the text_encoder and tokenizer used in the tests are CLIPTextModel and CLIPTokenizer.
  • LuminaText2ImgPipeline and Lumina2Text2ImgPipeline: The docs mention T5, but the tests use Gemma.
  • SanaPAGPipeline and SanaPipeline: From what I understood from the code and the tests, these use GemmaTokenizer[Fast] as tokenizer, with Gemma2Model as text_encoder in the SanaPipeline tests and Gemma2ForCausalLM in the SanaPAGPipeline tests, so I annotated text_encoder with Gemma2PreTrainedModel, which works for both.

@guiyrt (Contributor, Author) commented Feb 13, 2025

@hlky, can we move the functions get_detailed_type and is_valid_type to some utils file? Maybe a new typing_utils.py or something? I don't think they should stay in the middle of DiffusionPipeline::from_pretrained.

@hlky (Collaborator) commented Feb 13, 2025

LDMTextToImagePipeline I'm not sure about; it could be that it supports both types, or the type hint could be incorrect.

LuminaText2ImgPipeline: yeah, Lumina is Gemma; the docstring/type hint would have been copied from some other pipeline.

SanaPAGPipeline and SanaPipeline are Gemma; I'm not sure whether it should be Gemma2Model or Gemma2ForCausalLM, though, and I'm assuming it's supposed to be the same for both. Using Gemma2PreTrainedModel should be ok.

@yiyixuxu WDYT about typing_utils.py? There might be some other code that could be moved there, though these functions probably won't be used elsewhere. IMO I don't mind the functions being in from_pretrained; if I'm working on from_pretrained, I've got all the context of those functions immediately available.

@guiyrt (Contributor, Author) commented Feb 13, 2025

LuminaText2ImgPipeline: yeah, Lumina is Gemma; the docstring/type hint would have been copied from some other pipeline.

I updated the docs. But if I got it right, Lumina v1 uses GemmaModel and v2 uses Gemma2Model; however, the FastTests of both used GemmaForCausalLM. For Lumina v1 we can annotate with GemmaPreTrainedModel, but if we annotated Lumina v2 with Gemma2PreTrainedModel it would produce a warning for the tests. So on top of updating the type annotations, I also updated the tests for Lumina v2 to use Gemma2ForCausalLM; this was easy because no expected output is hardcoded for comparison. Ran it locally and it passed :)

@guiyrt (Contributor, Author) commented Feb 13, 2025

I think the tests failed due to network issues. I noticed very slow download speeds from the Hub yesterday; is that anything you're aware of?

@hlky (Collaborator) commented Feb 13, 2025

Just temporary issues; it happens sometimes. Thanks for all the iterations on this @guiyrt, this should be good to go after @yiyixuxu's comments on whether to add typing_utils.py.

@hlky (Collaborator) commented Feb 20, 2025

Gentle ping @yiyixuxu

@yiyixuxu (Collaborator) commented

Thanks for the PR @guiyrt @hlky! It can go into this util file, I think: https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_loading_utils.py
