Transformers does not work well with `with torch.device("meta"):` #36309

Closed · fxmarty-amd opened this issue Feb 20, 2025 · 3 comments

@fxmarty-amd

System Info

- `transformers` version: 4.49.0
- Platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.35
- Python version: 3.10.14
- Huggingface_hub version: 0.28.1
- Safetensors version: 0.4.4
- Accelerate version: 1.2.1
- Accelerate config:    not found
- DeepSpeed version: not installed
- PyTorch version (GPU?): 2.6.0+cpu (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: <fill in>

Who can help?

@ArthurZucker @SunMarc

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Hi, when loading a model on the meta device, many warnings of the form "copying from a non-meta parameter in the checkpoint to a meta parameter in the current model" are printed.

Reproduction:

from transformers import AutoModelForCausalLM
import torch

with torch.device("meta"):
    original_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

gives a lot of warnings:

/home/fxmarty/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py:2397: UserWarning: for model.decoder.embed_tokens.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(
/home/fxmarty/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py:2397: UserWarning: for model.decoder.embed_positions.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(
/home/fxmarty/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py:2397: UserWarning: for model.decoder.final_layer_norm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(
/home/fxmarty/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py:2397: UserWarning: for model.decoder.final_layer_norm.bias: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(
/home/fxmarty/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py:2397: UserWarning: for model.decoder.layers.0.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(
/home/fxmarty/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py:2397: UserWarning: for model.decoder.layers.0.self_attn.k_proj.bias: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(
/home/fxmarty/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py:2397: UserWarning: for model.decoder.layers.0.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(
...

I don't recall this being the case previously; maybe something changed in from_pretrained or in torch that results in these warnings?
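
For context, here is a minimal, transformers-free sketch of the torch behavior the warning describes (my own illustration, not part of the original report): by default load_state_dict() copies checkpoint values into the existing parameters, which is a no-op for meta tensors, whereas assign=True replaces the parameters with the checkpoint tensors.

import torch

# Illustrative only: a tiny module whose parameters are meta tensors with no data.
with torch.device("meta"):
    layer = torch.nn.Linear(4, 4)

state_dict = {"weight": torch.randn(4, 4), "bias": torch.randn(4)}

# Default: copies into the meta parameters, a no-op that triggers the same warning.
layer.load_state_dict(state_dict)
print(layer.weight.is_meta)  # True, still on the meta device

# assign=True: the checkpoint tensors replace the parameters instead of being copied.
layer.load_state_dict(state_dict, assign=True)
print(layer.weight.is_meta)  # False, the parameter now holds real data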

Expected behavior

No warnings, and the parameters placed on the meta device.

@Rocketknight1
Member

cc @SunMarc @muellerzr since this crosses over with accelerate a bit. But also, @fxmarty-amd, the warnings don't surprise me: from_pretrained loads weight data into the model, and meta tensors don't actually hold weight data! Maybe you could avoid the warnings by initializing the same model architecture without loading weights, like this:

import torch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("facebook/opt-125m")
with torch.device("meta"):
    model = AutoModelForCausalLM.from_config(config)
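
As a quick sanity check (hypothetical snippet, not part of the original comment), this path leaves every parameter on the meta device and downloads no weight files:

# All parameters of the config-initialized model are meta tensors; no weight
# data has been materialized and no copy warnings are emitted.
print(all(p.is_meta for p in model.parameters()))  # True
print(next(model.parameters()).device)             # meta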

@SunMarc
Member

SunMarc commented Feb 21, 2025

I don't recall this being the case previously; maybe something changed in from_pretrained or in torch that results in these warnings?

This is due to torch, if I recall correctly. I can check what can be done on our side, but since you don't need the weights, I think @Rocketknight1's solution will be more appropriate.

@fxmarty-amd
Author

Thank you, this makes sense. I'll try to use from_config whenever possible (although external libraries relying on from_pretrained may not expose a from_config path).
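
If a third-party library only goes through from_pretrained, one possible workaround (my own suggestion, not something discussed in the thread) is to silence just this specific torch UserWarning around the call. The copies remain no-ops, so the parameters still end up on the meta device, although the checkpoint files are still downloaded:

import warnings

import torch
from transformers import AutoModelForCausalLM

# Hypothetical workaround: suppress only the "copying from a non-meta parameter"
# UserWarning raised by torch while loading under the meta device context.
with warnings.catch_warnings():
    warnings.filterwarnings(
        "ignore",
        message=".*copying from a non-meta parameter.*",
        category=UserWarning,
    )
    with torch.device("meta"):
        model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")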
