Transformers does not work well with `with torch.device("meta"):` #36309

Closed · fxmarty-amd opened this issue Feb 20, 2025 · 3 comments

@fxmarty-amd

System Info

- `transformers` version: 4.49.0
- Platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.35
- Python version: 3.10.14
- Huggingface_hub version: 0.28.1
- Safetensors version: 0.4.4
- Accelerate version: 1.2.1
- Accelerate config:    not found
- DeepSpeed version: not installed
- PyTorch version (GPU?): 2.6.0+cpu (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: <fill in>

Who can help?

@ArthurZucker @SunMarc

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Hi, when loading a model on the meta device, many warnings of the form "copying from a non-meta parameter in the checkpoint to a meta parameter in the current model" are printed.

Reproduction:

from transformers import AutoModelForCausalLM
import torch

with torch.device("meta"):
    original_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

gives a lot of warnings:

/home/fxmarty/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py:2397: UserWarning: for model.decoder.embed_tokens.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(
/home/fxmarty/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py:2397: UserWarning: for model.decoder.embed_positions.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(
/home/fxmarty/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py:2397: UserWarning: for model.decoder.final_layer_norm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(
/home/fxmarty/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py:2397: UserWarning: for model.decoder.final_layer_norm.bias: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(
/home/fxmarty/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py:2397: UserWarning: for model.decoder.layers.0.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(
/home/fxmarty/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py:2397: UserWarning: for model.decoder.layers.0.self_attn.k_proj.bias: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(
/home/fxmarty/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py:2397: UserWarning: for model.decoder.layers.0.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(
...

I don't recall this being the case previously; maybe something changed in from_pretrained or in torch that results in these warnings?
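
For context, here is a minimal, transformers-free sketch of the torch behavior the warning describes (my own illustration, not part of the original report): by default load_state_dict() copies checkpoint values into the existing parameters, which is a no-op for meta tensors, whereas assign=True replaces the parameters with the checkpoint tensors.

import torch

# Illustrative only: a tiny module whose parameters are meta tensors with no data.
with torch.device("meta"):
    layer = torch.nn.Linear(4, 4)

state_dict = {"weight": torch.randn(4, 4), "bias": torch.randn(4)}

# Default: copies into the meta parameters, a no-op that triggers the same warning.
layer.load_state_dict(state_dict)
print(layer.weight.is_meta)  # True, still on the meta device

# assign=True: the checkpoint tensors replace the parameters instead of being copied.
layer.load_state_dict(state_dict, assign=True)
print(layer.weight.is_meta)  # False, the parameter now holds real data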

Expected behavior

No warnings, and the parameters placed on the meta device.

@Rocketknight1
Member

cc @SunMarc @muellerzr since this crosses over with accelerate a bit. But also, @fxmarty-amd, the warnings don't surprise me: from_pretrained loads weight data into the model, and meta tensors don't actually hold weight data! Maybe you could avoid the warnings by initializing the same model architecture without loading weights, like this:

import torch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("facebook/opt-125m")
with torch.device("meta"):
    model = AutoModelForCausalLM.from_config(config)
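
As a quick sanity check (hypothetical snippet, not part of the original comment), this path leaves every parameter on the meta device and downloads no weight files:

# All parameters of the config-initialized model are meta tensors; no weight
# data has been materialized and no copy warnings are emitted.
print(all(p.is_meta for p in model.parameters()))  # True
print(next(model.parameters()).device)             # meta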

@SunMarc
Member

SunMarc commented Feb 21, 2025

I don't recall this being the case previously; maybe something changed in from_pretrained or in torch that results in these warnings?

This is due to torch, if I recall correctly. I can check what can be done on our side, but since you don't need the weights, I think @Rocketknight1's solution will be more appropriate.

@fxmarty-amd
Author

Thank you, this makes sense. I'll try to use from_config whenever possible (although external libraries relying on from_pretrained may not expose a from_config path).
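
If a third-party library only goes through from_pretrained, one possible workaround (my own suggestion, not something discussed in the thread) is to silence just this specific torch UserWarning around the call. The copies remain no-ops, so the parameters still end up on the meta device, although the checkpoint files are still downloaded:

import warnings

import torch
from transformers import AutoModelForCausalLM

# Hypothetical workaround: suppress only the "copying from a non-meta parameter"
# UserWarning raised by torch while loading under the meta device context.
with warnings.catch_warnings():
    warnings.filterwarnings(
        "ignore",
        message=".*copying from a non-meta parameter.*",
        category=UserWarning,
    )
    with torch.device("meta"):
        model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")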
