
[BART] Cannot copy out of meta tensor; no data! #36247

Open

jiqing-feng opened this issue Feb 18, 2025 · 2 comments

@jiqing-feng (Contributor)
System Info

- `transformers` version: 4.49.0
- Platform: Linux-4.18.0-425.3.1.el8.x86_64-x86_64-with-glibc2.39
- Python version: 3.12.3
- Huggingface_hub version: 0.28.1
- Safetensors version: 0.5.2
- Accelerate version: 1.4.0
- Accelerate config:    - compute_environment: LOCAL_MACHINE
        - distributed_type: MULTI_GPU
        - mixed_precision: bf16
        - use_cpu: False
        - debug: False
        - num_processes: 2
        - machine_rank: 0
        - num_machines: 1
        - gpu_ids: 5,6
        - rdzv_backend: static
        - same_network: True
        - main_training_function: main
        - enable_cpu_affinity: False
        - downcast_bf16: no
        - tpu_use_cluster: False
        - tpu_use_sudo: False
        - tpu_env: []
- DeepSpeed version: not installed
- PyTorch version (GPU?): 2.6.0a0+ecf3bae40a.nv25.01 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: <fill in>
- Using GPU in script?: <fill in>
- GPU type: NVIDIA A100 80GB PCIe

Who can help?

@SunMarc @ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import BartForConditionalGeneration
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn", device_map="cuda:0")

Traceback

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py", line 262, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py", line 4397, in from_pretrained
    dispatch_model(model, **device_map_kwargs)
  File "/usr/local/lib/python3.12/dist-packages/accelerate/big_modeling.py", line 496, in dispatch_model
    model.to(device)
  File "/usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py", line 3162, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1344, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 904, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 904, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 931, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1337, in convert
    raise NotImplementedError(
NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
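
For context on what the traceback means: a "meta" tensor records only shape and dtype and owns no storage, so PyTorch cannot copy it to a real device. The following minimal sketch (plain torch, no transformers involved) triggers the same NotImplementedError on a toy module:

```python
import torch
import torch.nn as nn

# Modules built under the meta device have parameters with no backing data.
with torch.device("meta"):
    layer = nn.Linear(4, 4)

try:
    # Copying out of meta storage is impossible, hence the error in the issue.
    layer.to("cpu")
except NotImplementedError as e:
    print(type(e).__name__)
```

This shows the failure is generic to any parameter still on the meta device at `model.to(device)` time, not specific to BART itself.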

Expected behavior

This is odd because other BART checkpoints, such as distilbart-cnn-12-6, work fine.

@Rocketknight1 (Member) commented Feb 18, 2025

I can reproduce this error, and it's quite strange! cc @muellerzr @SunMarc since the error is triggered inside accelerate.

Loading the model on CPU and calling model.to("cuda") works fine for me.
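
As an aside, the `to_empty()` call the error message recommends can be illustrated on a toy module. This is an illustrative sketch of the torch API only, not the transformers-level fix:

```python
import torch
import torch.nn as nn

with torch.device("meta"):
    layer = nn.Linear(4, 4)

# to_empty() allocates real but UNINITIALIZED storage on the target device,
# which is the only legal way to move a module off the meta device.
layer = layer.to_empty(device="cpu")
assert layer.weight.device.type == "cpu"

# The values are garbage until (re)initialized or loaded from a checkpoint:
nn.init.xavier_uniform_(layer.weight)
nn.init.zeros_(layer.bias)
```

This is why the bug matters: if loading leaves any parameter on meta, a plain `model.to(device)` inside `dispatch_model` cannot succeed.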

@SunMarc (Member) commented Feb 19, 2025

I'll have a look. This is probably an issue with weights that are not loaded correctly because of tied weights.
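
For readers unfamiliar with the term: "tied weights" means two modules share one and the same Parameter object rather than holding copies. A minimal sketch, assuming a BART-style tie between the input embedding and the LM head:

```python
import torch.nn as nn

# Encoder-decoder LMs like BART commonly tie the token embedding matrix to
# the output projection so both point at a single Parameter.
emb = nn.Embedding(10, 4)
head = nn.Linear(4, 10, bias=False)
head.weight = emb.weight  # same Parameter object, not a copy

assert head.weight is emb.weight
```

If checkpoint loading materializes the shared tensor under only one of its names and the tie is then applied against a parameter still on the meta device, the shared weight can be left on meta, which would later break `model.to(device)` exactly as in the traceback.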
