Device Movement Error with 4-bit Quantized LLaMA 3.1 Model Loading #36272

Pritidhrita opened this issue Feb 19, 2025 · 1 comment


Pritidhrita commented Feb 19, 2025

System Info

I'm running into a persistent issue when trying to load the LLaMA 3.1 8B model with 4-bit quantization. No matter what configuration I try, I get this error during initialization:

```
ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
```
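For context, transformers raises this error whenever `.to()` or `.cuda()` is called on a bitsandbytes-quantized model after loading, whether directly in user code or indirectly by a wrapper (e.g. a Trainer placing the model on a device). A minimal sketch that reproduces the message — the explicit `.to("cuda")` call here is illustrative, not taken from the scripts below:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",  # accelerate already places the quantized weights
)

# Any post-hoc device move triggers the reported ValueError:
model.to("cuda")
```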

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Environment:

Python: 3.10
Transformers: Latest version
PyTorch: Latest version
GPU: 85.05 GB memory available
CUDA: Properly installed and available

What I've tried:

Loading with a BitsAndBytesConfig:
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    llm_int8_has_fp16_weight=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    trust_remote_code=True,
    use_cache=True,
    device_map="auto",
    max_memory={0: "24GiB"},
)
```
Loading without device mapping:
```python
model_kwargs = {
    "trust_remote_code": True,
    "load_in_4bit": True,
    "torch_dtype": torch.float16,
    "use_cache": True,
}

# Presumably passed through like this (the call itself was not shown above):
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    **model_kwargs,
)
```

Clearing CUDA cache and running garbage collection beforehand (see the sketch below).
Experimenting with different device mapping strategies.
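For completeness, the cache-clearing step mentioned above typically looks like this (a minimal sketch; the exact code was not shared in the report):

```python
import gc
import torch

gc.collect()              # release Python references to dead tensors
torch.cuda.empty_cache()  # return cached, unused CUDA memory to the driver
```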
Expected behavior

Even with ample GPU memory (85.05 GB) and confirmed CUDA availability, I can't get the model to load without hitting this device-movement error. Other models load fine with quantization, so I'm not sure what's special about this setup.

Any ideas on how to resolve this or work around the error? Thanks in advance for your help!


@Rocketknight1 (Member) commented:

cc @SunMarc @muellerzr
