I'm running into a persistent issue when trying to load the LLaMA 3.1 8B model with 4-bit quantization. No matter what configuration I try, I get this error during initialization:

    ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
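From what I can tell, this error is normally raised by Transformers when `.to()` or `.cuda()` is called on a model that has already been quantized with bitsandbytes. A minimal sketch of the pattern that triggers it (hypothetical example, not my actual script):

    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(load_in_4bit=True)
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.1-8B-Instruct",
        quantization_config=bnb_config,
        device_map="auto",  # weights are already placed on the GPU here
    )
    model.to("cuda")  # raises the ValueError above: quantized models must not be moved manually

As far as I can see, none of my snippets below call `.to()` explicitly, which is why I'm confused about where the device move is coming from.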
Information
The official example scripts
My own modified scripts
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
Environment:
Python: 3.10
Transformers: Latest version
PyTorch: Latest version
GPU: 85.05 GB memory available
CUDA: Properly installed and available
What I've tried:

1. Loading with a BitsAndBytesConfig:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # 4-bit NF4 quantization with double quantization and fp16 compute
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
        llm_int8_has_fp16_weight=True
    )

    base_model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.1-8B-Instruct",
        quantization_config=bnb_config,
        trust_remote_code=True,
        use_cache=True,
        device_map='auto',
        max_memory={0: "24GiB"}
    )

2. Loading without device mapping:

    model_kwargs = {
        "trust_remote_code": True,
        "load_in_4bit": True,
        "torch_dtype": torch.float16,
        "use_cache": True
    }
    # passed as keyword arguments to AutoModelForCausalLM.from_pretrained

3. Clearing the CUDA cache and running garbage collection beforehand (see the snippet after this list).

4. Experimenting with different device mapping strategies.
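For item 3, this is roughly the cleanup I run before loading (a minimal sketch, not my exact code):

    import gc
    import torch

    gc.collect()               # drop any stale references from earlier load attempts
    torch.cuda.empty_cache()   # return cached CUDA memory to the driver

It doesn't change the error.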
Even with ample GPU memory (85.05 GB) and CUDA confirmed available, I still can't get the model to load without running into this device-movement error. Other models load fine with quantization, so I'm not sure what's special about this setup. Any ideas on how to resolve this or work around the error? Thanks in advance for your help!
Expected behavior
The model should load with 4-bit quantization on the available GPU and be usable as-is, without anything attempting a manual `.to()` device move and raising the ValueError above.