
convert model to FP8 error #110

Open
kuangdao opened this issue Aug 26, 2024 · 1 comment
Labels: bug (Something isn't working)

kuangdao commented:

Describe the bug
Running the FP8 dynamic quantization example against a local model fails with a runtime error; the traceback is in the screenshot under Errors.

Expected behavior
oneshot applies the FP8_DYNAMIC recipe and the quantized model is saved without errors.

Environment
Include all relevant environment information:

  1. OS: Ubuntu 20.04
  2. Python version: 3.10.12
  3. LLM Compressor version or commit hash: 0.1.0
  4. ML framework (torch) version: 2.4.0
  5. Other Python package versions (e.g. vLLM, compressed-tensors, numpy, ONNX): not provided (a quick way to collect these is sketched below)
  6. Other relevant environment information (e.g. hardware, CUDA version): not provided
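
A quick way to collect the package versions in item 5 (a minimal sketch; the package names are just the ones the template lists):

import importlib.metadata as md

# Print the installed version of each package the issue template mentions.
for pkg in ("vllm", "compressed-tensors", "numpy", "onnx"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")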

To Reproduce
Run the following script:

from llmcompressor.transformers import SparseAutoModelForCausalLM
from transformers import AutoTokenizer

MODEL_ID = "/data/models/deepseek-coder-6.7b-base/"

model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)


from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Configure the simple PTQ quantization: dynamic FP8 on all Linear
# layers, leaving lm_head unquantized.
recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

# Apply the quantization algorithm.
oneshot(model=model, recipe=recipe)

# Save the model. With a path like "/data/models/<name>/",
# split("/")[1] is "data", so take the last non-empty component instead.
SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)

Errors

[Screenshot 2024-08-26 10:49:07: error traceback]

Additional context
None provided.

kuangdao added the bug (Something isn't working) label on Aug 26, 2024
robertgshaw2-neuralmagic (Sponsor, Collaborator) commented:

Can you share:

  • your torch version
  • whether you are running on CPU or GPU?

It looks like the max operation for FP8 is not supported on your torch version.
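
A quick way to check both of those, and whether this torch build implements a max reduction on FP8 tensors (a minimal sketch, assuming, per the comment above, that the failing op is a max-style reduction on a float8 tensor):

import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

try:
    # Cast a small tensor to FP8 (e4m3) and take a max reduction over it;
    # FP8 quantization needs ops like this, and older torch builds or
    # CPU-only paths may not implement them.
    f8 = torch.randn(4, 4).to(torch.float8_e4m3fn)
    print("max over an fp8 tensor:", torch.max(f8))
except (RuntimeError, AttributeError) as e:
    print("FP8 max not supported on this build:", e)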
