
FP8 QAT / FP8 block-wise quantization #1632

Open · cassanof opened this issue Jan 28, 2025 · 5 comments

@cassanof

Having QAT for FP8 would be a great addition, and FP8 block-wise quantization in general.

@danielvegamyhre (Contributor)

We have an issue tracking FP8 quantization with block-wise scaling here: #1594

@supriyar (Contributor)

@cassanof for QAT - do you mean quantized fine-tuning with FP8, or QAT (which simulates the quantization but doesn't actually quantize during training)? Also cc @andrewor14 for QAT.
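
For concreteness, a minimal sketch of the distinction in plain PyTorch: QAT "fake quantizes", i.e. it rounds a tensor through FP8 and back inside the forward pass while everything stays in high precision. The `fake_quantize_fp8` helper below is illustrative only, not a torchao API:

```python
import torch

F8 = torch.float8_e4m3fn
F8_MAX = torch.finfo(F8).max  # 448.0 for e4m3fn

def fake_quantize_fp8(x: torch.Tensor) -> torch.Tensor:
    """Round x through FP8 and back: the output carries FP8 rounding
    error but stays in the original (high-precision) dtype."""
    with torch.no_grad():
        scale = x.abs().amax().clamp(min=1e-12) / F8_MAX
        x_dq = (x / scale).clamp(-F8_MAX, F8_MAX).to(F8).to(x.dtype) * scale
    # Straight-through estimator: backward behaves like the identity.
    return x + (x_dq - x.detach())

w = torch.randn(16, 16, dtype=torch.bfloat16, requires_grad=True)
w_fq = fake_quantize_fp8(w)  # still bfloat16; no FP8 tensor is kept
w_fq.sum().backward()        # gradients flow to w via the STE
```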

@cassanof (Author)

the latter :)

@andrewor14 (Contributor)

Hi @cassanof, thanks for raising the issue! Do you mind sharing the use case for FP8 QAT? Most use cases we've seen do FP8 pretraining or finetuning directly in the lower precision. Is your goal to do FP8 QAT (fake quantize in high precision, e.g. bfloat16), and then actually quantize the model to FP8 after training? Is there more context you can share regarding this workflow?
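
To make the two phases concrete, a rough sketch of the workflow being asked about, assuming a plain PyTorch fake-quantized linear layer; `FakeQuantLinear` is a hypothetical class for illustration, not torchao's QAT API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

F8 = torch.float8_e4m3fn
F8_MAX = torch.finfo(F8).max  # 448.0 for e4m3fn

class FakeQuantLinear(nn.Linear):
    """Linear layer whose weight is rounded through FP8 in the forward
    pass; parameters, activations, and optimizer state stay in bf16."""
    def forward(self, x):
        w = self.weight
        with torch.no_grad():
            scale = w.abs().amax().clamp(min=1e-12) / F8_MAX
            w_dq = (w / scale).clamp(-F8_MAX, F8_MAX).to(F8).to(w.dtype) * scale
        w_fq = w + (w_dq - w.detach())  # straight-through estimator
        return F.linear(x, w_fq, self.bias)

# Phase 1: train in high precision with FP8 rounding error simulated.
layer = FakeQuantLinear(512, 512, dtype=torch.bfloat16)
y = layer(torch.randn(4, 512, dtype=torch.bfloat16))  # ... train as usual ...

# Phase 2: after training, quantize once for real and keep only FP8 weights.
with torch.no_grad():
    scale = layer.weight.abs().amax() / F8_MAX
    w_fp8 = (layer.weight / scale).clamp(-F8_MAX, F8_MAX).to(F8)
```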

@cassanof (Author) commented Feb 6, 2025

Hey @andrewor14. My goal is to do most of my training in bf16, and then right at the end run block-wise QAT to improve the performance of the model, which will be block-wise quantized for inference.
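
For reference, a sketch of what the fake-quant step could look like with block-wise scales; the 128x128 block size and the `fake_quantize_fp8_blockwise` helper are assumptions for illustration (chosen to mirror common FP8 block-wise inference schemes), not torchao code:

```python
import torch

F8 = torch.float8_e4m3fn
F8_MAX = torch.finfo(F8).max
BLOCK = 128  # assumed block size; must match the inference-time scheme

def fake_quantize_fp8_blockwise(w: torch.Tensor) -> torch.Tensor:
    """Fake-quantize a 2D weight with one FP8 scale per BLOCK x BLOCK
    tile, so training sees the same rounding error as the deployed model."""
    rows, cols = w.shape
    assert rows % BLOCK == 0 and cols % BLOCK == 0
    with torch.no_grad():
        # View the weight as a grid of BLOCK x BLOCK tiles.
        tiles = w.view(rows // BLOCK, BLOCK, cols // BLOCK, BLOCK)
        scale = tiles.abs().amax(dim=(1, 3), keepdim=True).clamp(min=1e-12) / F8_MAX
        w_dq = (tiles / scale).clamp(-F8_MAX, F8_MAX).to(F8).to(w.dtype) * scale
        w_dq = w_dq.reshape(rows, cols)
    return w + (w_dq - w.detach())  # straight-through estimator

w = torch.randn(256, 512, dtype=torch.bfloat16, requires_grad=True)
w_fq = fake_quantize_fp8_blockwise(w)  # bf16 carrying block-wise FP8 error
```

Swapping something like this in for a per-tensor fake-quant during the final phase of training would match the block-wise scheme used at inference.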
