FP8 QAT / FP8 block-wise quantization #1632
Comments
We have an issue tracking FP8 quantization with block-wise scaling here: #1594
@cassanof, for QAT, do you mean quantized fine-tuning with FP8, or QAT (which simulates the quantization but doesn't actually quantize during training)? Also cc @andrewor14 for QAT.
The latter :)
Hi @cassanof, thanks for raising the issue! Do you mind sharing the use case for FP8 QAT? Most use cases we've seen do FP8 (lower-precision) pretraining or finetuning directly. Is your goal to do FP8 QAT (fake quantize in high precision, e.g. bfloat16) and then actually quantize the model to FP8 after training? Is there more context you can share regarding this workflow?
Hey @andrewor14. My goal is to do most of my training in bf16, and then right at the end do block-wise QAT to improve the performance of my model, which will be block-wise quantized for inference.
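To make the distinction concrete, here is a minimal sketch of what block-wise FP8 fake quantization could look like during QAT. This is not torchao's API; the function name, the 128x128 block size, and the e4m3 dtype are assumptions for illustration. The weights stay in bf16, but each block is round-tripped through float8 so the forward pass sees the quantization error:

```python
# Minimal sketch, not torchao's API: block-wise FP8 fake quantization for QAT.
# Weights stay in bf16; each (block x block) tile is scaled, cast to float8 and
# back, and rescaled, so the forward pass sees FP8 quantization error while the
# model itself is never actually stored in FP8. Block size 128 is an assumption.
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3fn

def fake_quantize_fp8_blockwise(w: torch.Tensor, block: int = 128) -> torch.Tensor:
    out_f, in_f = w.shape
    tiles = w.reshape(out_f // block, block, in_f // block, block)
    # One scale per (block x block) tile, from the tile's absolute maximum.
    amax = tiles.abs().amax(dim=(1, 3), keepdim=True).clamp(min=1e-12)
    scale = FP8_MAX / amax
    # Quantize -> cast to FP8 -> cast back -> dequantize, all returned in bf16.
    q = (tiles * scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return (q.to(w.dtype) / scale).reshape(out_f, in_f)

w = torch.randn(256, 256, dtype=torch.bfloat16)
w_fq = fake_quantize_fp8_blockwise(w)      # same shape and dtype as w
print((w - w_fq).abs().max())              # the error QAT trains the model to tolerate
```

In an actual QAT setup this would typically be wrapped with a straight-through estimator so gradients flow to the original bf16 weights despite the non-differentiable float8 cast.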
Having QAT for FP8 would be a great addition, and FP8 block-wise quantization in general.
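For the inference side of that workflow, here is a hedged sketch of the actual post-training block-wise quantization step, which stores the float8 payload plus one scale per block. Again, the names and block size are illustrative, not torchao's API:

```python
# Hedged sketch of the post-training step: actually quantize trained weights to
# FP8 with one scale per (block x block) tile, keeping the float8 payload and the
# dequantization scales for inference. Names and the 128 block size are
# illustrative, not torchao's API.
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max

def quantize_fp8_blockwise(w: torch.Tensor, block: int = 128):
    out_f, in_f = w.shape
    tiles = w.reshape(out_f // block, block, in_f // block, block)
    amax = tiles.abs().amax(dim=(1, 3), keepdim=True).clamp(min=1e-12)
    scale = FP8_MAX / amax
    q = (tiles * scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    # Return the fp8 data plus the per-tile reciprocal scales needed at inference.
    return q.reshape(out_f, in_f), 1.0 / scale

def dequantize_fp8_blockwise(q: torch.Tensor, inv_scale: torch.Tensor, block: int = 128):
    out_f, in_f = q.shape
    tiles = q.to(torch.bfloat16).reshape(out_f // block, block, in_f // block, block)
    return (tiles * inv_scale).reshape(out_f, in_f)

w = torch.randn(256, 256, dtype=torch.bfloat16)
q, inv_scale = quantize_fp8_blockwise(w)        # q is float8_e4m3fn, 1 byte per element
w_hat = dequantize_fp8_blockwise(q, inv_scale)  # matches the fake-quant round trip above
```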