
[Usage] Can I do GPTQ with FP8 KV cache scheme? #137

Open

CharlesRiggins opened this issue Sep 2, 2024 · 3 comments

Comments

@CharlesRiggins commented Sep 2, 2024

I want to quantize the KV cache to FP8 E4M3 on top of GPTQ. Is it possible to do it with llm-compressor?

@robertgshaw2-neuralmagic (Collaborator)

@mgoin @horheynm - could you provide an example of this?

@horheynm (Collaborator) commented Sep 5, 2024

Hi @CharlesRiggins

Thank you for using llm-compressor. We are working on this feature in the current sprint, so you will be able to do this very shortly. Please give us a couple of days and we will get back to you with an example script to try out!
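For reference, a rough sketch of what such a combined run might look like. This assumes llm-compressor recipes can stack a GPTQModifier (for the weights) with a QuantizationModifier carrying a `kv_cache_scheme` whose keys mirror the library's FP8 KV-cache example; the model ID, calibration dataset, sample counts, and output directory are placeholders, and this is an untested illustration rather than an official example:

```python
# Untested sketch: GPTQ (W4A16) weight quantization plus an FP8 E4M3 KV cache.
# Assumption: the recipe below can stack GPTQModifier with a QuantizationModifier
# that carries a kv_cache_scheme (keys copied from the FP8 KV-cache example).
from transformers import AutoTokenizer

from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder model

model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

recipe = """
quant_stage:
    quant_modifiers:
        GPTQModifier:
            ignore: ["lm_head"]
            targets: ["Linear"]
            scheme: W4A16
        QuantizationModifier:
            kv_cache_scheme:
                num_bits: 8
                type: float
                strategy: tensor
                dynamic: false
                symmetric: true
"""

# Calibrate and quantize in one shot; dataset and sample counts are placeholders.
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

SAVE_DIR = MODEL_ID.split("/")[-1] + "-W4A16-FP8-KV"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```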

@markurtz (Collaborator)

@horheynm do we have an update on the status for this?
