
[Usage] Can I do GPTQ with FP8 KV cache scheme? #137

Open

CharlesRiggins opened this issue Sep 2, 2024 · 3 comments

Comments

@CharlesRiggins commented Sep 2, 2024

I want to quantize the KV cache to FP8 E4M3 on top of GPTQ. Is it possible to do it with llm-compressor?

@robertgshaw2-neuralmagic (Collaborator)

@mgoin @horheynm - could you provide an example of this?

@horheynm (Collaborator) commented Sep 5, 2024

Hi @CharlesRiggins

Thank you for using llm-compressor. We are working on this feature in the current sprint, so you will be able to do this very shortly. Please give us a couple of days and we will get back to you with an example script to try out!
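For reference, a rough sketch of what such a combined run might look like. This assumes llm-compressor recipes can stack a GPTQModifier (for the weights) with a QuantizationModifier carrying a `kv_cache_scheme` whose keys mirror the library's FP8 KV-cache example; the model ID, calibration dataset, sample counts, and output directory are placeholders, and this is an untested illustration rather than an official example:

```python
# Untested sketch: GPTQ (W4A16) weight quantization plus an FP8 E4M3 KV cache.
# Assumption: the recipe below can stack GPTQModifier with a QuantizationModifier
# that carries a kv_cache_scheme (keys copied from the FP8 KV-cache example).
from transformers import AutoTokenizer

from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder model

model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

recipe = """
quant_stage:
    quant_modifiers:
        GPTQModifier:
            ignore: ["lm_head"]
            targets: ["Linear"]
            scheme: W4A16
        QuantizationModifier:
            kv_cache_scheme:
                num_bits: 8
                type: float
                strategy: tensor
                dynamic: false
                symmetric: true
"""

# Calibrate and quantize in one shot; dataset and sample counts are placeholders.
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

SAVE_DIR = MODEL_ID.split("/")[-1] + "-W4A16-FP8-KV"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```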

@markurtz (Collaborator)

@horheynm do we have an update on the status for this?
