
When can we expect support for w8a8 FP8 quantization and 2:4 sparse LLM compression, adapted for vLLM? #148

Open
leoyuppieqnew opened this issue Sep 9, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@leoyuppieqnew

No description provided.

@leoyuppieqnew leoyuppieqnew added the enhancement New feature or request label Sep 9, 2024
@robertgshaw2-neuralmagic
Sponsor Collaborator

This is something we are actively working on supporting end-to-end.

In vLLM, we currently support 2:4 sparsity with w4a16 and w8a16. We need to add inference kernels to support w8a8 FP8 with 2:4 sparsity. We are collaborating with the CUTLASS team on this.
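
For reference, here is a minimal sketch of running one of the already-supported configurations (2:4 sparsity with w4a16/w8a16) through vLLM's offline API; the model ID below is hypothetical and stands in for any compatible 2:4-sparse quantized checkpoint.

```python
from vllm import LLM, SamplingParams

# Hypothetical checkpoint ID: substitute any 2:4-sparse w4a16/w8a16 model
# produced by your compression pipeline.
llm = LLM(model="your-org/llama-2-7b-sparse2of4-w4a16")

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["What does 2:4 sparsity mean?"], params)
print(outputs[0].outputs[0].text)
```

A w8a8 FP8 + 2:4 sparse checkpoint would load the same way once the corresponding inference kernels land.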
