https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-quantization has a breakdown of K-quantization, but more information about the K-quants is in the llama.cpp pull requests.
Hello Team,
I have been learning about transformer quantization and am particularly interested in full-integer 8-bit quantization. Which quantization schemes (e.g. GPTQ, AWQ, SmoothQuant) are supported for full-integer quantization in llama.cpp? For context, TFLite uses symmetric int8 quantization for calibration and inference. Or does llama.cpp support any quantization type as long as the model is in GGUF or GGML format? I'd really appreciate any help or pointers. Thanks.
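To illustrate the difference the question touches on: a per-tensor symmetric int8 scheme like the one described for TFLite uses a single scale and a zero point fixed at 0, whereas llama.cpp's GGUF types such as Q8_0 (to my understanding) store one scale per small block of 32 weights. The NumPy sketch below contrasts the two. The function names are hypothetical, and it ignores details such as fp16 scale storage and ggml's exact rounding (`np.round` rounds half to even, C's `roundf` rounds half away from zero), so treat it as an illustration, not llama.cpp's implementation.

```python
import numpy as np


def quantize_symmetric_int8(x):
    """Hypothetical per-tensor symmetric int8: one scale, zero point = 0."""
    x = np.asarray(x, dtype=np.float32)
    amax = float(np.max(np.abs(x)))
    scale = amax / 127.0 if amax > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def quantize_blockwise_int8(x, block_size=32):
    """Hypothetical block-wise symmetric int8: one scale per block (Q8_0-like)."""
    x = np.asarray(x, dtype=np.float32)
    if x.size % block_size != 0:
        raise ValueError("length must be a multiple of block_size")
    blocks = x.reshape(-1, block_size)
    amax = np.max(np.abs(blocks), axis=1, keepdims=True)
    scales = amax / 127.0
    # All-zero blocks get an inverse scale of 0, so they quantize to 0.
    inv = np.divide(1.0, scales, out=np.zeros_like(scales), where=scales > 0)
    q = np.clip(np.round(blocks * inv), -127, 127).astype(np.int8)
    return q, scales


def dequantize(q, scale):
    """Map int8 codes back to float; scale may be a scalar or per-block array."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
w[0] = 20.0  # a single outlier inflates the per-tensor scale

q_t, s_t = quantize_symmetric_int8(w)
q_b, s_b = quantize_blockwise_int8(w)

err_t = float(np.abs(dequantize(q_t, s_t) - w).mean())
err_b = float(np.abs(dequantize(q_b, s_b).reshape(-1) - w).mean())
print(f"per-tensor mean abs error: {err_t:.5f}")
print(f"block-wise mean abs error: {err_b:.5f}")
```

With the outlier confined to the first block, only that block pays for the large scale, so the block-wise error should come out noticeably lower. This is one reason llama.cpp's formats use per-block scales rather than a single per-tensor scale.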