Difference in different quantization methods #2094
-
Hello,
Replies: 4 comments 10 replies
-
The llama.cpp team does some "perplexity" testing, which approximately measures output quality; lower scores are better...
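To make the metric concrete, here is a minimal sketch of how perplexity is typically computed from per-token log-probabilities: it is the exponential of the mean negative log-likelihood. The function name and toy probabilities below are illustrative, not llama.cpp's actual implementation.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over the tokens."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Toy natural-log probabilities, for illustration only.
logprobs = [math.log(0.5), math.log(0.25), math.log(0.5)]
print(perplexity(logprobs))
```

A model that assigned probability 1.0 to every observed token would score a perplexity of exactly 1, which is the theoretical floor.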
-
The ppl column is perplexity increase relative to unquantized.
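In other words, the column is a delta against the fp16 baseline. A small sketch of that calculation (the numbers below are hypothetical, not real measurements):

```python
def ppl_delta(ppl_quant, ppl_fp16):
    """Return (absolute increase, percent increase) vs. the unquantized baseline."""
    delta = ppl_quant - ppl_fp16
    return delta, 100.0 * delta / ppl_fp16

# Hypothetical perplexities, for illustration only.
print(ppl_delta(6.0, 5.9))
```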
-
Brother, how do I make it pull information in real time?
-
Why are q8, f16 and f32 not recommended, even though they have low quality loss?
K-quantizations should be better, at the same file size, than the other ones. S, M, L mean small, medium, large :)
more details can be found here: #1684
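For reference, producing the S/M/L K-quant variants is just a matter of passing the type name to the `quantize` tool. The paths below are hypothetical, and this sketch only echoes the commands rather than running them, so you can check them before committing to a long quantization run:

```shell
# Hypothetical model path; adjust to your setup.
SRC=models/7B/ggml-model-f16.bin

# Build and print one quantize command per K-quant variant (echoed, not executed).
for Q in q4_K_S q4_K_M q5_K_M; do
  OUT="${SRC%f16.bin}${Q}.bin"
  echo "./quantize $SRC $OUT $Q"
done
```

Dropping the `echo` runs the commands for real once you are happy with them.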