Why are we caching only K and V? Why not Q? #5027

sathyanarays · 2024-05-24T09:32:59Z

Hello folks, any idea why we are only caching keys and values that are generated from the tokens? Why are we not caching the query?

akai-shuuichi · 2024-07-09T06:14:37Z

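The idea behind the KV cache is that the K and V matrices are stable: the keys and values of tokens that have already been processed do not change from one decoding step to the next. They can therefore be cached across time steps and reused instead of being recomputed for the same input. This cuts the compute cost significantly, especially for long sequences or large batches. The Q matrix, by contrast, depends on the current input, so it is different at every step: each step only needs the query of the newest token, and queries from earlier steps are never used again. That is why Q is usually not cached. It is a trade-off between compute and memory, since caching Q would take up a lot of memory without saving any work, so it is not practical.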
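To make this concrete, here is a minimal sketch in plain Python of single-head causal attention decoded one token at a time. It is not taken from any particular library; the dimension `D`, the random weights `W_q`, `W_k`, `W_v`, and the helper names are all made up for illustration. It shows that each step appends one key and one value to the cache, while the query is used once and then dropped.

```python
import math
import random

random.seed(0)
D = 4  # hypothetical head dimension, chosen only for illustration

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(x, W):
    # Project vector x (length rows) through W (rows x cols).
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def attend(q, keys, values):
    # Scaled dot-product attention for one query over all given keys/values.
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w / total * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]

# Made-up projection weights and five made-up token embeddings.
W_q, W_k, W_v = rand_matrix(D, D), rand_matrix(D, D), rand_matrix(D, D)
tokens = rand_matrix(5, D)

# Decoding with a KV cache: one new K and one new V per step.
k_cache, v_cache = [], []
cached_outputs = []
for x in tokens:
    q = matvec(x, W_q)              # needed for this step only, never stored
    k_cache.append(matvec(x, W_k))  # reused by every later step
    v_cache.append(matvec(x, W_v))  # reused by every later step
    cached_outputs.append(attend(q, k_cache, v_cache))

def full_recompute(prefix):
    # Reference without a cache: recompute K and V for the whole prefix,
    # but still only the query of the last token is needed.
    keys = [matvec(x, W_k) for x in prefix]
    values = [matvec(x, W_v) for x in prefix]
    return attend(matvec(prefix[-1], W_q), keys, values)

reference_outputs = [full_recompute(tokens[:t + 1]) for t in range(len(tokens))]

# Largest element-wise difference between the cached and uncached outputs.
max_diff = max(abs(a - b)
               for out_c, out_r in zip(cached_outputs, reference_outputs)
               for a, b in zip(out_c, out_r))
print("max difference:", max_diff)
```

Both paths give the same outputs, and neither ever looks at a query from an earlier step, so a Q cache would hold data that nothing reads.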
