Why are we caching only K and V? Why not Q? #5027
Closed
sathyanarays started this conversation in General
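The idea behind the KV cache is that the K and V entries of tokens that have already been processed stay fixed across decoding steps. With causal attention, a past token never attends to later tokens, so its keys and values never change. They can therefore be cached: at each new step you don't recompute K and V for the earlier tokens, you reuse them. This significantly reduces compute, especially for long sequences or large batches. Q is different: each decoding step only needs the query of the newest token, and the query of an earlier token was used once, to produce that token's own output, and is never needed again. Q is therefore not cached. Caching it would cost memory without saving any computation, which is why it is generally impractical.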
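To make the argument above concrete: at step t the attention output is softmax(q_t K_{1:t}^T / sqrt(d)) V_{1:t}, which needs every past key and value but only the current query q_t. Below is a minimal sketch, assuming a single attention head, no batching, and plain Python lists instead of a tensor library. `KVCache`, `decode_step`, and `full_recompute` are hypothetical names for illustration, not any library's API. The sketch checks that decoding with cached K/V gives the same output as recomputing everything from scratch, while never storing a query.

```python
import math
import random


def matvec(W, x):
    # Multiply a matrix W (list of rows) by a vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    # Scaled dot-product attention for ONE query vector over lists of keys/values.
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


class KVCache:
    """Hypothetical single-head cache: one key row and one value row per past token."""

    def __init__(self):
        self.keys = []
        self.values = []


def decode_step(x, Wq, Wk, Wv, cache):
    # Project only the NEWEST token; earlier tokens' K and V come from the cache.
    q = matvec(Wq, x)
    cache.keys.append(matvec(Wk, x))
    cache.values.append(matvec(Wv, x))
    # The query is used once, for this step's output, and then discarded:
    # later steps never look at it again, so storing it would only waste memory.
    return attend(q, cache.keys, cache.values)


def full_recompute(xs, Wq, Wk, Wv):
    # No cache: recompute K and V for the whole prefix, then attend from the
    # last token's query (the only output a decoder needs at this step).
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    return attend(matvec(Wq, xs[-1]), keys, values)


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def cached_matches_recompute(num_tokens=6, d=4, seed=0):
    rng = random.Random(seed)
    Wq, Wk, Wv = (random_matrix(d, d, rng) for _ in range(3))
    tokens = random_matrix(num_tokens, d, rng)  # each row is one token embedding
    cache = KVCache()
    for t, x in enumerate(tokens):
        cached = decode_step(x, Wq, Wk, Wv, cache)
        reference = full_recompute(tokens[: t + 1], Wq, Wk, Wv)
        if any(abs(a - b) > 1e-9 for a, b in zip(cached, reference)):
            return False
    return len(cache.keys) == num_tokens


print(cached_matches_recompute())  # True
```

The cache grows by one K row and one V row per generated token, while each step computes exactly one new query. That asymmetry is the whole answer: past keys and values are reused by every future step, past queries by none.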
Hello folks, any idea why we are only caching keys and values that are generated from the tokens? Why are we not caching the query?