Question about pagedattention #36

SherrySwift · 2024-09-06T09:40:08Z

Hi, thanks for your great work! I have a small question.
In kernels/csrc/fused_attention/applyBiasRopeUpdateKVCache.h, I saw that you set store_contiguous_qkv = true. Does this mean that PagedAttention is not used when the KV cache is quantized to int8/int4?
Looking forward to your reply, thank you!

ys-2020 · 2024-10-01T01:23:37Z

Hi,

Thanks for your interest in QServe. We do use paged attention when the KV cache is quantized to int8/int4. Please ignore that flag.
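
For readers wondering how paging and KV quantization can coexist, here is a minimal sketch in plain Python. It is not QServe code: the class `PagedInt8KVCache`, the helper `quantize_int8`, the page size, and the per-token symmetric int8 scheme are all assumptions made for illustration. The point is only that the block table decides where a token's entry lives, while quantization decides how that entry is encoded, so the two are independent.

```python
# Illustrative sketch only; NOT QServe's implementation.
# All names and the quantization scheme are hypothetical.

PAGE_SIZE = 4  # tokens per page (assumed value)


def quantize_int8(vec):
    """Symmetric per-vector int8 quantization: returns (int values, scale)."""
    max_abs = max(abs(x) for x in vec)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(x / scale))) for x in vec]
    return q, scale


class PagedInt8KVCache:
    def __init__(self, num_pages):
        # Physical pool: each page has PAGE_SIZE slots of (int8 values, scale).
        self.pages = [[None] * PAGE_SIZE for _ in range(num_pages)]
        self.free_pages = list(range(num_pages - 1, -1, -1))
        self.block_tables = {}  # seq_id -> list of physical page ids
        self.seq_lens = {}      # seq_id -> number of tokens stored

    def append(self, seq_id, vec):
        """Quantize one token's vector and store it in the sequence's next slot."""
        n = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if n % PAGE_SIZE == 0:
            # Current page is full (or none exists yet): grab any free page.
            table.append(self.free_pages.pop())
        page = table[n // PAGE_SIZE]
        self.pages[page][n % PAGE_SIZE] = quantize_int8(vec)
        self.seq_lens[seq_id] = n + 1

    def read(self, seq_id, pos):
        """Look up the page through the block table, then dequantize."""
        page = self.block_tables[seq_id][pos // PAGE_SIZE]
        q, scale = self.pages[page][pos % PAGE_SIZE]
        return [x * scale for x in q]


cache = PagedInt8KVCache(num_pages=4)
for i in range(5):
    cache.append(0, [float(i), -1.0])  # sequence 0
    cache.append(1, [0.5, float(i)])   # sequence 1, interleaved
print(cache.block_tables)  # pages of each sequence are not contiguous
print(cache.read(0, 4))    # approximately [4.0, -1.0]
```

Because the two sequences are appended in an interleaved order, each one ends up with non-adjacent physical pages, yet reads still return the dequantized values for the right token.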
