
PrefillPlan tries to allocate more memory than the float_workspace_size_in_bytes passed in #809

Open
rchardx opened this issue Feb 12, 2025 · 1 comment


rchardx commented Feb 12, 2025

Current PrefillPlan interface:

template <typename IdType>
inline cudaError_t PrefillPlan(void* float_buffer, size_t float_workspace_size_in_bytes,
                               void* int_buffer, void* page_locked_int_buffer,
                               size_t int_workspace_size_in_bytes, PrefillPlanInfo& plan_info,
                               IdType* qo_indptr_h, IdType* kv_indptr_h, uint32_t total_num_rows,
                               uint32_t batch_size, uint32_t num_qo_heads, uint32_t num_kv_heads,
                               uint32_t head_dim_qk, uint32_t head_dim_vo, uint32_t page_size,
                               bool enable_cuda_graph, uint32_t sizeof_dtype_o,
                               cudaStream_t stream);

float_workspace_size_in_bytes is an input parameter, but callers have no function call or other means to determine the required size in advance.

Currently, PrefillPlan may attempt to allocate buffers such as batch_prefill_tmp_v and batch_prefill_tmp_s from the float workspace without checking whether they fit within float_workspace_size_in_bytes.
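The missing check can be illustrated with a bump allocator that refuses any region extending past the workspace capacity, rather than silently running off the end of the buffer. This is a minimal sketch; CheckedWorkspaceAllocator and AllocOffset are hypothetical names, not FlashInfer's actual allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical bump allocator over a caller-provided workspace of fixed size.
// It hands out byte offsets and rejects any allocation that would overflow.
class CheckedWorkspaceAllocator {
 public:
  explicit CheckedWorkspaceAllocator(std::size_t capacity_in_bytes)
      : capacity_(capacity_in_bytes), offset_(0) {}

  // Returns the byte offset of a new region of `size` bytes aligned to
  // `alignment`, or std::nullopt if it would not fit in the remaining space.
  // On failure the allocator state is left unchanged.
  std::optional<std::size_t> AllocOffset(std::size_t size, std::size_t alignment) {
    assert(alignment > 0);
    // Round the current offset up to a multiple of the alignment.
    std::size_t aligned = (offset_ + alignment - 1) / alignment * alignment;
    if (aligned > capacity_ || size > capacity_ - aligned) {
      return std::nullopt;
    }
    offset_ = aligned + size;
    return aligned;
  }

  // Bytes consumed so far, including alignment padding.
  std::size_t used() const { return offset_; }

 private:
  std::size_t capacity_;
  std::size_t offset_;
};
```

With a 1024-byte workspace, an 800-byte region for a tmp_v-like buffer succeeds at offset 0, while a following 300-byte request is rejected instead of overflowing; the planner could then return an error such as cudaErrorMemoryAllocation to the caller.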

To prevent this, a function should be provided that tells users the required float workspace size.
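One way to provide such a function, sketched here under assumptions, is a two-phase "query, then plan" pattern: the planner's allocation sequence is written once as a template over the allocator, then run against a measuring allocator to compute the required size and against a capacity-checked allocator for the real plan, so the two cannot drift apart. All names below (MeasuringAllocator, CheckedAllocator, SplitKVShape, PlanFloatAllocations, QueryFloatWorkspaceSize) and the num_partial_rows parameter are hypothetical; the buffer sizes assume tmp_v holds partial outputs and tmp_s holds one float log-sum-exp per row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical: never fails, only records how many bytes the allocations need.
struct MeasuringAllocator {
  std::size_t offset = 0;
  std::optional<std::size_t> AllocOffset(std::size_t size, std::size_t alignment) {
    offset = (offset + alignment - 1) / alignment * alignment;
    std::size_t result = offset;
    offset += size;
    return result;
  }
};

// Hypothetical: same interface, but refuses allocations past `capacity`.
struct CheckedAllocator {
  std::size_t capacity;
  std::size_t offset = 0;
  std::optional<std::size_t> AllocOffset(std::size_t size, std::size_t alignment) {
    std::size_t aligned = (offset + alignment - 1) / alignment * alignment;
    if (aligned > capacity || size > capacity - aligned) return std::nullopt;
    offset = aligned + size;
    return aligned;
  }
};

// Hypothetical sizing inputs; a real planner would derive num_partial_rows.
struct SplitKVShape {
  uint32_t num_partial_rows;
  uint32_t num_qo_heads;
  uint32_t head_dim_vo;
  uint32_t sizeof_dtype_o;
};

struct FloatPlanOffsets {
  std::size_t v_offset;  // start of the tmp_v-like partial-output buffer
  std::size_t s_offset;  // start of the tmp_s-like log-sum-exp buffer
};

// The single source of truth for the float-workspace layout.
template <typename Allocator>
std::optional<FloatPlanOffsets> PlanFloatAllocations(Allocator& alloc,
                                                     const SplitKVShape& shape) {
  std::size_t rows = std::size_t(shape.num_partial_rows) * shape.num_qo_heads;
  auto v = alloc.AllocOffset(rows * shape.head_dim_vo * shape.sizeof_dtype_o, 16);
  if (!v) return std::nullopt;
  auto lse = alloc.AllocOffset(rows * sizeof(float), 16);
  if (!lse) return std::nullopt;
  return FloatPlanOffsets{*v, *lse};
}

// Phase 1: tell the caller how large float_buffer must be.
std::size_t QueryFloatWorkspaceSize(const SplitKVShape& shape) {
  MeasuringAllocator measure;
  PlanFloatAllocations(measure, shape);
  return measure.offset;
}
```

The caller would allocate QueryFloatWorkspaceSize(shape) bytes and then plan with CheckedAllocator{that_size}; passing a smaller buffer makes planning fail cleanly instead of writing out of bounds.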

Collaborator

yzh119 commented Feb 13, 2025

Yes, the required float workspace buffer size can be determined from hardware information (the number of SMs), the head dimensions, and the query tile size (see Appendix D.2 in the paper). I'll work on this after upgrading the scheduler to v2 (parts of it have not been upstreamed yet).
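As a rough illustration of why such a bound depends only on the SM count, head dimension, and query tile size (and not on batch size or sequence lengths), here is a hypothetical upper-bound estimate. It is not the formula from Appendix D.2; it assumes that at most num_sms * ctas_per_sm CTAs write partial results, that each writes cta_tile_q rows, and that each row needs a head_dim_vo-wide partial output plus one float log-sum-exp value. FloatWorkspaceUpperBound and ctas_per_sm are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round x up to a multiple of `alignment`.
constexpr std::size_t RoundUp(std::size_t x, std::size_t alignment) {
  return (x + alignment - 1) / alignment * alignment;
}

// Hypothetical upper bound on the float workspace for split-KV prefill, under
// the assumptions stated above. Requires C++14 for the multi-statement constexpr.
constexpr std::size_t FloatWorkspaceUpperBound(uint32_t num_sms, uint32_t ctas_per_sm,
                                               uint32_t cta_tile_q, uint32_t head_dim_vo,
                                               uint32_t sizeof_dtype_o) {
  // Maximum number of partial-output rows written by all CTAs together.
  std::size_t max_rows = std::size_t(num_sms) * ctas_per_sm * cta_tile_q;
  // tmp_v-like buffer: one head_dim_vo-wide partial output per row.
  std::size_t tmp_v_bytes = RoundUp(max_rows * head_dim_vo * sizeof_dtype_o, 16);
  // tmp_s-like buffer: one float log-sum-exp value per row.
  std::size_t tmp_s_bytes = RoundUp(max_rows * sizeof(float), 16);
  return tmp_v_bytes + tmp_s_bytes;
}
```

For a hypothetical GPU with 132 SMs, 2 CTAs per SM, cta_tile_q = 128, head_dim_vo = 128, and a 2-byte output type, this gives 33,792 rows, 8,650,752 bytes for tmp_v plus 135,168 bytes for tmp_s, or 8,785,920 bytes in total.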
