Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Reuse JIT code path for building AOT wheel #791

Open
abcdabcd987 opened this issue Feb 6, 2025 · 6 comments
Open

[Feature] Reuse JIT code path for building AOT wheel #791

abcdabcd987 opened this issue Feb 6, 2025 · 6 comments
Assignees
Labels
enhancement New feature or request

Comments

@abcdabcd987
Copy link
Member

Currently we maintain AOT csrc and JIT csrc separately. Since JIT is the priority and AOT is tested less often, AOT falls out of sync with JIT from time to time. This causes lots of issues (e.g., compilation failure, linking failure, missing symbols at runtime, falling back to JIT) and duplicated work must be paid to bring the changes to AOT. This is clearly not sustainable.

On the other hand, we do want to have a wheel that have some common kernels precompiled.

My proposal is: Get rid of AOT csrc. Use JIT code path to generate and compile kernels during wheel build time.

@yzh119 yzh119 added the enhancement New feature or request label Feb 13, 2025
@qiaoning
Copy link

qiaoning commented Feb 13, 2025

Yes,lots of bugs when compiling&run unittest and benchmark.

For example:

  1. generate_dispatch_inc.py config.cmake NOT match,
  2. Possible Deadlock Detected when execute benchmark, block at pos_encoding_mode=1
  3. ld returned 1 exit status
  4. unknown file: Failure when run test_xxx
    C++ exception with description "Error in function 'single_mha' at /home/users/flashinfer/src/cpu_reference.h:90: std::vector<dtype_out> cpu_reference::single_mha(const std::vector&, const std::vector<dtype_kv>&, const std::vector<dtype_kv>&, size_t, size_t, size_t, size_t, size_t, bool, flashinfer::QKVLayout, flashinfer::PosEncodingMode, float, float) [with dtype_q = __half; dtype_kv = __half; dtype_out = __half; size_t = long unsigned int] failed to dispatch head_dim 64" thrown in the test body.

and so on...

@yzh119
Copy link
Collaborator

yzh119 commented Feb 13, 2025

Hi @qiaoning yes the C++ side are terribly outdated and needs a thorough cleaning

@diptorupd
Copy link

diptorupd commented Feb 13, 2025

@yzh119 I am taking a look. I will open a RFC once I have some more clarity on a reorganization. The work is part of my overall goal to add AMDGPU support for the kernels.

@diptorupd
Copy link

@abcdabcd987 have you already started on a redesign/refactor? I will collaborate with you in that case.

@abcdabcd987
Copy link
Member Author

No I haven't started.

@ur4t
Copy link
Contributor

ur4t commented Feb 14, 2025

#748 did part of the job. A translator from Python interface to C++ interface might be necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants