- Statistics Department of JNU
- Guangzhou, China
- https://github.com/DefTruth
- https://www.zhihu.com/people/qyjdef
Pinned
- lite.ai.toolkit: 🛠 A lite C++ toolkit of 100+ Awesome AI models, supporting ORT, MNN, NCNN, TNN and TensorRT. 🎉🎉
- vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
- Awesome-LLM-Inference: 📖 A curated list of Awesome LLM/VLM Inference papers with code, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉
- CUDA-Learn-Notes: 📚 200+ Tensor/CUDA Cores kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
- Awesome-Diffusion-Inference: 📖 A curated list of Awesome Diffusion Inference papers with code, such as Sampling, Caching, Multi-GPUs, etc. 🎉🎉
- ffpa-attn-mma: 📚 [WIP] FFPA: Yet another Faster Flash Prefill Attention with O(1) ⚡️GPU SRAM complexity for headdim > 256, 1.8x~3x↑ 🎉 faster vs SDPA EA.