Integrate kleidiAI release v0.1.0 into MNN 2.9.3 #2995

Draft: wants to merge 3 commits into base: master

xhzheng1895

The KleidiAI files are placed in source/backend/cpu/arm/kleidiAI/kai. They were downloaded from Arm's GitLab and are unchanged. We may later remove them from the tree and download them at build time instead.

MNNKleidiAI.cpp is the interface between MNN and KleidiAI.

The functions of class DenseConvInt8TiledExecutor, in ConvInt8TiledExecutor.cpp, are rewritten to call KleidiAI functions. A separate execution may be implemented later.

The changes to GeometryConvUtils.cpp and ShapeTensorConvert.cpp make the input and output of DenseConvInt8TiledExecutor NCHW rather than NC4HW4. This avoids a redundant pack/unpack step and improves performance.
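
To illustrate the pack/unpack step that the NCHW path avoids, here is a minimal sketch of converting a single-batch float tensor between NCHW and an NC4HW4-style layout. It assumes the commonly described layout in which channels are grouped in blocks of 4 and the 4 channel values of one pixel are stored contiguously. The function names `packNCHWToNC4HW4` and `unpackNC4HW4ToNCHW` are hypothetical and are not taken from MNN or from this PR.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: pack a single-batch NCHW tensor into an NC4HW4-style
// layout [ceil(C/4)][H*W][4]. Padding channels in the last block stay zero.
std::vector<float> packNCHWToNC4HW4(const std::vector<float>& src,
                                    std::size_t channels,
                                    std::size_t height,
                                    std::size_t width) {
    const std::size_t area = height * width;
    const std::size_t blocks = (channels + 3) / 4;  // ceil(channels / 4)
    std::vector<float> dst(blocks * area * 4, 0.0f);
    for (std::size_t c = 0; c < channels; ++c) {
        for (std::size_t p = 0; p < area; ++p) {
            // Block index c / 4, pixel p, lane c % 4 inside the block.
            dst[((c / 4) * area + p) * 4 + (c % 4)] = src[c * area + p];
        }
    }
    return dst;
}

// Hypothetical helper: the inverse copy, dropping the padding channels.
std::vector<float> unpackNC4HW4ToNCHW(const std::vector<float>& src,
                                      std::size_t channels,
                                      std::size_t height,
                                      std::size_t width) {
    const std::size_t area = height * width;
    std::vector<float> dst(channels * area, 0.0f);
    for (std::size_t c = 0; c < channels; ++c) {
        for (std::size_t p = 0; p < area; ++p) {
            dst[c * area + p] = src[((c / 4) * area + p) * 4 + (c % 4)];
        }
    }
    return dst;
}
```

Each direction is a full pass over the tensor, so keeping the data in NCHW on both sides of the executor saves two such copies per convolution.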

@CLAassistant

CLAassistant commented Aug 16, 2024

CLA assistant check
All committers have signed the CLA.

@wangzhaode
Collaborator

I tested the following two models on an M3 chip, and the results are incorrect:

https://modelscope.cn/models/zhaode/Qwen2-7B-Instruct-MNN
https://modelscope.cn/models/zhaode/Qwen2-1.5B-Instruct-MNN

xhzheng1895 reopened this Aug 21, 2024
@xhzheng1895
Author

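Hi, KleidiAI currently supports only symmetrically quantized models.
For asymmetrically quantized models, execution falls back to some of the original functions in DenseConvInt8TiledExecutor. In that case the KAI_CONV_NCHW_IN_OUT macro must be turned off; otherwise the input/output format will not match what the native DenseConvInt8TiledExecutor functions expect.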
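
For readers unfamiliar with the distinction, the sketch below shows the usual difference between symmetric and asymmetric quantization parameters: the symmetric scheme fixes the zero point at 0, while the asymmetric scheme generally needs a non-zero one. It is a generic illustration and not the code from this PR. `QuantParams`, `symmetricParams`, `asymmetricParams`, `quantize`, and `isSymmetric` are hypothetical names, and the int4 range [-8, 7] in the note below is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical parameter pair: real value ~= (q - zeroPoint) * scale.
struct QuantParams {
    float scale;
    int zeroPoint;
};

// Symmetric: [-maxAbs, maxAbs] maps to [-qmax, qmax]; the zero point is 0.
QuantParams symmetricParams(const std::vector<float>& w, int qmax) {
    float maxAbs = 0.0f;
    for (float v : w) maxAbs = std::max(maxAbs, std::fabs(v));
    return {maxAbs > 0.0f ? maxAbs / qmax : 1.0f, 0};
}

// Asymmetric: [min, max] maps to [qmin, qmax]; the zero point is usually non-zero.
QuantParams asymmetricParams(const std::vector<float>& w, int qmin, int qmax) {
    float lo = 0.0f, hi = 0.0f;  // starting at 0 keeps 0.0 exactly representable
    for (float v : w) {
        lo = std::min(lo, v);
        hi = std::max(hi, v);
    }
    const float scale = hi > lo ? (hi - lo) / (qmax - qmin) : 1.0f;
    const int zeroPoint = qmin - static_cast<int>(std::lround(lo / scale));
    return {scale, zeroPoint};
}

// Round to the nearest integer step, shift by the zero point, then clamp.
int quantize(float v, QuantParams p, int qmin, int qmax) {
    const int q = static_cast<int>(std::lround(v / p.scale)) + p.zeroPoint;
    return std::min(std::max(q, qmin), qmax);
}

// Hypothetical check a caller could use to pick a symmetric-only kernel path.
bool isSymmetric(QuantParams p) {
    return p.zeroPoint == 0;
}
```

With weights {-1.0, 0.5, 2.0} and an int4 range, `symmetricParams(w, 7)` yields a zero point of 0, while `asymmetricParams(w, -8, 7)` yields a non-zero zero point, so only the former would pass an `isSymmetric`-style check.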

@wangzhaode
Collaborator

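OK, I tested a symmetrically quantized model and it works without problems. Decode performance is faster than with MNN's original implementation.
Testing Qwen2-1.5B-int4 on an M3 Pro with 4 CPU threads gives the following speeds: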

|          | prefill | decode |
|----------|---------|--------|
| MNN      | 330     | 75     |
| KleidiAI | 295     | 85     |

@yiyangfan01

Here is the perf data I collected with the same model as @wangzhaode, on a Redmi K60 Ultra (MTK D9300 inside), 16 GB RAM, 4 threads.
Prefill improves by 57% and decode improves by 28%.
image
