add gptq and awq int4 support in intel platform #2444
Conversation
Signed-off-by: Wang, Yi A <[email protected]>
@ErikKaum could you help review the PR?
Hi @sywangyi 👋 Yes, let me run the tests in a separate branch so that we don't get the permission errors 👍 I should have time to do it today or tomorrow 👍
Signed-off-by: Wang, Yi A <[email protected]>
@sywangyi there still seems to be an error in the Dockerfile:
@ErikKaum Could you help retrigger the CI build for intel-cpu? We did not see this build error in the previous CI, and I have not made any changes to Dockerfile_intel in the new commits.
I will rework it after #2517 is merged, since Python is upgraded from 3.10 to 3.11.
@ErikKaum rebase is done, please retrigger the CI, review, and merge it.
Signed-off-by: Wang, Yi A <[email protected]>
It seems the failure is not related to this PR.
@ErikKaum could you help retrigger it?
This PR is also needed to make mllama output correct on ipex-cpu, since it will upgrade IPEX. Could anyone help merge it?
@ErikKaum @Narsil, please help. @yao-matrix
Signed-off-by: Wang, Yi A <[email protected]>
@@ -321,7 +322,7 @@ def get_weights_row(self, weights: Weights, prefix: str):
         if g_idx is not None:
             if (
                 not torch.equal(
-                    g_idx.cpu(),
+                    (g_idx - g_idx[0]).cpu(),
Can you explain why this is needed exactly? Probably add it as a comment in the code too.
Your code seems correct, because this should be about sharding alignment.
However, since desc_act should be checked before this point, the fact that this pathway is failing seems to indicate that something may be wrong with the target model.
If desc_act is False, exllama should be used. But in the sharding case, if g_idx is not in ascending order, use_exllama is set to False.
That means a model like https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ would not use exllama in the TP case under the previous logic. IPEX now implements functionality similar to exllama, but the exllama kernel itself does not support Intel CPU/XPU.
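For illustration, here is a small self-contained sketch (toy values, not the TGI code) of why normalizing by g_idx[0] matters for a row-parallel shard:

```python
import torch

# With desc_act=False, the full g_idx is simply arange(in_features) // groupsize.
groupsize = 2
full_g_idx = torch.arange(8) // groupsize          # tensor([0, 0, 1, 1, 2, 2, 3, 3])

# A row-parallel shard for rank 1 of 2 receives the second half of the rows,
# so its g_idx slice starts at group 2 instead of group 0.
shard_g_idx = full_g_idx[4:]                       # tensor([2, 2, 3, 3])

# Comparing the shard directly against the trivial pattern fails...
trivial = torch.arange(shard_g_idx.shape[0]) // groupsize   # tensor([0, 0, 1, 1])
print(torch.equal(shard_g_idx, trivial))                    # False

# ...while subtracting the first element recovers the trivial pattern,
# which is what the (g_idx - g_idx[0]) change in this diff detects.
print(torch.equal(shard_g_idx - shard_g_idx[0], trivial))   # True
```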
@@ -350,16 +352,16 @@ def get_weights_row(self, weights: Weights, prefix: str):
         else:
             log_once(logger.info, f"Using exllama kernels v{HAS_EXLLAMA}")

-        if use_exllama and self.groupsize != -1:
+        if not desc_act and self.groupsize != -1:
We should keep use_exllama here.
Exllama is a really specific kernel; purely in terms of semantics, it's easier for us to know that this code is exllama-specific.
use_exllama is False since exllama only supports CUDA, but the IPEX quantization runtime kernel implements logic similar to exllama, so we need the same sharded logic for qzeros/scales/g_idx when desc_act is False as well.
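A rough self-contained sketch of that sharding argument (toy values and made-up helper name, not the TGI implementation): when desc_act is False, group i always covers the contiguous rows [i*groupsize, (i+1)*groupsize), so the per-group qzeros/scales can be sliced per rank together with that rank's rows of the quantized weight.

```python
import torch

# Toy setup: 8 input rows, groupsize 2 -> 4 groups, sharded over 2 ranks.
in_features, groupsize, world_size = 8, 2, 2
scales = torch.arange(in_features // groupsize, dtype=torch.float32)  # one scale per group

def shard_group_tensor(t: torch.Tensor, rank: int) -> torch.Tensor:
    # Because groups are contiguous when desc_act is False, slicing the
    # group-wise tensor row-wise stays aligned with the rank's weight rows.
    groups_per_rank = t.shape[0] // world_size
    return t[rank * groups_per_rank : (rank + 1) * groups_per_rank]

print(shard_group_tensor(scales, 0))  # tensor([0., 1.]) -> groups for rows 0..3
print(shard_group_tensor(scales, 1))  # tensor([2., 3.]) -> groups for rows 4..7
```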
Wouldn't keeping use_exllama and simply fixing the TP case (with - g_idx[0]) in the conditional fix the issues on IPEX?
https://github.com/huggingface/text-generation-inference/blob/main/server/text_generation_server/layers/gptq/__init__.py#L134: this line sets use_exllama to False, since exllama is not installed on the Intel platform.
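As a hedged sketch of the dispatch being described (the function and argument names below are assumptions for illustration, not the actual TGI code): the exllama kernels are CUDA-only, so on Intel CPU/XPU use_exllama ends up False and the IPEX int4 kernels handle the quantized matmul instead.

```python
# Hypothetical backend selection, only for illustration.
def pick_gptq_backend(device_type: str, has_exllama: bool) -> str:
    if device_type == "cuda" and has_exllama:
        return "exllama"   # CUDA-only kernels
    if device_type in ("cpu", "xpu"):
        return "ipex"      # intel_extension_for_pytorch int4 kernels
    return "fallback"      # e.g. a plain dequantize-and-matmul path

print(pick_gptq_backend("xpu", has_exllama=False))   # ipex
print(pick_gptq_backend("cuda", has_exllama=True))   # exllama
```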
Signed-off-by: Wang, Yi A <[email protected]>
Signed-off-by: Wang, Yi A <[email protected]>
It's merged from an updated PR I prepared for CI (#2665) (only minor fixes to the control flow and a few added comments).
What does this PR do?
Fixes # (issue)
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.