Add pipeline parallel support to `TransformersModel` #12832

hmellor · 2025-02-06T14:15:16Z

Depends on changes made by huggingface/transformers#36091.

Tested with Llama 3.1 70B in two configurations: pp=4, and pp=2 with tp=2.
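For readers less familiar with the two settings, here is a rough sketch of what they imply. Pipeline parallelism assigns contiguous blocks of decoder layers to different stages, and the number of GPUs needed is the product of the pipeline and tensor parallel sizes. The `split_layers` and `world_size` helpers below are hypothetical and use a plain even split; this is not necessarily how vLLM partitions layers internally. The figure of 80 layers for the 70B model is an assumption used for illustration.

```python
def split_layers(num_layers: int, pp_size: int) -> list[range]:
    """Split layer indices into contiguous pipeline stages (illustrative only)."""
    if pp_size < 1 or num_layers < pp_size:
        raise ValueError("need at least one layer per pipeline stage")
    base, extra = divmod(num_layers, pp_size)
    stages = []
    start = 0
    for rank in range(pp_size):
        # The first `extra` stages each take one additional layer.
        size = base + (1 if rank < extra else 0)
        stages.append(range(start, start + size))
        start += size
    return stages


def world_size(pp_size: int, tp_size: int = 1) -> int:
    """GPUs needed: each pipeline stage is sharded across tp_size devices."""
    return pp_size * tp_size


# Assumed layer count of 80, split across the pp=4 configuration.
for rank, layers in enumerate(split_layers(80, 4)):
    print(f"stage {rank}: layers {layers.start}..{layers.stop - 1}")

# Both tested configurations occupy the same number of GPUs.
print(world_size(4), world_size(2, 2))
```

Under these assumptions, both tested configurations use four GPUs; they differ in whether the model is cut only by layer (pp=4) or by layer and within each layer (pp=2, tp=2).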

The following command can be used for testing:

CUDA_VISIBLE_DEVICES=0,1 vllm serve meta-llama/Llama-3.2-1B-Instruct --disable-log-requests --model-impl transformers --gpu-memory-utilization 0.25 --pipeline-parallel-size 2
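Once the server is up, a quick smoke test is to send it a single completion request. The sketch below assumes the server is listening on the default address `http://localhost:8000` and exposes the OpenAI-compatible `/v1/completions` route; adjust `BASE_URL` if you passed a different host or port. The helper names are illustrative and not part of this PR.

```python
import json
import urllib.request

# Assumption: `vllm serve` was started with its default host and port.
BASE_URL = "http://localhost:8000"
MODEL = "meta-llama/Llama-3.2-1B-Instruct"


def build_completion_request(prompt: str, max_tokens: int = 16,
                             base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the completions route."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# With the server running:
#     print(complete("Pipeline parallelism is"))
```

A sensible reply confirms that activations are flowing through both pipeline stages end to end.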

Signed-off-by: Harry Mellor <[email protected]>

github-actions · 2025-02-06T14:15:28Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀



andoorve · 2025-02-07T00:27:12Z

vllm/model_executor/models/transformers.py

-        log_replacement("input embedding", self.model.get_input_embeddings(),
-                        new_module)
-        self.model.set_input_embeddings(new_module)
+    def init_buffers(self, module: nn.Module):


General comment: might it be better to move this function to a separate transformers_utils file?

hmellor · 2025-02-07

We could, but I worry that would pollute the models directory.

andoorve · 2025-02-07T00:27:39Z

vllm/model_executor/models/transformers.py

+        for child in module.children():
+            self.init_buffers(child)
+
+    def meta_to_empty(self, module: nn.Module):


Same here: might it be better to move this function to a separate transformers_utils file?
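For context on what a helper with this shape might do: the diff shows only the signature of `meta_to_empty` and the recursive call pattern in `init_buffers`, not their bodies. The sketch below is therefore a guess at the general technique (walk the module tree and replace parameters on PyTorch's `meta` device with uninitialised tensors on a real device), not the PR's implementation. It is written as a free function rather than a method, and the `device` argument is an assumption.

```python
import torch
from torch import nn


def meta_to_empty(module: nn.Module, device: str = "cpu") -> None:
    """Recursively replace meta-device parameters with empty real tensors.

    Illustrative only: the allocated memory is uninitialised, so weights
    must still be loaded before the module produces meaningful output.
    """
    for name, param in list(module.named_parameters(recurse=False)):
        if param.is_meta:
            new_param = nn.Parameter(
                torch.empty_like(param.data, device=device),
                requires_grad=param.requires_grad,
            )
            setattr(module, name, new_param)
    # Same recursion pattern as the `init_buffers` hunk shown in the review.
    for child in module.children():
        meta_to_empty(child, device)


# Build a tiny model without allocating memory, then materialise it.
model = nn.Sequential(
    nn.Linear(8, 8, device="meta"),
    nn.ReLU(),
    nn.Linear(8, 2, device="meta"),
)
meta_to_empty(model)
print(all(not p.is_meta for p in model.parameters()))
```

PyTorch also ships `nn.Module.to_empty(device=...)`, which materialises a whole module in one call. A hand-written recursive walk like this one is mainly useful when only part of the tree should be allocated, for example the layers belonging to the current pipeline stage.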


mergify · 2025-02-12T11:58:40Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @hmellor.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

hmellor · 2025-02-12T12:01:06Z

Can only be merged after Transformers v4.49, but marking as ready for review so this can be merged soon after.

hmellor added 3 commits February 6, 2025 09:14

Add pipeline parallel support to TransformersModel

04a93de

Signed-off-by: Harry Mellor <[email protected]>

Simplify weight loading

099b3e0

Signed-off-by: Harry Mellor <[email protected]>

Only allocate tensors for current pipeline stage

8b1ea1d

Signed-off-by: Harry Mellor <[email protected]>

Don't set buffers to empty right after initialising them

ec626e8

Signed-off-by: Harry Mellor <[email protected]>

andoorve reviewed Feb 7, 2025


hmellor added 4 commits February 7, 2025 19:07

Update to use _pp_plan and _tp_plan

8eafc6d

Signed-off-by: Harry Mellor <[email protected]>

Respond to comment

87c395c

Signed-off-by: Harry Mellor <[email protected]>

Add docstring to init_buffers

3498758

Signed-off-by: Harry Mellor <[email protected]>

Update some comments

e652d50

Signed-off-by: Harry Mellor <[email protected]>

hmellor marked this pull request as ready for review February 12, 2025 11:58

mergify bot added the needs-rebase label Feb 12, 2025
