Does vllm 0.6 work? #2883

Open
catsled opened this issue Feb 18, 2025 · 2 comments
Labels
🐛 bug Something isn't working

Comments

catsled commented Feb 18, 2025

For some reason I have to use vllm==0.6 to train GRPO, but it hits the following error:

[image: screenshot of the error]

After that:

[image: screenshot]

[rank0]:   File "/usr/local/lib/python3.10/site-packages/vllm/model_executor/models/qwen2.py", line 289, in forward
[rank0]:     hidden_states = self.embed_tokens(input_ids)
[rank0]:   File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.10/site-packages/vllm/model_executor/layers/vocab_parallel_embedding.py", line 413, in forward
[rank0]:     output_parallel = self.linear_method.embedding(self,
[rank0]:   File "/usr/local/lib/python3.10/site-packages/vllm/model_executor/layers/vocab_parallel_embedding.py", line 57, in embedding
[rank0]:     return F.embedding(input_, layer.weight)
[rank0]:   File "/usr/local/lib/python3.10/site-packages/torch/nn/functional.py", line 2292, in embedding
[rank0]:     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
[rank0]: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
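
The RuntimeError at the bottom is PyTorch's generic device-mismatch error: the embedding weight has ended up on cuda:7 while the input ids are on cuda:0. A minimal sketch that reproduces the same message (device indices are illustrative and assume a multi-GPU machine; this is not the vllm code path itself):

import torch
import torch.nn.functional as F

# Embedding weight on one GPU, token ids on another -- the same mismatch
# the vllm traceback above reports between cuda:7 and cuda:0.
weight = torch.randn(100, 16, device="cuda:0")
input_ids = torch.tensor([1, 2, 3], device="cuda:1")  # any second device will do

F.embedding(input_ids, weight)  # raises: Expected all tensors to be on the same device ...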
github-actions bot added the 🐛 bug label on Feb 18, 2025

vagitablebirdcode commented Feb 18, 2025

I have the same problem. Also, with vllm < 0.6.5 the GRPOTrainer fails because vllm.worker.worker.Worker._assert_memory_footprint_increased_during_profiling does not exist.
With vllm >= 0.6.5 and < 0.7 there is a conflict between vllm and the other utilities over multi-device use.

XZ-X (Contributor) commented Feb 19, 2025

I encountered the same issue. Using vllm >= 0.7 solves the problem for me.
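
If upgrading is an option, the fix reported above amounts to bumping the vllm pin and reinstalling; a minimal sketch (the exact version that resolves cleanly depends on your torch and trl installs):

pip install --upgrade "vllm>=0.7.0"
python -c "import vllm; print(vllm.__version__)"  # confirm which version was actually installed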
