
multiple gpus? #181

Closed
aliborji opened this issue May 17, 2019 · 12 comments

Comments

@aliborji

Is it possible to train the model (this code) on multiple GPUs?

@wilkice

wilkice commented May 19, 2019

Not supported out of the box. You can change the code to support it. Reference: PyTorch Multi-GPU examples
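
A minimal sketch of what the PyTorch Multi-GPU examples suggest: wrap the model in nn.DataParallel so each forward call splits the input batch across the visible GPUs. The tiny model below is only a stand-in; in this repo you would wrap the YOLOv3 model built in train.py instead.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())  # placeholder model

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)   # replicate the model on every visible GPU
model = model.to(device)

imgs = torch.randn(8, 3, 416, 416, device=device)  # dummy batch
outputs = model(imgs)  # the batch is split along dim 0 and the outputs gathered back

Note that, as the later comments in this thread point out, naively passing the targets tensor through the DataParallel forward is what triggers the device-side assert, because the targets get split along dim 0 as well.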

@kaikaizhu

@aliborji, @wilkice
I want to use multi-GPU training. Did you add it successfully? I got an error when I added it:

RuntimeError: CUDA error: device-side assert triggered.

@H-YunHui

@kaikaizhu
I have the same question. Did you solve it?

@MichaelCong

I have the same question. Did you solve it?

@YIYIZH

YIYIZH commented Aug 6, 2019

@aliborji, @wilkice
I want to use multi-GPU training. Did you add it successfully? I got an error when I added it:

RuntimeError: CUDA error: device-side assert triggered.

I met the same error, and I think the problem comes from this line:
loss, outputs = model(imgs, targets)
When I use 4 GPUs, I found that imgs and targets are each divided equally across the 4 GPUs. The problem is that the targets should not be split equally, since each image has a different number of targets (bboxes).

One solution is to do it like this:
outputs = model(imgs)
loss = compute_loss(outputs, targets)  # you have to write a new function to compute the loss

Another is to find the longest targets tensor among the images in one batch and pad each target with zeros to that length, so that the targets can be split consistently with the imgs (a rough sketch of this padding idea follows below).

If anyone has a better solution, please tell me. Thanks.
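
A rough sketch of the padding idea above: pad every image's target list to the longest one in the batch, so targets become a fixed-shape [batch, max_boxes, 5] tensor that DataParallel can split along dim 0 together with imgs. The per-box layout [class, x, y, w, h] and the use of all-zero rows as padding are assumptions for illustration, not necessarily this repo's exact target format.

import torch

def collate_with_padding(batch):
    """batch: list of (img, boxes) pairs, where boxes is an [n_i, 5] tensor per image."""
    imgs, boxes_list = zip(*batch)
    imgs = torch.stack(imgs, dim=0)

    max_boxes = max(b.shape[0] for b in boxes_list)
    padded = torch.zeros(len(boxes_list), max_boxes, 5)
    for i, b in enumerate(boxes_list):
        if b.shape[0] > 0:
            padded[i, : b.shape[0]] = b  # real boxes first, zero rows after
    return imgs, padded

# Inside the loss function the zero-padded rows then have to be masked out, e.g.:
# valid = padded.abs().sum(dim=-1) > 0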

@GangHu1993

I have the same question.

I have the same problem; have you solved it?

@yangzhaonan18

The official PyTorch site has tutorials for loading the model and data on multiple GPUs, but how should the weights be loaded? Could someone share a learning link? Thank you.
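
A small sketch of the usual weight-loading pattern with nn.DataParallel: the wrapped model keeps its parameters under model.module, so save and load the state_dict through .module to avoid the "module." prefix mismatch. The tiny model and the checkpoint file name are placeholders.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())  # placeholder model
model = nn.DataParallel(model)

# Save the underlying module's weights so the checkpoint has no "module." prefix.
torch.save(model.module.state_dict(), "checkpoint.pth")  # hypothetical file name

# Later, after wrapping in DataParallel again, load into .module so the names match.
state_dict = torch.load("checkpoint.pth", map_location="cpu")
model.module.load_state_dict(state_dict)

Saving model.module.state_dict() (rather than the wrapped model's) means the same checkpoint can later be loaded with or without DataParallel.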

@ywatanabe1989

https://github.com/ywatanabe1989/PyTorch-gaussian-YOLOv3-1D/
I tried to implement multi-GPU support here, although there might be mistakes (at least, the print log output is currently broken) and it is not polished at all.

Please give me some feedback. :)

@ujsyehao

@YIYIZH Hi, I solved the problem using your proposed method (finding the longest targets tensor in the batch and padding each target with zeros to the same length).

@ujsyehao

You can refer to https://github.com/ujsyehao/yolov3-multigpu

@sakurasakura1996

@YIYIZH Did you solve the problem? I want to run the code on my custom data with one GPU, but I found I am training on the CPU.
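
In case training silently falls back to the CPU, a quick sanity check along these lines can help; the tiny Linear model is only a stand-in, the point is to confirm CUDA is visible and that the model and each batch are moved to the GPU.

import torch

print(torch.cuda.is_available())   # should print True if PyTorch can see the GPU
print(torch.cuda.device_count())   # number of visible GPUs

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(10, 10).to(device)   # stand-in for the model built in train.py
batch = torch.randn(4, 10, device=device)    # every batch must be moved to the same device
print(next(model.parameters()).device)       # should report cuda:0 during GPU training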

@Flova
Collaborator

Flova commented Feb 2, 2021

Duplicate of #265 #507 #290

@Flova Flova closed this as completed Feb 2, 2021