
multiple gpus? #181

Closed
aliborji opened this issue May 17, 2019 · 12 comments

Comments

@aliborji

Is it possible to train the model (this code) on multiple GPUs?

@wilkice

wilkice commented May 19, 2019

Not supported out of the box. You can change the code to support it. Reference: PyTorch Multi-GPU examples
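
A minimal sketch of what the PyTorch Multi-GPU examples suggest: wrap the model in nn.DataParallel so each forward call splits the input batch across the visible GPUs. The tiny model below is only a stand-in; in this repo you would wrap the YOLOv3 model built in train.py instead.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())  # placeholder model

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)   # replicate the model on every visible GPU
model = model.to(device)

imgs = torch.randn(8, 3, 416, 416, device=device)  # dummy batch
outputs = model(imgs)  # the batch is split along dim 0 and the outputs gathered back

Note that, as the later comments in this thread point out, naively passing the targets tensor through the DataParallel forward is what triggers the device-side assert, because the targets get split along dim 0 as well.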

@kaikaizhu

@aliborji, @wilkice
I want to use multi-GPU training. Did you add it successfully? I got an error when I added it:

RuntimeError: CUDA error: device-side assert triggered.

@H-YunHui

@kaikaizhu
I have the same question. Did you solve it?

@MichaelCong

I have the same question. Did you solve it?

@YIYIZH

YIYIZH commented Aug 6, 2019

@aliborji, @wilkice
I want to use multi-GPU training. Did you add it successfully? I got an error when I added it:

RuntimeError: CUDA error: device-side assert triggered.

I met the same error, and I think the problem comes from this line:
loss, outputs = model(imgs, targets)
When I use 4 GPUs, I found that imgs and targets are each divided equally across the 4 GPUs. The problem is that the targets should not be split equally, since each image has a different number of targets (bboxes).

One solution is to do it like this:
outputs = model(imgs)
loss = compute_loss(outputs, targets)  # you have to write a new function to compute the loss

Another is to find the longest targets tensor among the images in one batch and pad each target with zeros to that length, so that the targets can be split consistently with the imgs (a rough sketch of this padding idea follows below).

If anyone has a better solution, please tell me. Thanks.
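
A rough sketch of the padding idea above: pad every image's target list to the longest one in the batch, so targets become a fixed-shape [batch, max_boxes, 5] tensor that DataParallel can split along dim 0 together with imgs. The per-box layout [class, x, y, w, h] and the use of all-zero rows as padding are assumptions for illustration, not necessarily this repo's exact target format.

import torch

def collate_with_padding(batch):
    """batch: list of (img, boxes) pairs, where boxes is an [n_i, 5] tensor per image."""
    imgs, boxes_list = zip(*batch)
    imgs = torch.stack(imgs, dim=0)

    max_boxes = max(b.shape[0] for b in boxes_list)
    padded = torch.zeros(len(boxes_list), max_boxes, 5)
    for i, b in enumerate(boxes_list):
        if b.shape[0] > 0:
            padded[i, : b.shape[0]] = b  # real boxes first, zero rows after
    return imgs, padded

# Inside the loss function the zero-padded rows then have to be masked out, e.g.:
# valid = padded.abs().sum(dim=-1) > 0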

@GangHu1993

I have the same question.

I have the same problem; have you solved it?

@yangzhaonan18

The official PyTorch site has tutorials for loading the model and data on multiple GPUs, but how should the weights be loaded? Could someone share a learning link? Thank you.
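
A small sketch of the usual weight-loading pattern with nn.DataParallel: the wrapped model keeps its parameters under model.module, so save and load the state_dict through .module to avoid the "module." prefix mismatch. The tiny model and the checkpoint file name are placeholders.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())  # placeholder model
model = nn.DataParallel(model)

# Save the underlying module's weights so the checkpoint has no "module." prefix.
torch.save(model.module.state_dict(), "checkpoint.pth")  # hypothetical file name

# Later, after wrapping in DataParallel again, load into .module so the names match.
state_dict = torch.load("checkpoint.pth", map_location="cpu")
model.module.load_state_dict(state_dict)

Saving model.module.state_dict() (rather than the wrapped model's) means the same checkpoint can later be loaded with or without DataParallel.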

@ywatanabe1989

https://github.com/ywatanabe1989/PyTorch-gaussian-YOLOv3-1D/
I tried to implement multi-GPU support here, although there might be mistakes (at least, the print log output is currently broken) and it is not polished at all.

Please give me some feedback. :)

@ujsyehao

@YIYIZH Hi, I solved the problem using your proposed method (finding the longest targets tensor in the batch and padding each target with zeros to the same length).

@ujsyehao

You can refer to https://github.com/ujsyehao/yolov3-multigpu

@sakurasakura1996

@YIYIZH Did you solve the problem? I want to run the code on my custom data with one GPU, but I found I am training on the CPU.
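
In case training silently falls back to the CPU, a quick sanity check along these lines can help; the tiny Linear model is only a stand-in, the point is to confirm CUDA is visible and that the model and each batch are moved to the GPU.

import torch

print(torch.cuda.is_available())   # should print True if PyTorch can see the GPU
print(torch.cuda.device_count())   # number of visible GPUs

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(10, 10).to(device)   # stand-in for the model built in train.py
batch = torch.randn(4, 10, device=device)    # every batch must be moved to the same device
print(next(model.parameters()).device)       # should report cuda:0 during GPU training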

@Flova
Collaborator

Flova commented Feb 2, 2021

Duplicate of #265 #507 #290

@Flova Flova closed this as completed Feb 2, 2021