Release v4.0 - nn.SiLU() activations, Weights & Biases logging, PyTorch Hub integration · ultralytics/yolov5

This release implements two architecture changes to YOLOv5, as well as various bug fixes and performance improvements.

Breaking Changes

nn.SiLU() activations replace the nn.LeakyReLU(0.1) and nn.Hardswish() activations used in previous versions. nn.SiLU() was introduced in PyTorch 1.7.0 (https://pytorch.org/docs/stable/generated/torch.nn.SiLU.html). Because it is so recent, certain export pipelines (possibly CoreML) may be temporarily unavailable until the associated tools (e.g. coremltools) are updated.
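For reference, the three activations named above can be written out in plain Python. This is a minimal scalar sketch of the standard definitions, not the PyTorch implementation; the function names are illustrative only.

```python
import math


def silu(x: float) -> float:
    """SiLU (also called swish): x * sigmoid(x)."""
    # x * 1 / (1 + exp(-x)); fine for moderate x, overflows for x below about -709.
    return x / (1.0 + math.exp(-x))


def leaky_relu(x: float, negative_slope: float = 0.1) -> float:
    """LeakyReLU with the 0.1 negative slope used in earlier releases."""
    return x if x >= 0 else negative_slope * x


def hardswish(x: float) -> float:
    """Hardswish: x * ReLU6(x + 3) / 6, a piecewise approximation of SiLU."""
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0


# Compare the three functions at a few sample points.
for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"x={x:+.1f}  silu={silu(x):+.4f}  "
          f"leaky_relu={leaky_relu(x):+.4f}  hardswish={hardswish(x):+.4f}")
```

Unlike LeakyReLU, SiLU is smooth everywhere, and unlike Hardswish it needs an exponential, which is one reason export tools may need explicit support for it.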

Bug Fixes

Multi-GPU --resume #1810
leaf Variable inplace bug fix #1759
Various additional bug fixes contained in PRs #1235 through #1837

Added Functionality

Weights & Biases (W&B) Feature Addition #1235
Utils reorganization #1392
PyTorch Hub and autoShape update #1415
W&B artifacts feature addition #1712
Various additional feature additions contained in PRs #1235 through #1837

Updated Results

Latest models are all slightly smaller due to the removal of one convolution within each bottleneck. The bottlenecks have been renamed C3() modules, reflecting the 3 I/O convolutions each one performs versus the 4 in the standard CSP bottleneck. The previous manual concatenation and LeakyReLU(0.1) activations have both been removed, which simplifies the architecture, reduces parameter count, and better exploits the .fuse() operation at inference time.
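These notes do not describe what .fuse() does. Assuming it folds each BatchNorm layer into the preceding convolution (the usual Conv+BN fusion), the arithmetic can be sketched for a single channel, where a 1x1 convolution reduces to a scalar multiply. All names and values below are hypothetical and chosen only for illustration.

```python
import math


def conv_then_bn(x, w, b, gamma, beta, mean, var, eps=1e-5):
    """Unfused path: scalar 'convolution' followed by batch normalization."""
    y = w * x + b
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold the BatchNorm statistics into a single weight and bias."""
    scale = gamma / math.sqrt(var + eps)
    return w * scale, beta + (b - mean) * scale


# Hypothetical layer parameters (not taken from any YOLOv5 checkpoint).
layer = dict(w=0.8, b=0.1, gamma=1.5, beta=-0.2, mean=0.3, var=4.0)

w_fused, b_fused = fuse_conv_bn(**layer)
x = 2.0
unfused = conv_then_bn(x, **layer)
fused = w_fused * x + b_fused

# Both paths should agree up to floating-point rounding.
print(unfused, fused)
```

Under that assumption, every convolution followed directly by a BatchNorm collapses into one operation at inference time, so fewer standalone activations and concatenations in between leave more of the network eligible for fusion.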

nn.SiLU() activations replace nn.LeakyReLU(0.1) and nn.Hardswish() activations throughout the model, so a single activation function is now used everywhere rather than two.

In general the changes result in smaller models (89.0M -> 87.7M params for YOLOv5x), faster inference times (6.9ms -> 6.0ms), and improved mAP (49.2 -> 50.1) for all models except YOLOv5s, whose mAP dropped slightly (37.0 -> 36.8). The largest models benefit the most from this update. YOLOv5x in particular is now above 50.0 mAP at --img-size 640, which may be the first time any architecture I'm aware of has achieved this at 640 resolution (correct me if I'm wrong though).
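To put the figures quoted above in relative terms, here is a small sketch that computes the percentage changes. It assumes the latency and mAP pairs refer to YOLOv5x, as the parameter counts do; the helper name is illustrative.

```python
def relative_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative means a reduction)."""
    return (new - old) / old * 100.0


# Before -> after values quoted in the paragraph above.
params = relative_change(89.0, 87.7)   # parameters, in millions
latency = relative_change(6.9, 6.0)    # per-image time, in ms
map_gain = 50.1 - 49.2                 # absolute mAP points

print(f"params {params:+.1f}%  latency {latency:+.1f}%  mAP {map_gain:+.1f}")
# prints: params -1.5%  latency -13.0%  mAP +0.9
```

The parameter saving is small (about 1.5%), so most of the speedup presumably comes from the simpler module structure rather than from the reduced parameter count alone.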

** GPU Speed measures end-to-end time per image averaged over 5000 COCO val2017 images using a V100 GPU with batch size 32, and includes image preprocessing, PyTorch FP16 inference, postprocessing and NMS. EfficientDet data from google/automl at batch size 8.

Pretrained Checkpoints

Model	size	AP^val	AP^test	AP₅₀	Speed_V100	FPS_V100	params	GFLOPS
YOLOv5s	640	36.8	36.8	55.6	2.2ms	455	7.3M	17.0
YOLOv5m	640	44.5	44.5	63.1	2.9ms	345	21.4M	51.3
YOLOv5l	640	48.1	48.1	66.4	3.8ms	264	47.0M	115.4
YOLOv5x	640	50.1	50.1	68.7	6.0ms	167	87.7M	218.8

YOLOv5x + TTA	832	51.9	51.9	69.6	24.9ms	40	87.7M	1005.3
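The FPS column appears to be the reciprocal of the per-image latency. That relationship is an assumption, not something stated in these notes, but it can be checked against the table with a short sketch:

```python
# (latency in ms, reported FPS) copied from the table above.
rows = {
    "YOLOv5s": (2.2, 455),
    "YOLOv5m": (2.9, 345),
    "YOLOv5l": (3.8, 264),
    "YOLOv5x": (6.0, 167),
    "YOLOv5x + TTA": (24.9, 40),
}

for name, (ms, fps) in rows.items():
    implied = 1000.0 / ms  # images per second if FPS = 1000 / latency_ms
    print(f"{name:14s} reported={fps:4d}  implied={implied:6.1f}  "
          f"diff={fps - implied:+.2f}")

# Largest gap between reported and implied FPS across all rows.
max_gap = max(abs(1000.0 / ms - fps) for ms, fps in rows.values())
```

Every row agrees to within 1 FPS; the remaining differences are consistent with the latencies having been rounded to one decimal place before publication.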
