Releases: zjykzj/YOLO11Face

ADD YOLOv8-pose and YOLO11-pose

01 Mar 08:39

This repository has trained two model architectures. The first trains and validates on the WIDERFACE dataset using only the yolov5/yolov8/yolo11 detection model architectures.

| Repo | ARCH | GFLOPs | Easy | Medium | Hard |
|------|------|--------|------|--------|------|
| zjykzj/YOLO11Face | yolov5nu | 7.1 | 93.86 | 91.70 | 80.37 |
| zjykzj/YOLO11Face | yolov5su | 23.8 | 95.13 | 93.47 | 84.33 |
| zjykzj/YOLO11Face | yolov8s | 28.4 | 95.77 | 94.18 | 84.54 |
| zjykzj/YOLO11Face | yolo11s | 21.3 | 95.55 | 93.91 | 84.85 |

The second approach uses Ultralytics' pose model to jointly train face detection and facial keypoints, and then evaluates only the face-detection performance on the validation set in the original way.

Note that the facial keypoint annotations come from RetinaFace, which only annotated keypoints on the original training set. Therefore, when training the pose model, the original WIDERFACE train set is split into training/validation subsets in an 8:2 ratio, and the WIDERFACE val set is evaluated once training completes.
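
The split itself can be done with a simple shuffle over the original train list; below is a minimal sketch, assuming a hypothetical directory layout (paths and file names are illustrative, not the repository's actual layout):

```python
import random
from pathlib import Path

# Hypothetical location of the original WIDERFACE train images.
image_dir = Path("widerface/train/images")
images = sorted(image_dir.rglob("*.jpg"))

random.seed(0)          # fixed seed so the split is reproducible
random.shuffle(images)

split = int(len(images) * 0.8)   # 8:2 train/val ratio
train_imgs, val_imgs = images[:split], images[split:]

Path("train.txt").write_text("\n".join(str(p) for p in train_imgs) + "\n")
Path("val.txt").write_text("\n".join(str(p) for p in val_imgs) + "\n")
```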

| Repo | ARCH | GFLOPs | Easy | Medium | Hard |
|------|------|--------|------|--------|------|
| zjykzj/YOLO5Face | yolov5n-v7.0 | 4.2 | 93.25 | 91.11 | 80.33 |
| zjykzj/YOLO5Face | yolov5s-v7.0 | 15.8 | 94.84 | 93.28 | 84.67 |
| zjykzj/YOLO11Face | yolov8n-pose | 8.3 | 94.61 | 92.46 | 80.98 |
| zjykzj/YOLO11Face | yolov8s-pose | 29.4 | 95.50 | 93.95 | 84.65 |
| zjykzj/YOLO11Face | yolo11n-pose | 6.6 | 94.62 | 92.56 | 81.02 |
| zjykzj/YOLO11Face | yolo11s-pose | 22.3 | 95.72 | 94.19 | 85.24 |

During the eval phase, VGA-resolution input images are used (the longer edge of the input image is scaled to 640, and the shorter edge is scaled proportionally).
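
For reference, the longest-edge rescaling described above could be written as the following sketch (function and variable names are illustrative and not taken from the evaluation script):

```python
import cv2

def resize_longest_edge(image, target=640):
    """Scale so the longer edge equals `target`; the shorter edge follows proportionally."""
    h, w = image.shape[:2]
    scale = target / max(h, w)
    resized = cv2.resize(image, (round(w * scale), round(h * scale)))
    return resized, scale

image = cv2.imread("example.jpg")            # any WIDERFACE val image
resized, scale = resize_longest_edge(image)  # predictions map back to the original via 1 / scale
```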

Using YOLOv8-pose

15 Feb 07:37
Pre-release

Experiments showed that YOLOv8-pose can detect faces and facial keypoints simultaneously. Using the face and keypoint annotations provided by RetinaFace (training set only), the training data was split 8:2 into training/validation parts, and the face-and-keypoint detector was trained from scratch. Finally, it was evaluated on the WIDERFACE val dataset.
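
A minimal training sketch with the Ultralytics API, assuming a hypothetical dataset YAML that describes the face boxes plus the five RetinaFace facial keypoints (the file name and hyperparameters are assumptions, not this repository's actual configuration):

```python
from ultralytics import YOLO

# Build the pose model from its architecture YAML so training starts from scratch
# (passing a .pt checkpoint instead would start from pretrained weights).
model = YOLO("yolov8n-pose.yaml")

# "widerface-kpts.yaml" is a hypothetical dataset config pointing at the
# 8:2 train/val split of the WIDERFACE train set described above.
model.train(data="widerface-kpts.yaml", epochs=100, imgsz=640)

# Validate on the split's val part; WIDERFACE Easy/Medium/Hard numbers
# come from the separate WIDERFACE evaluation protocol, not from this call.
metrics = model.val()
```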

| Repo | ARCH | GFLOPs | Easy | Medium | Hard |
|------|------|--------|------|--------|------|
| zjykzj/YOLO5Face | yolov5s-v7.0 | 15.8 | 94.84 | 93.28 | 84.67 |
| zjykzj/YOLO5Face | yolov5n-v7.0 | 4.2 | 93.25 | 91.11 | 80.33 |
| zjykzj/YOLO8Face | yolov5su | 23.8 | 95.18 | 93.50 | 82.47 |
| zjykzj/YOLO8Face | yolov5nu | 7.1 | 93.96 | 91.82 | 78.89 |
| zjykzj/YOLO8Face | yolov8s | 28.4 | 95.81 | 94.26 | 82.75 |
| zjykzj/YOLO8Face | yolov8n | 8.1 | 94.57 | 92.55 | 78.97 |
| zjykzj/YOLO8Face | yolov8s-pose | 29.4 | 95.07 | 93.77 | 82.84 |
| zjykzj/YOLO8Face | yolov8n-pose | 8.3 | 94.07 | 92.04 | 78.88 |

During the eval phase, VGA-resolution input images are used (the longer edge of the input image is scaled to 640, and the shorter edge is scaled proportionally).

INIT

03 Feb 07:14
Pre-release

| Repo | ARCH | GFLOPs | Easy | Medium | Hard |
|------|------|--------|------|--------|------|
| zjykzj/YOLO5Face | yolov5s-v7.0 | 15.8 | 94.84 | 93.28 | 84.67 |
| zjykzj/YOLO5Face | yolov5n-v7.0 | 4.2 | 93.25 | 91.11 | 80.33 |
| zjykzj/YOLO8Face | yolov5su | 23.8 | 95.18 | 93.50 | 82.47 |
| zjykzj/YOLO8Face | yolov5nu | 7.1 | 93.96 | 91.82 | 78.89 |
| zjykzj/YOLO8Face | yolov8s | 28.4 | 95.81 | 94.26 | 82.75 |
| zjykzj/YOLO8Face | yolov8n | 8.1 | 94.57 | 92.55 | 78.97 |

During the eval phase, VGA-resolution input images are used (the longer edge of the input image is scaled to 640, and the shorter edge is scaled proportionally).

During the training phase, the longer edge of the input image is scaled to 800, and the shorter edge is scaled proportionally.