MMDetection v3.2.0 Release
Highlights
v3.2.0 was released on October 12, 2023:
1. Detection Transformer SOTA Model Collection
(1) Added support for four newer and stronger SOTA Transformer models: DDQ, CO-DETR, AlignDETR, and H-DINO.
(2) Based on CO-DETR, MMDet released a model that reaches 64.1 mAP on COCO.
(3) Algorithms such as DINO support AMP, gradient checkpointing, and FrozenBN, which can effectively reduce GPU memory usage (see the config sketch below).
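As a rough illustration of how these memory-saving switches are usually turned on, here is a minimal config sketch. The base config name and the exact keys (`with_cp`, `norm_eval`, `norm_cfg`) are assumptions based on common MMDetection v3.x conventions; check the DINO configs shipped with your version.

```python
# Minimal sketch, assuming the standard DINO config layout in MMDetection v3.x.
_base_ = ['./dino-4scale_r50_8xb2-12e_coco.py']  # assumed base config name

# AMP: switch the optimizer wrapper to MMEngine's AmpOptimWrapper.
optim_wrapper = dict(type='AmpOptimWrapper', loss_scale='dynamic')

model = dict(
    backbone=dict(
        with_cp=True,     # gradient checkpointing in the backbone
        norm_eval=True,   # keep BN in eval mode (FrozenBN-style behaviour)
        norm_cfg=dict(type='BN', requires_grad=False)))
```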
2. Comprehensive Performance Comparison between CNN and Transformer
RF100 is a collection of 100 real-world datasets covering 7 domains. It can be used to assess how Transformer models such as DINO and CNN-based algorithms perform across different scenarios and data volumes, so users can quickly evaluate the robustness of their own algorithms in various settings with this benchmark.
3. Support for GLIP and Grounding DINO fine-tuning; MMDet is the only algorithm library that supports Grounding DINO fine-tuning
The fine-tuned Grounding DINO in MMDet scores about one point higher than the official version, and GLIP also outperforms the official release.
We also provide a detailed walkthrough for training and evaluating Grounding DINO on custom datasets; everyone is welcome to try it. A minimal config sketch follows the table below.
| Model | Backbone | Style | COCO mAP | Official COCO mAP |
| --- | --- | --- | --- | --- |
| Grounding DINO-T | Swin-T | Zero-shot | 48.5 | 48.4 |
| Grounding DINO-T | Swin-T | Finetune | 58.1 (+0.9) | 57.2 |
| Grounding DINO-B | Swin-B | Zero-shot | 56.9 | 56.7 |
| Grounding DINO-B | Swin-B | Finetune | 59.7 | |
| Grounding DINO-R50 | R50 | Scratch | 48.9 (+0.8) | 48.1 |
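The following is a minimal sketch of what a custom-dataset override config can look like. The base config name, paths, and class list are hypothetical placeholders; the official tutorial documents the exact recipe.

```python
# Minimal sketch for fine-tuning Grounding DINO on a COCO-format custom
# dataset; paths, classes, and the base config name are placeholders.
_base_ = ['./grounding_dino_swin-t_finetune_16xb2_1x_coco.py']  # assumed name

data_root = 'data/my_dataset/'            # hypothetical dataset root
metainfo = dict(classes=('cat', 'dog'))   # your custom categories

train_dataloader = dict(
    dataset=dict(
        data_root=data_root,
        metainfo=metainfo,
        ann_file='annotations/train.json',
        data_prefix=dict(img='images/')))

val_dataloader = dict(
    dataset=dict(
        data_root=data_root,
        metainfo=metainfo,
        ann_file='annotations/val.json',
        data_prefix=dict(img='images/')))

val_evaluator = dict(ann_file=data_root + 'annotations/val.json')
test_dataloader = val_dataloader
test_evaluator = val_evaluator
```

Training and evaluation then go through the usual `tools/train.py` and `tools/test.py` entry points.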
4. Support for the open-vocabulary detection algorithm Detic and multi-dataset joint training (see the sketch below).
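As an illustration of what multi-dataset joint training looks like in config form, here is a minimal `ConcatDataset` sketch. The dataset mix and paths are illustrative assumptions, not the actual Detic recipe.

```python
# Minimal sketch of multi-dataset joint training via ConcatDataset; the
# dataset mix and paths are illustrative, not the actual Detic recipe.
train_dataloader = dict(
    dataset=dict(
        type='ConcatDataset',
        datasets=[
            dict(
                type='CocoDataset',
                data_root='data/coco/',
                ann_file='annotations/instances_train2017.json',
                data_prefix=dict(img='train2017/'),
                pipeline=train_pipeline),  # a shared pipeline is assumed
            dict(
                type='LVISV1Dataset',
                data_root='data/lvis_v1/',
                ann_file='annotations/lvis_v1_train.json',
                data_prefix=dict(img=''),
                pipeline=train_pipeline),
        ]))
```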
5. Support for training detection models with FSDP and DeepSpeed; benchmark numbers are below, and a strategy config sketch follows the table.
| ID | AMP | GC of Backbone | GC of Encoder | FSDP | Peak Mem (GB) | Iter Time (s) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | | | | | 49 (A100) | 0.9 |
| 2 | √ | | | | 39 (A100) | 1.2 |
| 3 | | √ | | | 33 (A100) | 1.1 |
| 4 | √ | √ | | | 25 (A100) | 1.3 |
| 5 | | √ | √ | | 18 | 2.2 |
| 6 | √ | √ | √ | | 13 | 1.6 |
| 7 | | √ | √ | √ | 14 | 2.9 |
| 8 | √ | √ | √ | √ | 8.5 | 2.4 |
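Under MMEngine's FlexibleRunner (MMEngine >= 0.8), the training backend is chosen through a `strategy` field; the sketch below outlines that mechanism. The wrap policy and the DeepSpeed options are illustrative assumptions; see the MMEngine strategy docs for the exact arguments in your version.

```python
# Minimal sketch: selecting FSDP (or DeepSpeed) through MMEngine's strategy
# mechanism; the wrap policy and DeepSpeed options are illustrative.
from functools import partial

from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

strategy = dict(
    type='FSDPStrategy',
    model_wrapper=dict(
        auto_wrap_policy=partial(
            size_based_auto_wrap_policy, min_num_params=int(1e7))))

# DeepSpeed is selected the same way, e.g.:
# strategy = dict(
#     type='DeepSpeedStrategy',
#     fp16=dict(enabled=True),
#     zero_optimization=dict(stage=3))
```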
6. Support for the V3Det dataset, a large-scale detection dataset with over 13,000 categories.