To evaluate the results, please upload the zip file to the competition server.
Backbone | J&F | CFBI J&F | Pretrain | Model | Submission | CFBI Submission |
---|---|---|---|---|---|---|
ResNet-50 | 55.6 | 59.4 | weight | model | link | link |
ResNet-101 | 57.3 | 60.3 | weight | model | link | link |
Swin-T | 58.7 | 61.2 | weight | model | link | link |
Swin-L | 62.4 | 63.3 | weight | model | link | link |
Video-Swin-T* | 55.8 | - | - | model | link | - |
Video-Swin-T | 59.4 | - | weight | model | link | - |
Video-Swin-S | 60.1 | - | weight | model | link | - |
Video-Swin-B | 62.9 | - | weight | model | link | - |
* indicates the model is trained from scratch.
Results with joint training on the Ref-COCO/+/g datasets:
Backbone | J&F | J | F | Model | Submission |
---|---|---|---|---|---|
ResNet-50 | 58.7 | 57.4 | 60.1 | model | link |
ResNet-101 | 59.3 | 58.1 | 60.4 | model | link |
Swin-L | 64.2 | 62.3 | 66.2 | model | link |
Video-Swin-T | 62.6 | 59.9 | 63.3 | model | link |
Video-Swin-S | 63.3 | 61.4 | 65.2 | model | link |
Video-Swin-B | 64.9 | 62.8 | 67.0 | model | link |
First, run inference using the trained model.
python3 inference_ytvos.py --with_box_refine --binary --freeze_text_encoder --output_dir=[/path/to/output_dir] --resume=[/path/to/model_weight] --backbone [backbone]
python3 inference_ytvos.py --with_box_refine --binary --freeze_text_encoder --output_dir=ytvos_dirs/swin_tiny --resume=ytvos_swin_tiny.pth --backbone swin_t_p4w7
If you want to visualize the predicted masks, you may add --visualize to the above command.
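When running inference for several checkpoints, it can help to assemble the command in a script. The following is a minimal sketch, not part of the repository: `build_inference_cmd` is a hypothetical helper, and it only uses the script name and flags that appear in the commands above.

```python
import shlex


def build_inference_cmd(output_dir, resume, backbone, visualize=False):
    """Return the argument list for inference_ytvos.py (hypothetical helper)."""
    cmd = [
        "python3", "inference_ytvos.py",
        "--with_box_refine", "--binary", "--freeze_text_encoder",
        f"--output_dir={output_dir}",
        f"--resume={resume}",
        "--backbone", backbone,
    ]
    if visualize:
        # Optional flag for saving visualizations of the predicted masks.
        cmd.append("--visualize")
    return cmd


# Reproduces the Swin-Tiny example command as a single shell-quoted string.
print(shlex.join(build_inference_cmd(
    "ytvos_dirs/swin_tiny", "ytvos_swin_tiny.pth", "swin_t_p4w7")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch inference.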
Then, enter the output_dir and rename the folder valid to Annotations. Use the following command to zip the folder:
zip -q -r submission.zip Annotations
To evaluate the results, please upload the zip file to the competition server.
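The rename-and-zip steps can also be scripted. This is a sketch under stated assumptions rather than a tool shipped with the repository: `package_submission` is a hypothetical helper, and it assumes inference has already written a `valid` folder directly inside the output directory, as described above.

```python
import zipfile
from pathlib import Path


def package_submission(output_dir):
    """Rename <output_dir>/valid to Annotations and write submission.zip next to it."""
    out = Path(output_dir)
    src = out / "valid"
    dst = out / "Annotations"
    if src.is_dir() and not dst.exists():
        src.rename(dst)
    if not dst.is_dir():
        raise FileNotFoundError(f"neither {src} nor {dst} exists")

    zip_path = out / "submission.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(dst.rglob("*")):
            # Store paths relative to output_dir so the archive root is Annotations/.
            zf.write(path, path.relative_to(out).as_posix())
    return zip_path
```

Calling `package_submission("ytvos_dirs/swin_tiny")` would leave `submission.zip` in that directory, with the same layout that `zip -q -r submission.zip Annotations` produces.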
- Finetune
The following command includes the training and inference stages.
./scripts/dist_train_test_ytvos.sh [/path/to/output_dir] [/path/to/pretrained_weight] --backbone [backbone]
For example, to train the Video-Swin-Tiny model, run the following command:
./scripts/dist_train_test_ytvos.sh ytvos_dirs/video_swin_tiny pretrained_weights/video_swin_tiny_pretrained.pth --backbone video_swin_t_p4w7
- Train from scratch
The following command includes the training and inference stages.
./scripts/dist_train_test_ytvos_scratch.sh [/path/to/output_dir] --backbone [backbone] --backbone_pretrained [/path/to/backbone_pretrained_weight] [other args]
For example, to train the Video-Swin-Tiny model from scratch, run the following command:
./scripts/dist_train_test_ytvos_scratch.sh ytvos_dirs/video_swin_tiny_scratch --backbone video_swin_t_p4w7 --backbone_pretrained video_swin_pretrained/swin_tiny_patch244_window877_kinetics400_1k.pth
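The two scripts take different positional arguments: the finetune script expects the pretrained weight right after the output directory, while the from-scratch script takes only the output directory. The sketch below makes that difference explicit. `build_train_cmd` is a hypothetical helper, not part of the repository, and it only mirrors the usage lines shown above.

```python
def build_train_cmd(output_dir, backbone, pretrained_weight=None,
                    backbone_pretrained=None):
    """Return the argument list for finetuning or for training from scratch."""
    if pretrained_weight is not None:
        # Finetune: the pretrained weight is the second positional argument.
        return ["./scripts/dist_train_test_ytvos.sh", output_dir,
                pretrained_weight, "--backbone", backbone]
    # From scratch: only the output directory is positional.
    cmd = ["./scripts/dist_train_test_ytvos_scratch.sh", output_dir,
           "--backbone", backbone]
    if backbone_pretrained is not None:
        cmd += ["--backbone_pretrained", backbone_pretrained]
    return cmd


# Finetuning the Video-Swin-Tiny model from a pretrained checkpoint.
print(" ".join(build_train_cmd(
    "ytvos_dirs/video_swin_tiny", "video_swin_t_p4w7",
    pretrained_weight="pretrained_weights/video_swin_tiny_pretrained.pth")))
```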