Train on custom Dataset #141

vl1969s · 2024-12-19T11:58:08Z

I'm trying to train a YOLOv9 model on a custom dataset, but I'm not sure how to do it. I'm following these steps:

I have a dataset folder containing two folders, images and labels. Each of them has train and val subfolders: the images folders hold the images, and the labels folders hold one .txt file per image (with the same name as the image) containing labels in YOLO format, e.g.: 0 0.12 0.62 0.05 0.07
I created a custom.yml file, located at yolo/config/dataset/custom.yml, with this simple content:

path: C:/temp/clips
train: train
validation: val
class_num: 2
class_list: ['ok', 'nok']
auto_download:

I run this command to train a model with my custom dataset:

python lazy.py task=train dataset=custom.yaml use_wandb=False device=cuda
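The folder layout described in the steps above (images and labels, each with train and val subfolders, and one .txt per image with the same file name) can be sanity-checked before training. The sketch below is a hypothetical helper, not part of the repository; it assumes .jpg/.jpeg/.png images and only reports images that have no matching label file.

```python
from pathlib import Path

# Image extensions to look for; adjust to match your dataset (assumption).
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def find_missing_labels(root: str, split: str) -> list[str]:
    """Return stems of images in images/<split> with no labels/<split>/<stem>.txt.

    Hypothetical helper: assumes the layout root/images/<split> and
    root/labels/<split> described in the steps above.
    """
    image_dir = Path(root) / "images" / split
    label_dir = Path(root) / "labels" / split
    missing = []
    for image in sorted(image_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue  # skip non-image files such as notes or caches
        if not (label_dir / f"{image.stem}.txt").is_file():
            missing.append(image.stem)
    return missing
```

For the dataset above, calling find_missing_labels("C:/temp/clips", "train") and the same for "val" should return empty lists if every image has a label file.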

Are the steps above correct? Am I missing something?

Can I place the custom.yml file in the same folder as the dataset?


henrytsui000 · 2025-01-03T14:34:25Z

Hi,

Yes, your process looks correct! You can place the custom.yml file in the same directory as the dataset if you prefer, but make sure the paths in the file are correctly referenced.

However, please note that we originally used segmentation annotations in the training process. I recently merged the code to support x, y, w, h annotations in YOLO format. If you encounter any unexpected bugs, please let me know so I can address them promptly.
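One way to picture the relationship between the two annotation styles: a YOLO x, y, w, h box (normalized center and size) is equivalent to a four-corner polygon, and any segmentation polygon can be reduced to its bounding box. The sketch below shows both conversions in plain Python; it is only an illustration under that assumption, not the repository's loader code, and the function names are hypothetical.

```python
def xywh_to_polygon(cx: float, cy: float, w: float, h: float) -> list[tuple[float, float]]:
    """Convert a normalized center-format box to its 4 corners, clockwise from top-left."""
    x1, y1 = cx - w / 2, cy - h / 2
    x2, y2 = cx + w / 2, cy + h / 2
    return [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]


def polygon_to_xyxy(points: list[tuple[float, float]]) -> tuple[float, float, float, float]:
    """Reduce a polygon to its axis-aligned bounding box (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)
```

A round trip through both functions returns the box's corner coordinates, which is a quick way to check that a box-only label survives a polygon-based pipeline unchanged.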

Best regards,
Henry Tsui

ArgoHA · 2025-01-04T16:43:34Z

@henrytsui000 Will the "classical" YOLO format work, i.e. txt files with class_id norm_xywh, like 0 0.1 0.2 0.1 0.1?
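For reference, a "classical" YOLO label line stores the class id followed by the box center and size, all normalized to [0, 1] by the image width and height. A minimal parser under that assumption, which also converts the box to normalized corner coordinates (x_min, y_min, x_max, y_max), could look like this; parse_yolo_line is a hypothetical name, not a function from the repository.

```python
def parse_yolo_line(line: str) -> tuple[int, tuple[float, float, float, float]]:
    """Parse 'class_id cx cy w h' (normalized) into (class_id, (x_min, y_min, x_max, y_max))."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:])
    # All values should already be normalized; values above 1 usually mean
    # pixel coordinates were written by mistake.
    if not all(0.0 <= v <= 1.0 for v in (cx, cy, w, h)):
        raise ValueError(f"values not normalized: {line!r}")
    return class_id, (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

Running it over every label file is a cheap way to catch pixel-space or malformed lines before they reach training.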

I just started training and I see very poor metrics compared to DAMO-YOLO and YOLOv8, so I assume something is wrong.

For comparison, here is what I get with the same dataset and DAMO-YOLO:

Should I expect roughly the same convergence as ultralytics/DAMO-YOLO with the same 60 epochs?

Here is my training command; I assume it uses pretrained weights:
python yolo/lazy.py task=train task.data.batch_size=10 model=v9-m dataset=b_cars device=gpu

ArgoHA · 2025-01-05T03:49:57Z

I tried training from scratch with python yolo/lazy.py task=train task.data.batch_size=10 model=v9-m dataset=b_cars device=gpu task.epoch=60 weight=False, and even that gave me "higher" metrics. They are still very bad, but that makes sense: training from scratch would need something like 10 times more epochs, and that's not what I need. So I think something may be wrong with training on a custom dataset from pretrained weights.

Do I understand correctly that without weight=False I get the same training, just initialised with COCO weights? No layers are frozen, right?

Also, the scheduler behaves differently for some reason.

ramonhollands · 2025-01-05T15:23:18Z

I tried training with xywh-style annotations myself and saw that the annotations uploaded to wandb are not correct. I'm planning to look into this issue next week.

ramonhollands · 2025-01-08T07:18:45Z

Training works fine on my end. The only issue is that wandb should receive non-normalized values.
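To illustrate what "non-normalized" means here: the boxes are stored normalized to [0, 1], so before they are logged as pixel coordinates they need to be scaled by the image width and height. The sketch below builds a box dictionary in the layout used by wandb bounding-box overlays (a position with minX/maxX/minY/maxY plus domain "pixel"); treat the exact keys as an assumption to check against the wandb.Image documentation, and note that this is not the repository's logging code.

```python
def to_pixel_box(xyxy_norm: tuple[float, float, float, float],
                 width: int, height: int, class_id: int) -> dict:
    """Scale a normalized (x_min, y_min, x_max, y_max) box to pixel coordinates.

    The dict layout mirrors wandb's bounding-box format (assumption: verify
    the key names against the wandb documentation before relying on them).
    """
    x_min, y_min, x_max, y_max = xyxy_norm
    return {
        "position": {
            "minX": x_min * width,
            "maxX": x_max * width,
            "minY": y_min * height,
            "maxY": y_max * height,
        },
        "domain": "pixel",  # coordinates are absolute pixels, not fractions
        "class_id": class_id,
    }
```

For a 1280x720 image, the normalized box (0.25, 0.5, 0.75, 1.0) becomes minX 320, maxX 960, minY 360, maxY 720.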

ramonhollands · 2025-01-08T07:20:01Z

I am using json annotated files. There is an issue with txt annotated files (see #148)

fmichea · 2025-02-11T21:06:58Z

Hi, I've been looking into this repository and tried to transfer-train a model on data I labelled. I am also struggling to get good results or to figure out why it is failing for me. @ArgoHA, were you able to find out what is causing the low metrics for you?

I tried debugging a few things before posting. In terms of the code, I have looked at all the data loading, the translation of images and labels to squares, etc., and all of that seems to be working fine.

My dataset consists of 10K images with 27 labels. Unfortunately, some labels have very little data (<40 samples), but the more important labels have thousands of samples. All the images are 1280x720 JPEGs, and the dataset is in COCO format with JSON annotations.
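With 27 classes and some of them under 40 samples, it can help to print the number of annotations per category before training. A minimal sketch, assuming a standard COCO-style JSON with top-level "annotations" and "categories" lists; the function names are hypothetical.

```python
import json
from collections import Counter


def count_instances(coco: dict) -> dict[str, int]:
    """Map each category name to its number of annotations in a COCO-style dict."""
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(ann["category_id"] for ann in coco["annotations"])
    # Include categories with zero annotations so they are easy to spot.
    return {names[cid]: counts.get(cid, 0) for cid in names}


def load_and_count(path: str) -> dict[str, int]:
    """Load a COCO annotation file from disk and count instances per category."""
    with open(path, encoding="utf-8") as f:
        return count_instances(json.load(f))
```

Sorting the result with sorted(counts.items(), key=lambda kv: kv[1]) lists the rarest classes first, which makes it easier to tell whether low overall mAP is driven by a few under-represented labels.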

Training to 200 epochs with the following command:

python yolo/lazy.py task=train task.data.batch_size=4 device=cuda task.validation.data.batch_size=4 cpu_num=6 dataset=sm64 name=sm64 model=v9-c task.epoch=200

Results in mAP @ .5 barely above 40% and AP @ .5:.95 barely above 20%. You can see that I get a progression pattern very similar to @ArgoHA's.

YOLO training on custom dataset

You can find the ground truth and samples for every 25 epochs, as seen in wandb, in this repository here. I also included a few other screenshots of metrics from the YOLO training process.

I tried to use YOLOX and DAMO-YOLO on the same dataset but haven't been able to find the right set of dependencies to run them on my setup. However, I do have a working mmdetection Faster RCNN pipeline, and when I run it on the exact same dataset I get much better results: after 12 epochs, mAP @ .5 and mAP @ .75 are both close to 90%.

Faster RCNN (mmdetection) training on the same dataset
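To make the difference between the thresholds concrete: a prediction only counts as a true positive if its intersection over union (IoU) with a ground-truth box reaches the threshold, so a box that is roughly right can pass at 0.5 and fail at 0.75. AP @ .5:.95 averages AP over IoU thresholds from 0.5 to 0.95 in steps of 0.05, which is why it is always the lowest of these numbers. A minimal IoU function for (x_min, y_min, x_max, y_max) boxes:

```python
def iou(a: tuple[float, float, float, float], b: tuple[float, float, float, float]) -> float:
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


gt = (0, 0, 100, 100)
pred = (0, 0, 100, 60)  # covers 60% of the ground-truth box
score = iou(gt, pred)
print(score, score >= 0.5, score >= 0.75)  # 0.6 True False
```

Here the prediction is a hit for mAP @ .5 but a miss for mAP @ .75, so a model whose boxes are only loosely placed shows a large gap between the two metrics.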

I have also included images with predictions from the 12-epoch Faster RCNN model here; they clearly show that the model makes very accurate predictions after training on this dataset. Unfortunately, inference with this model is too slow for my use case, which is why I was looking into YOLO-based models.

I ran this on revision fa548df with the following changes which should not alter the behavior of the program (PRs incoming as I test things):

diff --git a/yolo/tools/data_loader.py b/yolo/tools/data_loader.py
index c44f00c..0b2cf8b 100644
--- a/yolo/tools/data_loader.py
+++ b/yolo/tools/data_loader.py
@@ -230,6 +241,7 @@ def create_dataloader(data_cfg: DataConfig, dataset_cfg: DatasetConfig, task: st
         num_workers=data_cfg.cpu_num,
         pin_memory=data_cfg.pin_memory,
         collate_fn=collate_fn,
+        persistent_workers=True,
     )

diff --git a/yolo/utils/logging_utils.py b/yolo/utils/logging_utils.py
index 28a5362..074023d 100644
--- a/yolo/utils/logging_utils.py
+++ b/yolo/utils/logging_utils.py
@@ -107,7 +107,7 @@ class YOLORichProgressBar(RichProgressBar):
         epoch_descript = "[cyan]Train [white]|"
         batch_descript = "[green]Train [white]|"
         metrics = self.get_metrics(trainer, pl_module)
-        metrics.pop("v_num")
+        metrics.pop("v_num", None)  # Remove v_num key if present.
         for metrics_name, metrics_val in metrics.items():
             if "Loss_step" in metrics_name:
                 epoch_descript += f"{metrics_name.removesuffix('_step').split('/')[1]: ^9}|"
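The second change only matters when the progress bar's metrics dictionary has no v_num entry: dict.pop(key) raises KeyError for a missing key, while dict.pop(key, None) returns None instead. A standalone illustration with a made-up metrics dict (the metric name is hypothetical):

```python
metrics = {"Loss/BoxLoss_step": 1.25}  # hypothetical metrics without "v_num"

try:
    metrics.pop("v_num")  # original code: raises when the key is absent
except KeyError as err:
    print("KeyError:", err)

print(metrics.pop("v_num", None))  # patched code: returns None, no exception
```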

This is my first time working on an ML project, so I apologize if I am doing something obviously wrong. I must admit I am not 100% sure whether this type of dataset suits machine learning or YOLO, whether I am launching the training process correctly, or whether I am making some other basic mistake. I am also just learning about the metrics and might therefore have made an incorrect comparison.

What would you recommend as next steps to figure out whether there is a way to get better results? I would appreciate any pointers.

ArgoHA · 2025-02-12T09:31:14Z

It seems like your dataset is not hard, which means the metrics should be significantly higher. I never figured out what the issue with this repo was, although I found and fixed at least one bug. I ended up using a new SoTA transformer-based model (D-FINE). If you want, we can chat about it, as I started working on PRs to make it more user-friendly. Maybe I can help you with your task, and you can give me some feedback on things to improve for easier use.
linkedin

vl1969s added the "question" label (Further information is requested) on Dec 19, 2024

This was referenced on Feb 21, 2025:

Proper box format for YOLO .txt files #158 (Open)

🩹 [Fix] Labels in YOLO detection format #175 (Open)
