Train on custom Dataset #141

Open
vl1969s opened this issue Dec 19, 2024 · 8 comments
Labels
question Further information is requested

Comments

@vl1969s

vl1969s commented Dec 19, 2024

I'm trying to train a YOLOv9 model with a custom dataset, but I'm not sure how to do it. I'm following these steps:

  1. I have a folder containing images and labels folders. Each of them has train and val subfolders, holding the images and the txt label files (same name as the image) respectively. The labels are in YOLO format, e.g.: 0 0.12 0.62 0.05 0.07

  2. I created a custom.yml file, located at yolo/config/dataset/custom.yml, with this simple data:

path: C:/temp/clips
train: train
validation: val
class_num: 2
class_list: ['ok', 'nok']
auto_download:
  3. I run this command to train a model with my custom dataset:

python lazy.py task=train dataset=custom.yaml use_wandb=False device=cuda

Are the steps above correct? Am I missing something?

Can I place the custom.yml file in the same place as the dataset?
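A quick way to sanity-check the layout described in step 1 before training is a small script like the one below. It is only a sketch, not part of this repository: the function names, the list of image extensions, and the rule that every image needs a label file are assumptions made for illustration (background images without labels may be perfectly fine in your setup).

```python
from pathlib import Path
from typing import List, Optional

# Assumed set of image extensions; adjust to your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def check_label_line(line: str, class_num: int) -> Optional[str]:
    """Return an error message for a bad label line, or None if it looks valid."""
    parts = line.split()
    if len(parts) != 5:
        return f"expected 5 fields, got {len(parts)}"
    try:
        class_id = int(parts[0])
        coords = [float(p) for p in parts[1:]]
    except ValueError:
        return "non-numeric field"
    if not 0 <= class_id < class_num:
        return f"class id {class_id} outside [0, {class_num})"
    if any(not 0.0 <= c <= 1.0 for c in coords):
        return "coordinate outside [0, 1]"
    return None


def check_split(root: Path, split: str, class_num: int) -> List[str]:
    """Collect problems for one split, assuming root/images/<split> and root/labels/<split>."""
    problems = []
    for image in sorted((root / "images" / split).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        label = root / "labels" / split / f"{image.stem}.txt"
        if not label.exists():
            # Assumption: every image should have a label file.
            problems.append(f"{image.name}: missing label file")
            continue
        for number, line in enumerate(label.read_text().splitlines(), start=1):
            if not line.strip():
                continue  # ignore blank lines
            error = check_label_line(line, class_num)
            if error:
                problems.append(f"{label.name}:{number}: {error}")
    return problems

# Example call for the dataset above:
# check_split(Path("C:/temp/clips"), "train", class_num=2)
```

An empty list for both train and val only means the files are well-formed; it says nothing about whether the boxes are drawn correctly.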

@henrytsui000
Member

Hi,

Yes, your process looks correct! You can place the custom.yml file in the same directory as the dataset if you prefer, but make sure the paths in the file are correctly referenced.

However, please note that we originally used segmentation annotations in the training process. I recently merged the code to support x, y, w, h annotations in YOLO format. If you encounter any unexpected bugs, please let me know so I can address them promptly.

Best regards,
Henry Tsui

@ArgoHA

ArgoHA commented Jan 4, 2025

@henrytsui000 will the "classical" YOLO format work? That is, txt files with class_id followed by normalized xywh,
like 0 0.1 0.2 0.1 0.1

I just started training and see very poor metrics compared to DAMO-YOLO and YOLOv8, so I assume something is wrong.
Screenshot 2025-01-04 at 20 39 00

As an example, here is what I get with the same dataset and DAMO-YOLO.

Should I see roughly the same convergence as ultralytics/damo-yolo with the same 60 epochs?
Screenshot 2025-01-04 at 20 43 16

Here is my training command, I assume it uses pretrained weights:
python yolo/lazy.py task=train task.data.batch_size=10 model=v9-m dataset=b_cars device=gpu

@ArgoHA

ArgoHA commented Jan 5, 2025

I tried training from scratch with python yolo/lazy.py task=train task.data.batch_size=10 model=v9-m dataset=b_cars device=gpu task.epoch=60 weight=False and even that gave me "higher" metrics. They are still very bad, which makes sense: training from scratch would need something like 10 times more epochs, but that's not what I need. So I think something may be wrong with training on a custom dataset from pretrained weights.

Screenshot 2025-01-05 at 07 41 00

Do I understand correctly that without weight=False I get the same training, just initialised with COCO weights? No layers are frozen, right?
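One way to check the frozen-layers question empirically, rather than by reading the code, is to count trainable and frozen parameters. The sketch below assumes the model is a regular torch.nn.Module, so that model.named_parameters() yields (name, parameter) pairs with requires_grad and numel(). How to get hold of the model object in this repository is not shown in this thread, and summarize_trainable is a hypothetical helper name.

```python
from typing import Iterable, List, Tuple


def summarize_trainable(named_parameters: Iterable[Tuple[str, object]]) -> Tuple[int, int, List[str]]:
    """Return (trainable element count, frozen element count, names of frozen parameters).

    Expects pairs like those from torch.nn.Module.named_parameters():
    each parameter must expose `requires_grad` and `numel()`.
    """
    trainable = 0
    frozen = 0
    frozen_names = []
    for name, param in named_parameters:
        if param.requires_grad:
            trainable += param.numel()
        else:
            frozen += param.numel()
            frozen_names.append(name)
    return trainable, frozen, frozen_names

# Hypothetical usage, given some `model` that is a torch.nn.Module:
# trainable, frozen, names = summarize_trainable(model.named_parameters())
# print(trainable, frozen, names[:10])
```

If the frozen count is zero both with and without weight=False, the two runs differ only in initialisation.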

Also, the scheduler works differently for some reason.
Screenshot 2025-01-05 at 07 47 24

@ramonhollands
Contributor

I tried training with xywh-style annotations myself and saw that the annotations uploaded to wandb are not correct. I plan to look at this issue next week.

@ramonhollands
Contributor

Training works fine on my end. The only issue is that wandb should receive non-normalized values.
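For readers unsure what "non-normalized" means here: YOLO txt labels store the box center and size as fractions of the image size, while pixel-based consumers need absolute values. Below is a minimal sketch of the conversion, assuming the YOLO convention of (x_center, y_center, width, height) in [0, 1]. Whether the logging code in this repository wants corner or center format is not stated in this thread, so the corner-format output and the function name are assumptions.

```python
from typing import Tuple


def yolo_to_pixel_xyxy(
    box: Tuple[float, float, float, float],
    image_width: int,
    image_height: int,
) -> Tuple[float, float, float, float]:
    """Convert a normalized (x_center, y_center, w, h) box to pixel (x_min, y_min, x_max, y_max)."""
    x_center, y_center, width, height = box
    x_min = (x_center - width / 2) * image_width
    y_min = (y_center - height / 2) * image_height
    x_max = (x_center + width / 2) * image_width
    y_max = (y_center + height / 2) * image_height
    return x_min, y_min, x_max, y_max


# A centered box covering half of a 1280x720 image in each dimension.
print(yolo_to_pixel_xyxy((0.5, 0.5, 0.5, 0.5), 1280, 720))
# → (320.0, 180.0, 960.0, 540.0)
```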

@ramonhollands
Contributor

I am using JSON-annotated files. There is an issue with txt-annotated files (see #148).

@fmichea

fmichea commented Feb 11, 2025

Hi, I've been looking into this repository and tried to train a model via transfer learning on data I labelled. I am also struggling to get good results or to figure out why it is failing for me. @ArgoHA were you able to find more information about what is causing the low metrics for you?

I tried debugging a few things before posting. In the code, I looked at all the data loading, the translation of images and labels to squares, etc. All of that seems to be working fine to me.

My data consists of 27 labels for 10K images. Some labels unfortunately have very little data (<40 samples), but the more important labels have thousands of samples. All the images are 1280x720 JPG images, and the dataset is in COCO format with JSON annotations.
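Since averaged AP weights every class equally, a handful of sparse classes can pull the headline number down. A per-class count over the COCO JSON makes the imbalance visible. This is a sketch that relies only on the standard COCO keys ("categories" with "id" and "name", "annotations" with "category_id"); the function names and the example file path are made up for illustration.

```python
import json
from collections import Counter
from typing import Dict, List


def count_per_category(coco: dict) -> Dict[str, int]:
    """Map each category name to its number of annotations (0 if it never appears)."""
    names = {category["id"]: category["name"] for category in coco["categories"]}
    counts = Counter(annotation["category_id"] for annotation in coco["annotations"])
    return {name: counts.get(category_id, 0) for category_id, name in names.items()}


def rare_categories(coco: dict, min_samples: int = 40) -> List[str]:
    """Return the names of categories with fewer than `min_samples` annotations, sorted."""
    return sorted(name for name, n in count_per_category(coco).items() if n < min_samples)

# Hypothetical usage; the path is a placeholder:
# with open("annotations/instances_train.json") as handle:
#     coco = json.load(handle)
# print(rare_categories(coco))
```

Comparing per-class AP against these counts can help tell whether the low average comes from a few sparse classes or from all of them.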

Training to 200 epochs with the following command:

python yolo/lazy.py task=train task.data.batch_size=4 device=cuda task.validation.data.batch_size=4 cpu_num=6 dataset=sm64 name=sm64 model=v9-c task.epoch=200

This results in AP @ .5 barely above 40% and AP @ .5:.95 barely above 20%. You can see I get a very similar pattern of progression to @ArgoHA.

[Image: YOLO training on custom dataset]

You can find the ground truth and samples for every 25 epochs in this repository here, as seen in wandb. I also included a few other screenshots of metrics from the YOLO training process.

I tried to use YOLOX and DAMO-YOLO on the same dataset but haven't been able to find the right set of dependencies to run them on my setup. However, I do have an mmdetection Faster RCNN pipeline that works, and when I run it on the exact same dataset I get much better results: after 12 epochs, mAP @ .5 and mAP @ .75 are close to 90%.

[Image: Faster RCNN (mmdetection) training on the same dataset]

I have also included samples of images with predictions from the 12-epoch Faster RCNN model here, which clearly show that the model is able to make very accurate predictions after training on this dataset. Unfortunately, inference with this model is too slow for my use case, which is why I was looking into YOLO-based models.

I ran this on revision fa548df with the following changes which should not alter the behavior of the program (PRs incoming as I test things):

diff --git a/yolo/tools/data_loader.py b/yolo/tools/data_loader.py
index c44f00c..0b2cf8b 100644
--- a/yolo/tools/data_loader.py
+++ b/yolo/tools/data_loader.py
@@ -230,6 +241,7 @@ def create_dataloader(data_cfg: DataConfig, dataset_cfg: DatasetConfig, task: st
         num_workers=data_cfg.cpu_num,
         pin_memory=data_cfg.pin_memory,
         collate_fn=collate_fn,
+        persistent_workers=True,
     )

diff --git a/yolo/utils/logging_utils.py b/yolo/utils/logging_utils.py
index 28a5362..074023d 100644
--- a/yolo/utils/logging_utils.py
+++ b/yolo/utils/logging_utils.py
@@ -107,7 +107,7 @@ class YOLORichProgressBar(RichProgressBar):
         epoch_descript = "[cyan]Train [white]|"
         batch_descript = "[green]Train [white]|"
         metrics = self.get_metrics(trainer, pl_module)
-        metrics.pop("v_num")
+        metrics.pop("v_num", None)  # Remove v_num key if present.
         for metrics_name, metrics_val in metrics.items():
             if "Loss_step" in metrics_name:
                 epoch_descript += f"{metrics_name.removesuffix('_step').split('/')[1]: ^9}|"

This is my first time working with ML projects, so I apologize if I am doing something obviously wrong. I must admit I am not 100% sure whether this type of dataset suits machine learning or YOLO, whether I am launching the training process correctly, or whether I am making some other basic mistake. I am also just learning about the metrics and therefore might have made an incorrect comparison.

What would be your recommendation for next steps to figure out whether there is a way to get better results? I would appreciate any pointers.

@ArgoHA

ArgoHA commented Feb 12, 2025

It seems like your dataset is not hard, which means the metrics should be significantly higher. I never figured out what the issue with this repo was, although I found and fixed at least one bug. I ended up using a new SoTA transformer-based model. If you want, we can chat about it (the D-FINE model), as I have started working on PRs to make it more user-friendly. Maybe I can help you with your task, and you can give me some feedback on things to improve for easier use.
linkedin
