
Validation Metric Stuck at Zero During YOLOv9m Training #148

Open
ProfessorHT opened this issue Jan 4, 2025 · 15 comments
Labels
question Further information is requested

Comments

@ProfessorHT

I am trying to train YOLOv9m on a custom dataset, but I am encountering an issue where the validation metric stays at zero throughout the training. I have checked my dataset and configuration files, but I am unsure about the cause of the issue.

[Screenshot: training output showing the validation metrics stuck at zero]

The dataset is structured as follows:

[Screenshot: dataset folder structure]

images: Contains the image files (PNG)
labels: Contains the annotation files in YOLO format, e.g. 0 0.12 0.62 0.05 0.07 (TXT)
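
A quick way to sanity-check the label files against this format is a small script like the one below. This is only an illustrative sketch; the dataset/labels glob path is an assumption about the folder layout, not something taken from the repo:

    import glob

    # Illustrative check (not from the repo): every YOLO-format label line should be
    # "class x_center y_center width height", with all four coordinates normalized to [0, 1].
    for path in glob.glob("dataset/labels/**/*.txt", recursive=True):  # adjust to your layout
        with open(path) as f:
            for line_no, line in enumerate(f, start=1):
                if not line.strip():
                    continue  # skip blank lines
                cls, *coords = line.split()
                assert len(coords) == 4, f"{path}:{line_no} has {len(coords)} coordinates"
                assert all(0.0 <= float(c) <= 1.0 for c in coords), f"{path}:{line_no} is not normalized"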

I have modified the image_size in yolo/config/general.yaml to the size I want for training. The file looks like this:

[Screenshot: yolo/config/general.yaml with the modified image_size]

Training Command:

python yolo/lazy.py task=train task.data.batch_size=4 task.data.image_size=[512,512] model=v9-m dataset=TMP device=cuda use_wandb=False use_tensorboard=True

Despite having followed the training setup and providing the dataset, the validation metrics remain at zero throughout the entire training process. I am unsure whether it’s an issue with the dataset formatting, configuration, or the training setup itself.

  • Is there anything in the configuration or training command that might be causing the validation metrics to stay at zero?
  • Are there any additional steps I should take to ensure the training process is properly tracking the validation metrics?
ProfessorHT added the question label on Jan 4, 2025
@akshaypx

akshaypx commented Jan 5, 2025

Hi, I am also facing a similar issue: when training is initiated, it gets stuck on the first epoch.

The dataset structure is the same, i.e. a dataset folder in the root which contains the train, valid and test folders; each of these contains images and labels folders.

I have added data.yaml in /yolo/config/dataset/.

These are the contents of the data.yaml file:

path: dataset 
train: train 
validation: valid 

class_num: 1 
nc: 1 
names: ["tws"] 

The command I used is the one mentioned in the docs: python yolo/lazy.py task=train dataset=data use_wandb=True.

Inside the train yaml file, the epochs have been set to 2 just to check this issue, and it takes more than an hour and then either crashes (as seen in the wandb status) or keeps going.

@Adamusen
Contributor

Adamusen commented Jan 7, 2025

Hey,

I'm not sure about this, but I happened to notice that the box annotations for the mock dataset contain absolute coordinates instead of relative ones, such as:

"bbox": [
    530.18,
    126.04,
    88.94,
    204.35
],

meanwhile your data seems to be in relative coordinates:

labels: Contains the annotation files in YOLO format ( e.g.: 0 0.12 0.62 0.05 0.07 )(TXT)

This might explain why the network is unable to learn anything.
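
If that is the cause, converting a normalized YOLO box into an absolute COCO-style bbox is just a scale and a shift. A minimal sketch (the function name and image-size arguments are illustrative, not from the repo):

    def yolo_to_coco_bbox(x_c, y_c, w, h, img_w, img_h):
        # Scale the normalized values by the image size and shift from a
        # center-based box to the top-left-based COCO convention [x, y, w, h].
        abs_w, abs_h = w * img_w, h * img_h
        return [x_c * img_w - abs_w / 2, y_c * img_h - abs_h / 2, abs_w, abs_h]

    # e.g. yolo_to_coco_bbox(0.12, 0.62, 0.05, 0.07, 512, 512) -> roughly [48.64, 299.52, 25.6, 35.84]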

@ramonhollands
Contributor

I think the default COCO bounding box format is indeed x, y, width, height.

(e.g. https://www.v7labs.com/blog/coco-dataset-guide)
"List of objects with the following information: Object class (e.g., "person," "car"); Bounding box coordinates (x, y, width, height); Segmentation mask (polygon or RLE format); Keypoints and their positions (if available)"

@henrytsui000
Member

Hi,

Apologies for the misleading message earlier. The issue is actually caused by the following line:
https://github.com/WongKinYiu/YOLO/blob/fa548dfd7bbf18a0c5f2244183fdeaa60a527e08/yolo/tools/data_loader.py#L107

For .txt format annotations, it should use class_id, x_c, y_c, w, h and then convert to a format like class_id, x1, y1, x1, y2, x2, y2, x2, y1. However, it currently treats .txt files as a segmentation format.
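
A minimal sketch of that conversion for a normalized YOLO box (just to illustrate the intended format, not the actual patch):

    def box_to_corner_polygon(class_id, x_c, y_c, w, h):
        # Convert "class x_center y_center width height" into the corner layout
        # described above: class_id, x1, y1, x1, y2, x2, y2, x2, y1.
        x1, y1 = x_c - w / 2, y_c - h / 2  # top-left corner
        x2, y2 = x_c + w / 2, y_c + h / 2  # bottom-right corner
        return [class_id, x1, y1, x1, y2, x2, y2, x2, y1]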

I’ll fix this issue soon.

Best regards,
Henry Tsui

@ArgoHA

ArgoHA commented Jan 8, 2025

@henrytsui000 I also see a small bug because of which, if you have bboxes in the COCO file, they will still be ignored:
in scale_segmentation:

    for anno in annotations:
        category_id = anno["category_id"]
        if "segmentation" in anno:
            print("Here")
            seg_list = [item for sublist in anno["segmentation"] for item in sublist]
        elif "bbox" in anno:
            x, y, width, height = anno["bbox"]
            seg_list = [x, y, x + width, y, x + width, y + height, x, y + height]

You'll never get into the elif "bbox" in anno branch when the JSON contains "segmentation": [], as it does. I fixed it with a simple change:

if anno.get("segmentation"):
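
For reference, the quoted loop with that change applied would look roughly like this (a sketch against the snippet above, with the debug print dropped):

    for anno in annotations:
        category_id = anno["category_id"]
        if anno.get("segmentation"):  # only a non-empty segmentation list
            seg_list = [item for sublist in anno["segmentation"] for item in sublist]
        elif "bbox" in anno:
            # fall back to the bbox when "segmentation" is missing or empty ([])
            x, y, width, height = anno["bbox"]
            seg_list = [x, y, x + width, y, x + width, y + height, x, y + height]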

Do you want me to create an MR?

@ArgoHA

ArgoHA commented Jan 8, 2025

After fixing that I still see very poor results, so something is still off in my case.
Update: I checked the images and annotations after preprocessing and everything looks correct. The mosaic scaling might be too aggressive, but that can't be the root of the very poor accuracy I am seeing.

@Nico-Rixe-VVB

Hi,
Anything new on this topic?
@henrytsui000, I tried your recommendation, but sadly with no success.

@tahsinalamin

Facing the same issue. I tried changing class_id, x_c, y_c, w, h to class_id, x1, y1, x1, y2, x2, y2, x2, y1 for the .txt files and deleting the train.cache and val.cache files before each run. The validation metrics still do not change from 0.

@tahsinalamin


Okay, I tried with both .txt and .json formats and the issue remains the same. I am training single-class detection. When I begin training, I see the AP and AR percentages changing and taking on some values.

[Screenshot: AP/AR values during the first validation run]

But as soon as the validation epoch ends, everything goes to zero, and it mostly remains zero for all subsequent epochs. I do see the BoxLoss, DFLLoss, and BCELoss changing, though. The issue is the same for the c, m, and s models.

[Screenshot: validation metrics at zero in later epochs while the losses keep changing]

@henrytsui000 any advice?

@agriic

agriic commented Jan 17, 2025

Try a smaller learning rate. At least for my (small) dataset, the default 0.01 was way too high and I had similar behaviour (0.0001 worked great for me).

@Nico-Rixe-VVB

@agriic How big is your dataset? With your suggestion of 0.0001 I get the same results as before (all 0s)

@agriic

agriic commented Jan 20, 2025

~2000 images, 6 classes, at most 4 objects per image, all quite small.

@tahsinalamin

@agriic I have almost the same number of images as you. Setting the LR to 0.0001 helped resolve the issue I was having, but the values are still very poor (<5%).

@ProfessorHT
Author

Hi @henrytsui000 ,

Any update on this error?

Best Regards

@RJKNATT100

I'm also having an issue with custom dataset training. My pre-trained model works fine, but when I try to train on a custom dataset I don't get good results.

  • I labeled the images using labelImg and Labelme.
  • Do we have a guide for training on a custom dataset?
