Validation Metric Stuck at Zero During YOLOv9m Training #148
Comments
Hi, I am also facing a similar issue: when training is initiated, it gets stuck on the first epoch. The dataset structure is the same, i.e. a dataset folder in the root which contains images, labels and test folders, and each split contains images and labels folders. This is the data YAML file's contents:
The command I used is as mentioned in the docs. Inside the train YAML file, the epochs have been set to 2 just to check this issue, and it takes more than an hour and then either crashes (as seen in the wandb status) or just keeps going.
Hey, I'm not sure about this, but I happened to notice that the box coordinates for the mock dataset are absolute box coordinates instead of relative ones, such as:
meanwhile your data seems to be in relative coordinates:
This might explain why the network is unable to learn anything.
I think the default COCO bounding-box format is indeed x, y, width, height (e.g. https://www.v7labs.com/blog/coco-dataset-guide).
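For anyone double-checking their annotations, here is a minimal sketch (a hypothetical helper, not part of this repo) of converting a COCO-style absolute box into the YOLO-style normalized box that the label files expect:

```python
def coco_to_yolo(box, img_w, img_h):
    """Convert a COCO-style [x_min, y_min, width, height] box in absolute pixels
    to a YOLO-style [x_center, y_center, width, height] box normalized to [0, 1]."""
    x_min, y_min, w, h = box
    return [
        (x_min + w / 2) / img_w,  # normalized x center
        (y_min + h / 2) / img_h,  # normalized y center
        w / img_w,                # normalized width
        h / img_h,                # normalized height
    ]

# Example: a 100x50 box at (200, 120) in a 640x480 image
print(coco_to_yolo([200, 120, 100, 50], 640, 480))
# -> [0.390625, 0.3020833333333333, 0.15625, 0.10416666666666667]
```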
Hi, apologies for the misleading message earlier. The issue is actually caused by the following line:
I'll fix this issue soon. Best regards,
@henrytsui000 I also see a small bug because of which, if you have bboxes in the COCO file, they will still be ignored:
You'll never get into
Do you want me to create an MR?
After fixing that I still see very poor results, so something is still off in my case.
Hi,
Facing the same issue. Tried changing
Okay, I tried with both. But as soon as the validation epoch ends, everything goes to zero, and it mostly remains zero for all subsequent epochs. I do see the BoxLoss, DFLLoss, and BCELoss changing, though. The issue is the same in every case. @henrytsui000, any advice?
Try a smaller learning rate. At least for my (small) dataset, the default 0.01 was way too high and I had similar behaviour. (0.0001 worked great for me.)
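In this repo, hyperparameters are overridden on the command line via Hydra, the same way the batch size is set in the training command below. Assuming the optimizer learning rate is exposed under task.optimizer.args.lr (an assumption; check yolo/config/task/train.yaml for the exact key in your version), the override would look roughly like:

```bash
# Hypothetical override key; verify it against your train config before using.
python yolo/lazy.py task=train task.data.batch_size=4 model=v9-m dataset=TMP \
    task.optimizer.args.lr=0.0001 device=cuda use_wandb=False use_tensorboard=True
```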
@agriic How big is your dataset? With your suggestion of 0.0001 I get the same results as before (all 0s).
~2000 images, 6 classes, at most 4 objects, all quite small.
@agriic I have almost the same number of images as yours. Setting LR to
Hi @henrytsui000, any update on this error? Best regards
I'm also having an issue with custom dataset training. My pre-trained model works fine, but when I try to train on a custom dataset I don't get good results.
I am trying to train YOLOv9m on a custom dataset, but I am encountering an issue where the validation metric stays at zero throughout the training. I have checked my dataset and configuration files, but I am unsure about the cause of the issue.
The dataset is structured as follows:
images: Contains the image files (PNG)
labels: Contains the annotation files in YOLO format (TXT), e.g. 0 0.12 0.62 0.05 0.07 (a quick sanity-check sketch follows after this list)
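To rule out the absolute-vs-relative coordinate problem discussed above, this is a quick sanity-check sketch (a hypothetical script, not part of the repo; the path is an assumption, so point it at your own labels folder) that flags any label line whose values are not normalized to [0, 1]:

```python
from pathlib import Path

def check_labels(label_dir):
    """Flag label lines that don't look like normalized YOLO boxes
    (a class id followed by four values in [0, 1])."""
    for txt in sorted(Path(label_dir).glob("*.txt")):
        for line_no, line in enumerate(txt.read_text().splitlines(), 1):
            parts = line.split()
            if not parts:
                continue  # skip empty lines
            coords = [float(v) for v in parts[1:]]
            if len(coords) != 4 or not all(0.0 <= v <= 1.0 for v in coords):
                print(f"{txt.name}:{line_no} suspicious entry: {line!r}")

check_labels("dataset/labels")  # hypothetical path; adjust to your layout
```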
I have modified the image_size in yolo/config/general.yaml to the size I want for training. The file looks like this:
Training Command:
python yolo/lazy.py task=train task.data.batch_size=4 task.data.image_size=[512,512] model=v9-m dataset=TMP device=cuda use_wandb=False use_tensorboard=True
Despite having followed the training setup and providing the dataset, the validation metrics remain at zero throughout the entire training process. I am unsure whether it’s an issue with the dataset formatting, configuration, or the training setup itself.