Validation fails when image names are not int convertible. #67
I have encountered this case too, and I tried to solve it by matching the file name against `img_path`.
Did you train the model successfully? I trained it, but I can't get an AP above 0. Do you know how to solve this?
@yjmm10 Thank you, I appreciate your input. I want to share my thoughts about the code snippet you shared. The solution will work, but it will be extremely expensive: it has a time complexity of O(m*n). Imagine you have … I think it would benefit us all if the design of this repo were made much better. The dataloader should also return the actual `image_id` to begin with, and …
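To make the complexity argument concrete: matching each of m image paths against each of n annotation entries costs O(m*n), while indexing the COCO `images` list once by `file_name` makes each lookup O(1) on average, for O(m + n) overall. This is a minimal sketch, assuming standard COCO `images` entries with `id` and `file_name` keys; `build_file_name_index` and `lookup_image_ids` are hypothetical helpers, not functions in this repo, and the ids in the toy dict are made up.

```python
from pathlib import Path


def build_file_name_index(coco: dict) -> dict[str, int]:
    """Map each image's base file name to its COCO id in a single O(n) pass."""
    # Hypothetical helper: assumes standard COCO "images" entries with "id" and "file_name".
    return {Path(img["file_name"]).name: img["id"] for img in coco["images"]}


def lookup_image_ids(img_paths: list[str], index: dict[str, int]) -> list[int]:
    """Resolve each path's real image id with an average O(1) dict lookup."""
    # Uses the base name, so "tests/data/images/val/x.jpg" matches "x.jpg".
    return [index[Path(p).name] for p in img_paths]


# Toy annotation dict: file names that are not int-convertible.
coco = {
    "images": [
        {"id": 7, "file_name": "000000151480_.jpg"},
        {"id": 3, "file_name": "cam1_2024-01-01.jpg"},
    ]
}
index = build_file_name_index(coco)
print(lookup_image_ids(["tests/data/images/val/000000151480_.jpg"], index))  # [7]
```

Keying by base name assumes file names are unique within a split; if they are not, keying by the path relative to the image root would avoid collisions.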
@Abdul-Mukit You are right. I tried it on my small dataset, but on a big dataset it is slow. I will try it and open a PR. Did you run into any other problems during training? If you get it working, could you share how you did it?
@yjmm10 No, I have not trained this model yet. I got stuck on this bug.
I am actually a bit hesitant about the design choices in this repository. For example, the function …
I don't think the `calculate_ap` function is very good.
OK, I have now completed a simple validation on the val dataset. The …
I think I found the problem:
`annotations_index` is using the …
We already have available to us the … The only concerning thing is that … .jpg. According to the proposal of using …
First of all, thank you very much for this much-needed initiative.
Describe the bug
This repo assumes that image file names are always convertible to `int`. That is not true. If image names in the validation split are not convertible to `int`, validation fails. For example, if `tests/data/images/val/000000151480.jpg` were instead named `tests/data/images/val/000000151480_.jpg`, and `instances_val.json` were updated accordingly, validation would fail. This is the initial error message:
I am open to making PRs and would appreciate input on how to solve this.
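The failing assumption can be reproduced outside the repo. This standalone sketch mirrors the expression `int(Path(img_path).stem)` that this issue identifies as the cause; `image_id_from_stem` is a hypothetical wrapper, not the repo's code.

```python
from pathlib import Path


def image_id_from_stem(img_path: str) -> int:
    # Mirrors the expression from this issue: the id is parsed from the file name.
    return int(Path(img_path).stem)


print(image_id_from_stem("tests/data/images/val/000000151480.jpg"))  # 151480

try:
    image_id_from_stem("tests/data/images/val/000000151480_.jpg")
except ValueError as err:
    # The stem "000000151480_" is not a valid int literal (trailing underscore).
    print(f"ValueError: {err}")
```

Leading zeros parse fine; it is the trailing `_` that makes `int()` fail.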
To Reproduce
Steps to reproduce the behavior:
1. Rename `tests/data/images/val/000000151480.jpg` to `tests/data/images/val/000000151480_.jpg`.
2. Edit `tests/data/annotations/instances_val.json` to change `"file_name": "000000151480.jpg",` to `"file_name": "000000151480_.jpg"`.
3. Run `python yolo/lazy.py task=train model=v9-s dataset=mock task.epoch=1`.

Expected behavior
In COCO datasets' `*/annotations/*.json` files, the `images` are supposed to have `int` `id`s and `string` `file_name`s. A string can contain any character, and the logic of this codebase should accommodate that. Instead of using the file name as an id, it should use the actual `id` of the image as given in the `.json` file.
Screenshots
The problem is that `"image_id": int(Path(img_path).stem),` assumes that the `stem` will always be convertible to `int`. Even if the `int` is removed to make it `"image_id": Path(img_path).stem`, the solution is still fundamentally wrong, because `image_id` is not `filename`. In `instances_val.json`, that particular image already had an `int` id assigned to it. Both the id (`int`) and the file name (`string`) can be completely arbitrary and need not resemble each other.

Instead of taking the correct id, the file name was used as the id. This was done because `ModelValidator.solve()`'s dataloader doesn't return the actual ids, only the `img_paths`. Thus we run into another assert error:
System Info
Additional context
My dataset was produced directly from CVAT with `int` `image_id`s and random strings as file names. I have used these datasets with other DNN libraries without running into this issue. I am confident that I am not breaking any of COCO's dataset conventions. The assumption that "the file name is an int and equals the id" is very limiting. In many industry applications, images are timestamped, e.g. `<time>_<date>_<location>.jpg`. Moreover, when editing datasets with libraries like datumaro, the image `id`s in the `.json` can change at any time.
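A minimal sketch of the direction proposed in this issue, assuming a COCO-format annotation file: the dataset reads `id` and `file_name` from the `images` list and yields the real id alongside each path, so the validator never has to parse a file name. `CocoImageList` and `to_coco_results` are hypothetical names; the repo's actual dataloader and `ModelValidator.solve()` would need to be adapted to pass the id through.

```python
import json
import tempfile
from pathlib import Path


class CocoImageList:
    """Hypothetical dataset: yields (img_path, image_id) pairs from a COCO file."""

    def __init__(self, image_dir: str, annotation_file: str):
        with open(annotation_file, encoding="utf-8") as f:
            coco = json.load(f)
        # Take the id exactly as stored in the JSON; never parse it from file_name.
        self.items = [
            (str(Path(image_dir) / img["file_name"]), img["id"])
            for img in coco["images"]
        ]

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, idx: int) -> tuple[str, int]:
        return self.items[idx]


def to_coco_results(image_id, boxes, scores, category_ids):
    """Format one image's detections as COCO result dicts (bbox is [x, y, w, h])."""
    return [
        {"image_id": image_id, "category_id": c, "bbox": list(b), "score": float(s)}
        for b, s, c in zip(boxes, scores, category_ids)
    ]


# Toy usage: a time-stamped file name that int() cannot parse.
with tempfile.TemporaryDirectory() as tmp:
    ann_file = Path(tmp) / "instances_val.json"
    toy = {"images": [{"id": 42, "file_name": "103122_20240501_dock3.jpg"}]}
    ann_file.write_text(json.dumps(toy), encoding="utf-8")
    dataset = CocoImageList(tmp, str(ann_file))
    img_path, image_id = dataset[0]

results = to_coco_results(image_id, [[10.0, 20.0, 30.0, 40.0]], [0.9], [1])
print(image_id, results)
```

In a PyTorch pipeline this class would typically subclass `torch.utils.data.Dataset` and also return the loaded image; that part is omitted to keep the sketch dependency-free.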