[FIX] Fixes image_id calculation when using COCO dataset and images contain non-int convertible file names. #79
Description
When using a COCO dataset (.json files), the `image_id` is incorrectly derived from the image file name in multiple parts of the code base. It is assumed that once image file names are converted to `int`, they will be equal to the `image_id` stored in the .json files. This is incorrect. As a result, in practical datasets where image names contain non-int characters, the `calculate_ap` function fails, because the derived string `image_id` and the actual int `image_id` defined in the .json files no longer match. This PR addresses the issue by making the following changes.

Closes: #67, #36
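To illustrate the mismatch described above, here is a minimal sketch on a toy COCO-style dict. `build_image_metadata` is a hypothetical stand-in written for this example, not the repo's `dataset_utils.create_image_metadata()`; only the standard COCO keys (`images[].id`, `images[].file_name`, `annotations[].image_id`) are assumed.

```python
from collections import defaultdict
from pathlib import Path


def build_image_metadata(coco: dict):
    """Hypothetical helper: index a COCO-style dict three ways."""
    # image id -> list of annotations for that image
    annotations_by_id = defaultdict(list)
    for ann in coco.get("annotations", []):
        annotations_by_id[ann["image_id"]].append(ann)
    # image id -> image information
    info_by_id = {img["id"]: img for img in coco["images"]}
    # image name (file name without extension) -> image id
    id_by_name = {Path(img["file_name"]).stem: img["id"] for img in coco["images"]}
    return annotations_by_id, info_by_id, id_by_name


# Toy data: the file name is not int-convertible, but the id is a plain int.
coco = {
    "images": [{"id": 151480, "file_name": "000000151480_.jpg"}],
    "annotations": [
        {"id": 1, "image_id": 151480, "category_id": 3, "bbox": [10, 20, 30, 40]}
    ],
}

annotations_by_id, info_by_id, id_by_name = build_image_metadata(coco)
stem = Path("images/val/000000151480_.jpg").stem

# Deriving the id from the file name breaks on the trailing underscore.
try:
    derived_id = int(stem)
except ValueError:
    derived_id = None

# Looking the id up by name returns the id stored in the .json file.
image_id = id_by_name[stem]
```

With the name-to-id lookup, the id used for evaluation is always the one stored in the .json file, regardless of how the image file is named.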
Changes:
- Use `image_id` (as defined in the COCO .json file) throughout the code base.
- `YoloDataset.__getitem__()` now returns `image_id` instead of `image_path`. Context: Only the actual `int` id is needed when using a COCO formatted dataset and a call to `calculate_ap` is made.
- `YoloDataset.data` now contains the `int` `image_id` along with the image path and labels data. Previously `self.data` didn't include `image_id` information.
- For .txt datasets, `YoloDataset.data` contains a string `image_id`. In this case `image_id` is the image file name without extension, as before.
- `dataset_utils.create_image_metadata()` now returns three dicts instead of two: image id to list of annotations, image id to image information, and image name to image id. Image id is the `int` `id` assigned to an image in the COCO formatted .json file. Context: This enables `filter_data` to handle the `image_id` of COCO datasets accurately.

Type of Change
Checklist:
Licensing:
By submitting this pull request, I confirm that:
Additional Information
@henrytsui000 please help me with tests. In `tests/data`, changing just one of the image file names to include a "_" should cover the test for this MR, e.g. `000000151480.jpg` -> `000000151480_.jpg` in both the image file name and `instances_val.json`.

I intend to work on this repo quite frequently. Would really appreciate your help for quick collaboration.
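
For the test fixture change suggested above, a rough sketch of the rename could look like the following. `rename_fixture` is a hypothetical helper, and the demo runs in a temporary directory with a stand-in image and a minimal `instances_val.json`; the real layout under `tests/data` may differ.

```python
import json
import tempfile
from pathlib import Path


def rename_fixture(image_dir: Path, annotation_file: Path,
                   old_name: str, new_name: str) -> int:
    """Rename one image on disk and in the COCO json.

    Returns the number of `images` entries that were updated.
    The image `id` is deliberately left unchanged.
    """
    (image_dir / old_name).rename(image_dir / new_name)
    coco = json.loads(annotation_file.read_text())
    changed = 0
    for img in coco["images"]:
        if img["file_name"] == old_name:
            img["file_name"] = new_name
            changed += 1
    annotation_file.write_text(json.dumps(coco))
    return changed


with tempfile.TemporaryDirectory() as tmp:
    image_dir = Path(tmp)
    annotation_file = image_dir / "instances_val.json"
    # Stand-in fixture: an empty image file and a one-image annotation file.
    (image_dir / "000000151480.jpg").touch()
    annotation_file.write_text(json.dumps({
        "images": [{"id": 151480, "file_name": "000000151480.jpg"}],
        "annotations": [],
    }))

    changed = rename_fixture(image_dir, annotation_file,
                             "000000151480.jpg", "000000151480_.jpg")
    renamed_exists = (image_dir / "000000151480_.jpg").exists()
    updated = json.loads(annotation_file.read_text())
```

Because only `file_name` changes while `id` stays the same, the fixture exercises exactly the case where the file name can no longer be converted to the stored id.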
cc: @WongKinYiu