[FIX] Fixes image_id calculation when using COCO dataset and images contain non-int convertible file names. #79

Open
wants to merge 7 commits into
base: main

Abdul-Mukit
Contributor

@Abdul-Mukit Abdul-Mukit commented Aug 19, 2024

Description

When using a COCO dataset (.json files), image_id is incorrectly derived from the image file name in multiple parts of the code base. The code assumes that converting an image file name to int yields the image_id stored in the .json file. That assumption does not hold in general. In practical datasets where image names contain non-int characters, the derived image_id is a string that no longer matches the int image_id defined in the .json file, so calculate_ap fails. This PR addresses the issue with the changes listed below.
Closes: #67, #36
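
To illustrate the failure mode described above, here is a minimal sketch. It is not code from this repository: the sample `images` entries and the helper names `id_from_file_name` and `build_name_to_id` are hypothetical, and only the `id` and `file_name` keys come from the COCO format itself.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical COCO "images" entries. The ids are whatever the annotation
# file says they are; nothing ties them to the file names.
coco_images: List[dict] = [
    {"id": 7, "file_name": "000000151480_.jpg"},
    {"id": 42, "file_name": "street_scene.jpg"},
]


def id_from_file_name(file_name: str) -> int:
    """The fragile assumption: the file stem converts to the image_id."""
    return int(Path(file_name).stem)


def build_name_to_id(images: List[dict]) -> Dict[str, int]:
    """Look the id up in the annotation data instead of parsing the name."""
    return {img["file_name"]: img["id"] for img in images}


name_to_id = build_name_to_id(coco_images)

# int() rejects a stem with a trailing underscore, so parsing fails here.
try:
    id_from_file_name("000000151480_.jpg")
    parse_failed = False
except ValueError:
    parse_failed = True
```

Even when the stem does parse as an int, the result is only correct if the dataset happens to number its images that way, which is why a lookup against the .json file is the safer source of truth.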

Changes:

  • Accurate and consistent usage of image_id (as defined in COCO .json file) throughout code base.
  • YoloDataset.__getitem__() now returns image_id instead of image_path. Context: only the actual int id is needed when a COCO-formatted dataset is used and calculate_ap is called.
  • For COCO, YoloDataset.data now contains int image_id along with image path and labels data. Previously self.data didn't include image_id information.
  • For YOLO .txt datasets, YoloDataset.data contains a string image_id: the image file name without extension, as before.
  • dataset_utils.create_image_metadata() now returns three dictionaries instead of two: image id to list of annotations, image id to image information, and image name to image id. Here, image id is the int id assigned to an image in the COCO-formatted .json file. Context: this enables filter_data to handle image_id of COCO datasets accurately.
  • Docstrings of all touched functions updated.
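
The three-dictionary return value described in the list above could look roughly like the sketch below. This is an assumption-laden illustration, not the implementation in this PR: the function name `create_image_metadata_sketch`, its signature (taking an already-loaded COCO dict), and the choice to key the third dictionary by the full `file_name` are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def create_image_metadata_sketch(
    coco: Dict[str, Any],
) -> Tuple[Dict[int, List[dict]], Dict[int, dict], Dict[str, int]]:
    """Build three lookups from a loaded COCO-style dict.

    Returns (annotations by image id, image info by image id,
    image name to image id). Keying the last dict by the full
    file_name is an assumption made for this sketch.
    """
    annotations_by_id: Dict[int, List[dict]] = defaultdict(list)
    for ann in coco.get("annotations", []):
        annotations_by_id[ann["image_id"]].append(ann)

    image_info_by_id = {img["id"]: img for img in coco["images"]}
    image_name_to_id = {img["file_name"]: img["id"] for img in coco["images"]}
    return dict(annotations_by_id), image_info_by_id, image_name_to_id


# Tiny made-up annotation file for demonstration only.
sample = {
    "images": [
        {"id": 7, "file_name": "cat_01.jpg", "width": 640, "height": 480},
        {"id": 9, "file_name": "dog_02.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 1, "image_id": 7, "category_id": 3, "bbox": [10, 20, 30, 40]},
        {"id": 2, "image_id": 7, "category_id": 5, "bbox": [50, 60, 70, 80]},
    ],
}

anns, infos, name_to_id = create_image_metadata_sketch(sample)
```

With a name-to-id lookup available, a caller that only knows the file on disk can recover the int id without parsing the file name.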

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • This change requires a documentation update

Checklist:

  • The code follows the Python style guide.
  • Code and files are well organized.
  • All tests pass.
  • New code is covered by tests.
  • The pull request is directed to the corresponding topic branch.

Licensing:

By submitting this pull request, I confirm that:

  • My contribution is made under the MIT License.
  • I have not included any code from questionable or non-compliant sources (GPL, AGPL, ... etc).
  • I understand that all contributions to this repository must comply with the MIT License, and I promise that my contributions do not violate this license.
  • I have not used any code or content from sources that conflict with the MIT License or are otherwise legally questionable.

Additional Information

@henrytsui000 please help me with the tests. In tests/data, renaming just one image file to include a "_" should be enough to cover this PR, e.g. 000000151480.jpg -> 000000151480_.jpg, changed both on disk and in instances_val.json.
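
The fixture change suggested above could be scripted along these lines. This is only a sketch: the helper `rename_image_in_fixture` is hypothetical, and the demo runs against a throwaway temporary directory with a made-up one-image annotation file rather than the real tests/data layout, which is not shown here.

```python
import json
import tempfile
from pathlib import Path


def rename_image_in_fixture(
    images_dir: Path, json_path: Path, old_name: str, new_name: str
) -> None:
    """Rename one image on disk and update its file_name in the COCO json."""
    data = json.loads(json_path.read_text())
    matched = False
    for img in data["images"]:
        if img["file_name"] == old_name:
            img["file_name"] = new_name  # the int "id" is left untouched
            matched = True
    if not matched:
        raise ValueError(f"{old_name} not found in {json_path}")
    (images_dir / old_name).rename(images_dir / new_name)
    json_path.write_text(json.dumps(data))


# Demo on a temporary, made-up fixture (not the repository's real data).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    images_dir = root / "images"
    images_dir.mkdir()
    (images_dir / "000000151480.jpg").write_bytes(b"")
    json_path = root / "instances_val.json"
    json_path.write_text(
        json.dumps(
            {
                "images": [{"id": 151480, "file_name": "000000151480.jpg"}],
                "annotations": [],
            }
        )
    )

    rename_image_in_fixture(
        images_dir, json_path, "000000151480.jpg", "000000151480_.jpg"
    )

    renamed_exists = (images_dir / "000000151480_.jpg").exists()
    old_exists = (images_dir / "000000151480.jpg").exists()
    updated = json.loads(json_path.read_text())
```

Because the id in the json stays the same while the file name stops being int-convertible, a fixture like this exercises exactly the mismatch this PR fixes.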

I intend to work on this repo quite frequently. Would really appreciate your help for quick collaboration.
cc: @WongKinYiu

…_dict and image_info_dict with image file name as key to ensure uniform key across the code base.

refactor: dataset_utils.map_annotations_to_image_names returns annotations list mapped to image file names instead of image_id.

refactor: several variable names made more descriptive.

docs: docstrings updated.
…stead of image_path as the key.

refactor: annotations_index renamed to annotations_dict.
@Abdul-Mukit Abdul-Mukit changed the title 67 fix image id usage consistency [FIX] image_id calculation when using COCO dataset and images contain non-int convertible file names. Aug 19, 2024
@Abdul-Mukit Abdul-Mukit changed the title [FIX] image_id calculation when using COCO dataset and images contain non-int convertible file names. [FIX] Fixes image_id calculation when using COCO dataset and images contain non-int convertible file names. Aug 21, 2024
Development

Successfully merging this pull request may close these issues.

Validation fails when image names are not int convertible.