Evaluation, Reproducibility, Benchmarks Meeting 18
AReinke edited this page Jul 27, 2022 · 2 revisions
Date: 27th July 2022
Present: Carole, Annika
- Carole has implemented all metrics from the Metrics Reloaded Framework (as it stood in June 2022; we will send her the final list of metrics once it is finalized)
- Open questions:
- Assume an Object Detection problem in which the algorithm perfectly predicted the location of the reference object but assigned it to the wrong class. This prediction is partially correct. If validated per class, it may be penalized too heavily.
- We should separate these cases and define a new biomedical question that only distinguishes foreground (all classes together) from background, with its own metric selection (traversal)
- In the original question (localization + categorization), these cases are correctly counted as errors
- How should the different NaN cases be handled in the aggregation? This should be decided case by case, including:
- Empty image, empty prediction => correct
- Empty image, non-empty prediction => incorrect
- Non-empty image, empty prediction => incorrect
- Missing submission for an image (e.g. in challenges) => incorrect
- Carole will send the code to Annika and others (probably group leads)
- The code will probably be ready by the time the paper is submitted
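The NaN cases listed above can be sketched in a few lines. This is only an illustration, not Carole's implementation: it assumes a Dice-style overlap score where "correct" maps to 1.0 and "incorrect" to 0.0, represents masks as sets of pixel coordinates, and uses `None` to mark a missing submission. The names `score_with_empty_cases` and `mean_score` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Pixel = Tuple[int, int]  # (row, column) of a foreground pixel


def score_with_empty_cases(reference: Set[Pixel],
                           prediction: Optional[Set[Pixel]]) -> float:
    """Dice-style score with explicit values for the otherwise undefined cases."""
    if prediction is None:
        # Missing submission for this image => incorrect
        return 0.0
    if not reference and not prediction:
        # Empty image, empty prediction => correct
        return 1.0
    if not reference or not prediction:
        # Exactly one of the two is empty => incorrect
        return 0.0
    overlap = len(reference & prediction)
    return 2.0 * overlap / (len(reference) + len(prediction))


def mean_score(cases: List[Tuple[Set[Pixel], Optional[Set[Pixel]]]]) -> float:
    """Aggregate per-image scores; no image is silently dropped as NaN."""
    scores = [score_with_empty_cases(ref, pred) for ref, pred in cases]
    return sum(scores) / len(scores)
```

For metrics whose worst value is not 0 (for example distance-based metrics), the value assigned to "incorrect" would have to be chosen per metric.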
- MS Lesion Segmentation:
- This task is typically phrased as Semantic Segmentation
- Clinically, the lesion count is important
- Lesion sizes vary widely (some span only a few pixels, others several thousand)
- It is important not to miss the tiny lesions
- The task should rather be phrased as Object Detection or (better) Instance Segmentation
- We will add this example as “recommendations beyond common practice” in a later version
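A small sketch of why the phrasing matters: a pixel-level overlap score barely changes when a tiny lesion is missed, while a lesion-level count does. The code below is illustrative only. It assumes 2D binary masks as nested lists, 4-connectivity to separate lesions, and a simplified detection rule (a reference lesion counts as detected if any predicted pixel touches it). All function names are hypothetical.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]
Mask = List[List[int]]


def connected_components(mask: Mask) -> List[Set[Pixel]]:
    """Split a binary mask into 4-connected components (one per lesion)."""
    rows, cols = len(mask), len(mask[0])
    seen: Set[Pixel] = set()
    components: List[Set[Pixel]] = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            component: Set[Pixel] = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                component.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            components.append(component)
    return components


def foreground(mask: Mask) -> Set[Pixel]:
    """Coordinates of all foreground pixels."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def pixel_dice(reference: Mask, prediction: Mask) -> float:
    """Semantic-segmentation view: overlap of all foreground pixels."""
    ref, pred = foreground(reference), foreground(prediction)
    if not ref and not pred:
        return 1.0  # empty image, empty prediction => correct
    return 2.0 * len(ref & pred) / (len(ref) + len(pred))


def lesion_sensitivity(reference: Mask, prediction: Mask) -> float:
    """Object-level view: fraction of reference lesions hit by the prediction."""
    lesions = connected_components(reference)
    if not lesions:
        return float("nan")  # undefined; to be resolved by the aggregation policy
    pred = foreground(prediction)
    detected = sum(1 for lesion in lesions if lesion & pred)
    return detected / len(lesions)
```

With a reference containing one 9-pixel lesion and one single-pixel lesion, a prediction that finds only the large lesion still reaches a pixel-level Dice of 18/19, but detects just one of the two lesions.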