racing condition when InvertD is used along with ThreadDataLoader #8056

Open
yiheng-wang-nv opened this issue Aug 30, 2024 · 2 comments

@yiheng-wang-nv
Contributor

Describe the bug
The error occurs when the preprocessing/inverse-transform workload is large and ThreadDataLoader is used.

The issue was found by @wyli.

@yiheng-wang-nv
Contributor Author

Hi @ericspod,
While preparing the fast inference tutorial (Project-MONAI/tutorials#1948), I ran into a similar issue.
Reproducing the error is simple:
Using the spleen bundle (https://github.com/Project-MONAI/model-zoo/tree/dev/models/spleen_ct_segmentation), change the dataloader to:

    "dataloader": {
        "_target_": "ThreadDataLoader",
        "dataset": "@dataset",
        "batch_size": 1,
        "shuffle": false,
        "num_workers": 0
    },
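
For context, a thread-based loader prepares items on a background thread inside the same process, so the preprocessing side and the consuming side work with the very same Python objects. The sketch below illustrates that idea with the standard library only. `SharedTransform`, `producer`, and the queue-based buffering are hypothetical stand-ins and are not MONAI's `ThreadDataLoader` implementation.

```python
import queue
import threading


class SharedTransform:
    """Hypothetical stand-in for a transform instance (not a MONAI class)."""

    def __call__(self, x):
        return x * 2


transform = SharedTransform()
buffer = queue.Queue(maxsize=1)  # small buffer, filled by the background thread
seen = []  # (thread ident, id of the transform) recorded by the producer


def producer(items):
    # Runs on a background thread but uses the same `transform` object
    # that the main thread can also reach.
    for item in items:
        seen.append((threading.get_ident(), id(transform)))
        buffer.put(transform(item))
    buffer.put(None)  # sentinel: no more items


worker = threading.Thread(target=producer, args=([1, 2, 3],))
worker.start()

results = []
while (out := buffer.get()) is not None:
    results.append(out)
worker.join()

# The producer ran on another thread, yet saw the same object as the main thread.
same_object = {obj_id for _, obj_id in seen} == {id(transform)}
different_thread = threading.get_ident() not in {tid for tid, _ in seen}
print(results, same_object, different_thread)
```

With separate worker processes, each worker would instead hold its own copy of the transform, which is why this kind of sharing is specific to threads.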

With this change, running inference produces errors like:

2025-02-28 23:09:00,696 - ignite.engine.engine.SupervisedEvaluator - INFO - Engine run resuming from iteration 0, epoch 0 until 1 epochs
2025-02-28 23:09:01,400 - py.warnings - WARNING - /home/venn/Desktop/monai-code/kvikio_env/lib/python3.10/site-packages/monai/engines/evaluator.py:332: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(**engine.amp_kwargs):

2025-02-28 23:09:02,124 INFO image_writer.py:197 - writing: eval/spleen_1/spleen_1_trans.nii.gz
2025-02-28 23:09:03,769 INFO image_writer.py:197 - writing: eval/spleen_11/spleen_11_trans.nii.gz
2025-02-28 23:09:05,183 INFO image_writer.py:197 - writing: eval/spleen_15/spleen_15_trans.nii.gz
2025-02-28 23:09:06,848 INFO image_writer.py:197 - writing: eval/spleen_23/spleen_23_trans.nii.gz
2025-02-28 23:09:08,742 INFO image_writer.py:197 - writing: eval/spleen_30/spleen_30_trans.nii.gz
2025-02-28 23:09:09,833 INFO image_writer.py:197 - writing: eval/spleen_34/spleen_34_trans.nii.gz
2025-02-28 23:09:10,515 - ignite.engine.engine.SupervisedEvaluator - ERROR - Current run is terminating due to exception: applying transform <monai.transforms.compose.Compose object at 0x7fc170c26350>
2025-02-28 23:09:10,515 - ERROR - Exception: applying transform <monai.transforms.compose.Compose object at 0x7fc170c26350>
Traceback (most recent call last):
  File "/home/venn/Desktop/monai-code/kvikio_env/lib/python3.10/site-packages/monai/transforms/transform.py", line 150, in apply_transform
    return _apply_transform(transform, data, unpack_items, lazy, overrides, log_stats)
  File "/home/venn/Desktop/monai-code/kvikio_env/lib/python3.10/site-packages/monai/transforms/transform.py", line 98, in _apply_transform
    return transform(data, lazy=lazy) if isinstance(transform, LazyTrait) else transform(data)
  File "/home/venn/Desktop/monai-code/kvikio_env/lib/python3.10/site-packages/monai/transforms/spatial/dictionary.py", line 530, in inverse
    d[key] = self.spacing_transform.inverse(cast(torch.Tensor, d[key]))
  File "/home/venn/Desktop/monai-code/kvikio_env/lib/python3.10/site-packages/monai/transforms/spatial/array.py", line 546, in inverse
    return self.sp_resample.inverse(data)
  File "/home/venn/Desktop/monai-code/kvikio_env/lib/python3.10/site-packages/monai/transforms/spatial/array.py", line 239, in inverse
    transform = self.pop_transform(data)
  File "/home/venn/Desktop/monai-code/kvikio_env/lib/python3.10/site-packages/monai/transforms/inverse.py", line 338, in pop_transform
    return self.get_most_recent_transform(data, key, check, pop=True)
  File "/home/venn/Desktop/monai-code/kvikio_env/lib/python3.10/site-packages/monai/transforms/inverse.py", line 320, in get_most_recent_transform
    self.check_transforms_match(all_transforms[-1])
  File "/home/venn/Desktop/monai-code/kvikio_env/lib/python3.10/site-packages/monai/transforms/inverse.py", line 287, in check_transforms_match
    raise RuntimeError(
RuntimeError: Error SpatialResample getting the most recently applied invertible transform Orientation 140468797200880 != 140468797202368.
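
The final message names two things: the transform attempting the inverse (`SpatialResample`) and the most recent entry in the data's record of applied transforms (`Orientation`, carrying a different ID). The toy model below is a hypothetical sketch, not MONAI's implementation. It keeps such a record as a last-in-first-out list and raises a similarly worded error when the top entry was not pushed by the instance asking to pop it. The class `ToyTraceable` and the dictionary keys are assumptions made for illustration.

```python
class ToyTraceable:
    """Hypothetical transform that records itself on a per-item trace list."""

    def __init__(self, name):
        self.name = name

    def push(self, trace):
        # Forward pass: remember which instance was applied.
        trace.append({"class": self.name, "id": id(self)})

    def pop(self, trace):
        # Inverse pass: the top entry must have been pushed by this instance.
        top = trace[-1]
        if top["id"] != id(self):
            raise RuntimeError(
                f"Error {self.name} getting the most recently applied "
                f"invertible transform {top['class']} {top['id']} != {id(self)}."
            )
        return trace.pop()


orientation = ToyTraceable("Orientation")
resample = ToyTraceable("SpatialResample")

# Inverting in exact reverse order of application succeeds.
trace = []
orientation.push(trace)
resample.push(trace)
resample.pop(trace)
orientation.pop(trace)
in_order_ok = trace == []

# If the top entry belongs to a different instance, the pop fails.
trace = []
resample.push(trace)
orientation.push(trace)
try:
    resample.pop(trace)
    message = ""
except RuntimeError as err:
    message = str(err)
print(in_order_ok, message)
```

The sketch only shows what the check compares; it does not say how the real trace ended up with a different entry on top.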

Do you have any ideas on the issue?

@ericspod ericspod self-assigned this Mar 3, 2025
@ericspod
Member

ericspod commented Mar 3, 2025

Hi @yiheng-wang-nv, this may be an old one that we've seen come up on occasion. I thought this was related to the fact that transforms get cloned when using threading under some conditions, so the ID values don't match even though things would otherwise work. I don't think this is a race condition per se; you just wouldn't see it across processes, since the cloning doesn't need to happen there. I'll investigate further here; I thought this had been addressed or that there was a workaround.
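
The cloning explanation can be illustrated with plain Python: a deep copy of an object is a distinct object with its own `id()`, so an ID recorded by the original will not equal the clone's ID even though both behave identically. This is a minimal sketch under that assumption. `ToyTransform`, its methods, and the `match_by` switch are hypothetical and do not reproduce MONAI's tracing code.

```python
import copy


class ToyTransform:
    """Hypothetical invertible transform that scales a value."""

    def __init__(self, factor):
        self.factor = factor

    def forward(self, value, trace):
        # Record which instance performed the forward step.
        trace.append({"class": type(self).__name__, "id": id(self)})
        return value * self.factor

    def inverse(self, value, trace, match_by="id"):
        entry = trace[-1]
        if match_by == "id":
            matches = entry["id"] == id(self)
        else:  # compare by class name only (illustration, not a proposed fix)
            matches = entry["class"] == type(self).__name__
        if not matches:
            raise RuntimeError(f"{entry['id']} != {id(self)}")
        trace.pop()
        return value / self.factor


original = ToyTransform(2.0)
clone = copy.deepcopy(original)  # same parameters, different object identity

trace = []
y = original.forward(3.0, trace)

# The clone could invert the operation, but the strict ID check rejects it.
try:
    clone.inverse(y, trace)
    id_check_failed = False
except RuntimeError:
    id_check_failed = True

# Matching on class name instead lets the functionally identical clone proceed.
restored = clone.inverse(y, trace, match_by="class")
print(id_check_failed, restored)
```

Here the inverse computed by the clone is numerically correct, which matches the observation that the IDs differ "even though things would work".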
