
National Cancer Institute Imaging Data Commons (IDC) as a resource to support transparency, reproducibility, and scalability in imaging AI


Short link to this page: https://tinyurl.com/isbi24-idc


International Symposium on Biomedical Imaging (ISBI) 2024

May 30, 2024, 13:00-14:30 (Athens time), room MC3.4. Demo session: 15:30-16:30


Presenters

  • Andrey Fedorov, PhD - Brigham and Women's Hospital / Harvard Medical School, Boston, USA
  • Daniela Schacherer, MS - Fraunhofer MEVIS, Bremen, Germany

Pre-course instructions

This session will include a mix of slide presentations, live demonstrations, and Python notebooks using Google Colaboratory (Google Colab).

For the best educational experience, please bring your own laptop computer, since the components that will be demonstrated are not optimized for tablets or smartphones. If you choose to experiment with the Python notebooks, you will also need a Google account (even a temporary one) to access Colab.

About IDC

NCI Imaging Data Commons (IDC) is a cloud-based environment containing publicly available cancer imaging data co-located with analysis and exploration tools and resources. IDC is a node within the broader NCI Cancer Research Data Commons (CRDC) infrastructure that provides secure access to a large, comprehensive, and expanding collection of cancer research data.

If this is the first time you are hearing about IDC, here are some highlights of what it has to offer:

  • >65 TB: IDC contains radiology, brightfield (H&E) and fluorescence slide microscopy images, along with image-derived data (annotations, segmentations, quantitative measurements) and accompanying clinical data
  • free: all of the data in IDC is publicly available: no registration, no access requests
  • commercial-friendly: >95% of the data in IDC is covered by the permissive CC-BY license, which allows commercial reuse (a small subset of the data is covered by the non-commercial CC-NC license); each file in IDC is tagged with its license to make it easier for you to understand and follow the reuse rules
  • cloud-based: all of the data in IDC is available from both Google and AWS public buckets: fast and free to download, no out-of-cloud egress fees
  • harmonized: all of the images and image-derived data in IDC are harmonized into the standard DICOM representation
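Because every file carries a license tag, reuse restrictions can be screened programmatically before data is pulled into a project. The sketch below is illustrative only: it assumes license strings in the Creative Commons short form (for example `CC BY 4.0` or `CC BY-NC 3.0`) and a `license_short_name` field name modeled on the metadata table exposed by the `idc-index` Python package. Treat the field names and the sample rows as hypothetical placeholders, not actual IDC records.

```python
import re
from typing import Iterable, List, Mapping


def allows_commercial_use(license_short_name: str) -> bool:
    """Return True unless the Creative Commons license includes the NC (NonCommercial) element."""
    # Split e.g. "CC BY-NC 3.0" into ["CC", "BY", "NC", "3.0"] and look for the NC element.
    tokens = re.split(r"[\s\-]+", license_short_name.strip().upper())
    return "NC" not in tokens


def commercially_usable(rows: Iterable[Mapping[str, str]]) -> List[str]:
    """Return the series identifiers whose license permits commercial reuse."""
    return [
        row["SeriesInstanceUID"]
        for row in rows
        if allows_commercial_use(row["license_short_name"])
    ]


# Hypothetical placeholder rows; real identifiers and license strings come from IDC metadata.
sample_rows = [
    {"SeriesInstanceUID": "series-A", "license_short_name": "CC BY 4.0"},
    {"SeriesInstanceUID": "series-B", "license_short_name": "CC BY-NC 3.0"},
    {"SeriesInstanceUID": "series-C", "license_short_name": "CC BY 3.0"},
]

print(commercially_usable(sample_rows))  # ['series-A', 'series-C']
```

In practice the rows would come from an IDC metadata query rather than a hand-written list. The token check is deliberate: a simple prefix test for `CC BY` would wrongly accept `CC BY-NC` licenses.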

Tutorial program

  1. Introduction to IDC (slides; to be shared publicly after the tutorial)

  2. Hands-on demonstration of the IDC Portal

  3. Hands-on demonstration of the 3D Slicer IDC Browser extension

  4. Hands-on: Basics of interacting with IDC from Python (notebook)

  5. Hands-on (optional): Experimenting with MedSAM on real medical images (notebook)

  6. Introduction to working with slide microscopy data in IDC (slides; to be shared publicly after the tutorial)

  7. Hands-on: Experimenting with AI inference on slide microscopy data in IDC (notebook)

Other relevant materials to explore on your own

  • IDC User Forum is the place to ask any questions related to IDC, get help from IDC developers, and meet other IDC users!
  • Advanced topics tutorials: here you can learn how to search and combine clinical data with imaging metadata
  • Viewers deployment tutorials: learn how to host your own instances of OHIF and Slim viewers and use them to visualize your own private data in Google Cloud
  • IDC Zenodo community: direct submissions of data to IDC are deposited in Zenodo, which archives them, makes them citable, and assigns a DOI, giving visibility and credit to the submitters
  • IDC publications: curated list of peer-reviewed manuscripts and preprints by the IDC team and external groups that utilized IDC in their research
  • MHub: a platform for deep learning models in medical imaging, curated into a standardized and easy-to-use format. MHub-curated models are ready to be applied to the data in IDC in its native DICOM representation, and are accompanied by publicly available sample images.

Acknowledgments and disclosures

  • This project has been funded in whole or in part with Federal funds from the NCI, NIH, under task order no. HHSN26110071 under contract no. HHSN261201500003l.
  • Free cloud hosting and out-of-cloud egress of IDC data are supported through partnerships with the public datasets programs of Google and Amazon Web Services.
  • We are grateful to the Advanced Cyberinfrastructure Coordination Ecosystem (ACCESS) for the JetStream2 credits allocation.
  • The views presented do not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.

If you use IDC in your research, please credit IDC by citing the following publication:

Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). doi:10.1148/rg.230180