
Refactoring MSD, smoothness and dynamic range calculations and unnecessary script cleanup #219

Open · wants to merge 77 commits into base: main
Conversation

@edyoshikun (Contributor) commented Jan 8, 2025

This PR refactors and adds the calculations for:

  • Mean square displacement (a sketch follows this list)
  • Slope calculation for smoothness
  • Normalization strategies for the embeddings prior to analysis (decided to use raw values)
  • Cleaning up unnecessary/obsolete scripts
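
A minimal sketch of the first two calculations, assuming a (T, D) array of per-timepoint positions or embeddings; the helper names are illustrative, not the PR's actual implementation:

import numpy as np

def mean_square_displacement(track: np.ndarray) -> np.ndarray:
    """MSD(tau) = mean over t of ||x[t + tau] - x[t]||^2, for tau = 1 .. T-1."""
    T = track.shape[0]
    return np.array(
        [np.mean(np.sum((track[tau:] - track[:-tau]) ** 2, axis=1)) for tau in range(1, T)]
    )

def msd_slope(msd: np.ndarray) -> float:
    """Smoothness proxy: slope of MSD versus lag on a log-log scale."""
    taus = np.arange(1, len(msd) + 1)
    slope, _ = np.polyfit(np.log(taus), np.log(msd), 1)
    return float(slope)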

@edyoshikun changed the title from "Refactoring MSD, smoothness and dynamic range calculations" to "Refactoring MSD, smoothness and dynamic range calculations and unnecessary script cleanup" on Jan 10, 2025
edyoshikun and others added 15 commits February 8, 2025 20:28
* caching dataloader

* caching data module

* black

* ruff

* Bump torch to 2.4.1 (#174)

* update torch >2.4.1

* black

* ruff

* adding timeout to ram_dataloader

* bandaid to cached dataloader

* fixing the dataloader using torch collate_fn

* replacing dictionary with single array

* loading prior to epoch 0

* Revert "replacing dictionary with single array"

This reverts commit 8c13f49.

* using multiprocessing manager

* add sharded distributed sampler

* add example script for ddp caching

* format and lint

* adding the custom distributed sampler to hcs_ram.py

* adding sampler to val train dataloader

* fix divisibility of the last shard

* hcs_ram format and lint

* data module that only crops and does not collate

* wip: execute transforms on the GPU

* path for if not ddp

* fix randomness in inversion transform

* add option to pop the normalization metadata

* move gpu transform definition back to data module

* add tiled crop transform for validation

* add stack channel transform for gpu augmentation

* fix typing

* collate before sending to gpu

* inherit gpu transforms for livecell dataset

* update fcmae engine to apply per-dataset augmentations

* format and lint hcs_ram

* fix abc type hint

* update docstring style

* disable grad for validation transforms

* improve sample image logging in fcmae

* fix dataset length when batch size is larger than the dataset

* fix docstring

* add option to disable normalization metadata

* inherit gpu transform for ctmc

* remove duplicate method override

* update docstring for ctmc

* allow skipping caching for large datasets

* make the fcmae module compatible with image translation

* remove prototype implementation

* fix import path

* Arbitrary prediction time transforms (#209)

* fix spelling in docstring and comment

* add batched zoom transform for tta

* add standalone lightning module for arbitrary TTA

* fix composition of different zoom factors

* add docstrings

* fix typo in docstring

---------

Co-authored-by: Eduardo Hirata-Miyasaki <[email protected]>
* fix terminology

* fix task description

VisCy is not set up to do supervised image classification
…module (#225)

* fixing Triplet Dataset always returning the negative sample #224

* Update viscy/data/triplet.py

Co-authored-by: Ziwen Liu <[email protected]>

* adding comment to tripletdataset

---------

Co-authored-by: Ziwen Liu <[email protected]>
@edyoshikun (Contributor, PR author) left a comment


@Soorya19Pradeep this is a good start. We should focus the tests on ensuring that the functions handle a known input and behave as expected. By this I mean: if we throw in a known array with known features, like you did using random.uniform() or similar, and run the individual methods of the class like CellFeatures().compute_intensity_features(), do we get the expected mean, std, min, max, kurtosis, etc.? We can use pytest.approx() against the expected values. Most of the tests in tests/data/test_data.py check that we return the right shapes and numbers, because that is crucial for feeding the data into the model. Here, we are more interested in the functionality and behavior of each function.

I left individual comments and am happy to chat about each one.
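
For instance, a known-input sketch along these lines (assuming the method and key names quoted in this thread; adjust to the actual CellFeatures API):

import numpy as np
import pytest
from viscy.representation.evaluation.feature import CellFeatures

def test_intensity_features_known_input():
    # Constant image of 3.0 inside a full mask: every statistic is known up front.
    image = np.full((16, 16), 3.0, dtype=np.float32)
    mask = np.ones((16, 16), dtype=np.uint8)
    features = CellFeatures(image, mask).compute_intensity_features()
    assert features["mean_intensity"] == pytest.approx(3.0)
    assert features["std_dev"] == pytest.approx(0.0)
    assert features["min_intensity"] == pytest.approx(3.0)
    assert features["max_intensity"] == pytest.approx(3.0)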

"""
Create an image with a small range of values
"""
return np.random.uniform(min_val, max_val, size).astype(np.float32)
Since we are using random numbers, I suggest we add a seed. Any number is fine.
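
For example (a sketch, reusing the fixture's min_val, max_val, and size; the Generator API below avoids mutating NumPy's global state, unlike np.random.seed):

import numpy as np

rng = np.random.default_rng(42)  # seeded, reproducible generator
image = rng.uniform(min_val, max_val, size).astype(np.float32)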


I might be wrong, but since this is using uniform rather than randint, the type is already a float.

"""
Create an image with a large range of values
"""
return np.random.uniform(min_val, max_val, size).astype(np.int32)

You probably want to use randint instead of uniform here.


Or random_integers in this case, since you do have min/max values.
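
Note that np.random.random_integers is deprecated in recent NumPy; a sketch with the current Generator API, reusing the fixture's min_val, max_val, and size:

import numpy as np

rng = np.random.default_rng(42)
# Draws integers in [min_val, max_val] directly; no float draw plus int cast.
image = rng.integers(min_val, max_val, size=size, endpoint=True, dtype=np.int32)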


import numpy as np
from viscy.representation.evaluation.feature import CellFeatures


Suggested change:

@pytest.fixture
def random_seed(self):
    np.random.seed(42)

assert np.isnan(cell_features[feature].values[0]), f"{feature} should be nan for constant image"

for feature in positive_features:
    assert abs(cell_features[feature].values[0]) > 0, f"{feature} should be positive for constant image"

This one needs to be more precise: abs() always returns a non-negative number, so this passes for any nonzero value, not just positive ones.

float_features = list(set(positive_features) - set(integer_features))

for feature in positive_features:
    assert abs(cell_features[feature].values[0]) >= 0, f"{feature} should be positive"

This one as well; change it, since abs() returns non-negative values and >= 0 is therefore always true.
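
A sketch of what a sharper assertion could look like, assuming a hypothetical expected dict of precomputed reference values:

# Compare against a precomputed reference instead of a vacuous abs() check.
assert cell_features[feature].values[0] == pytest.approx(expected[feature])
# Or, when only the sign is meaningful, drop abs() so positivity is actually tested:
assert cell_features[feature].values[0] > 0, f"{feature} should be positive"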

binary_image = self.create_binary_image()

features = CellFeatures(constant_image, binary_image)
cell_features = features.compute_all_features()

Since we have a nice structure in the code where you can do CellFeatures().compute_intensity_features(), I suggest we split this up and write one test function for each case. That way the tests are more focused and easier to debug, and we can isolate issues. In principle, if the individual compute_*() methods pass, then compute_all_features() should too, since it calls all of them in sequence.

CF = CellFeatures(image, mask)
features = CF.compute_intensity_features()

assert features["mean_intensity"] == value
assert features["std_dev"] == 0
assert features["min_intensity"] == value
assert features["max_intensity"] == value

I haven't written many tests myself with pytest, but we can also use either hypothesis or pytest.mark.parametrize. I've used the latter to build arrays with known input values so we can reuse the test functions, like this:

@pytest.mark.parametrize("value", [0, 1, 255, -100, 1e5])
def test_constant_image_intensity_features(self, value, binary_mask):
    """Test intensity features with constant-value images."""
    image = np.full((100, 100), value, dtype=np.float32)
    cf = CellFeatures(image, binary_mask)

    features = cf.compute_intensity_features()

    assert features["mean_intensity"] == value
    assert features["std_dev"] == 0
    assert features["min_intensity"] == value
    assert features["max_intensity"] == value
    assert features["kurtosis"] == 0  # constant distribution has no kurtosis
    assert features["skewness"] == 0  # constant distribution has no skewness
