Refactoring MSD, smoothness and dynamic range calculations and unnecessary script cleanup #219
base: main
Conversation
…tation.evaluation.distance and clustering files
* caching dataloader
* caching data module
* black
* ruff
* Bump torch to 2.4.1 (#174)
* update torch >2.4.1
* black
* ruff
* adding timeout to ram_dataloader
* bandaid to cached dataloader
* fixing the dataloader using torch collate_fn
* replacing dictionary with single array
* loading prior to epoch 0
* Revert "replacing dictionary with single array" — this reverts commit 8c13f49.
* using multiprocessing manager
* add sharded distributed sampler
* add example script for ddp caching
* format and lint
* adding the custom distributed sampler to hcs_ram.py
* adding sampler to val train dataloader
* fix divisibility of the last shard
* hcs_ram format and lint
* data module that only crops and does not collate
* wip: execute transforms on the GPU
* path for if not ddp
* fix randomness in inversion transform
* add option to pop the normalization metadata
* move gpu transform definition back to data module
* add tiled crop transform for validation
* add stack channel transform for gpu augmentation
* fix typing
* collate before sending to gpu
* inherit gpu transforms for livecell dataset
* update fcmae engine to apply per-dataset augmentations
* format and lint hcs_ram
* fix abc type hint
* update docstring style
* disable grad for validation transforms
* improve sample image logging in fcmae
* fix dataset length when batch size is larger than the dataset
* fix docstring
* add option to disable normalization metadata
* inherit gpu transform for ctmc
* remove duplicate method override
* update docstring for ctmc
* allow skipping caching for large datasets
* make the fcmae module compatible with image translation
* remove prototype implementation
* fix import path
* Arbitrary prediction time transforms (#209)
* fix spelling in docstring and comment
* add batched zoom transform for tta
* add standalone lightning module for arbitrary TTA
* fix composition of different zoom factors
* add docstrings
* fix typo in docstring

Co-authored-by: Eduardo Hirata-Miyasaki <[email protected]>
* fix terminology
* fix task description: VisCy is not set up to do supervised image classification
…module (#225)
* fixing Triplet Dataset always return the negative sample #224
* Update viscy/data/triplet.py
* adding comment to tripletdataset

Co-authored-by: Ziwen Liu <[email protected]>
@Soorya19Pradeep this is a good start. We should focus the tests on ensuring that the functions handle a known input and behave as expected. By this I mean: if we throw in a known array with known features, like you did using `random.uniform()` or similar, and then run individual methods of the class like `CellFeatures().compute_intensity_features()`, do we get the expected mean, std, min, max, kurtosis, etc.? We can use `pytest.approx()` with the expected value. Most of the tests in `tests/data/test_data.py` check that we return the right shapes and numbers, because that is crucial for feeding data into the model. Here, we are more interested in the functionality and behavior of each function.
I left individual comments, and I'm happy to chat about each one.
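As a minimal sketch of this testing style (the exact `CellFeatures` method signatures are not shown here, so plain NumPy statistics stand in for the feature computation; the tolerance values are assumptions):

```python
import numpy as np
import pytest


def test_uniform_image_statistics():
    # Seed the generator so the "known" input is reproducible.
    rng = np.random.default_rng(42)
    image = rng.uniform(0.0, 1.0, (64, 64)).astype(np.float32)
    # Uniform[0, 1] has mean 1/2 and standard deviation 1/sqrt(12) ~= 0.2887,
    # so we can compare against analytical values with pytest.approx.
    assert image.mean() == pytest.approx(0.5, abs=0.02)
    assert image.std() == pytest.approx(1 / np.sqrt(12), abs=0.02)
```

The same pattern applies to a `CellFeatures` method: feed it an image whose statistics are known in closed form, then `pytest.approx` the returned features.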
""" | ||
Create an image with a small range of values | ||
""" | ||
return np.random.uniform(min_val, max_val, size).astype(np.float32) |
Since we are using `random`, I suggest we add a seed. Any number is fine.
I might be wrong, but since this is using `random` rather than `randint`, the type is already a float.
""" | ||
Create an image with a large range of values | ||
""" | ||
return np.random.uniform(min_val, max_val, size).astype(np.int32) |
You probably want to use `randint` instead of `random`.
Or `random_integers` in this case, since you do have min and max values.
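Note that `np.random.random_integers` is deprecated in current NumPy; the modern equivalents are `np.random.randint` or `Generator.integers`. A sketch of the fixture using the latter (the helper name and defaults here are assumptions, not code from the PR):

```python
import numpy as np


def create_integer_image(min_val=0, max_val=255, size=(64, 64), seed=42):
    # Hypothetical helper: integer-valued test image with a known range.
    # Generator.integers treats `high` as exclusive, so add 1 to include max_val.
    rng = np.random.default_rng(seed)
    return rng.integers(min_val, max_val + 1, size=size, dtype=np.int32)


img = create_integer_image()
```

This yields genuine integers directly, instead of truncating floats with `astype(np.int32)`.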
```python
import numpy as np
from viscy.representation.evaluation.feature import CellFeatures
```
```python
@pytest.fixture
def random_seed(self):
    np.random.seed(42)
```
```python
assert np.isnan(cell_features[feature].values[0]), f"{feature} should be nan for constant image"

for feature in positive_features:
    assert abs(cell_features[feature].values[0]) > 0, f"{feature} should be positive for constant image"
```
This one needs to be more precise. `abs()` always returns a non-negative value, so this assertion does not actually check that the feature itself is positive.
```python
float_features = list(set(positive_features) - set(integer_features))

for feature in positive_features:
    assert abs(cell_features[feature].values[0]) >= 0, f"{feature} should be positive"
```
This one as well: since `abs()` returns non-negative values, `abs(x) >= 0` is always true, so the assertion is vacuous. Drop the `abs()`.
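A minimal illustration of why the `abs()` check is vacuous (the feature values here are made up for the example):

```python
# abs(x) >= 0 holds for every real number, so this "positivity" check never fails,
# even when the underlying value is negative.
for x in (-3.5, 0.0, 2.0):
    assert abs(x) >= 0  # always passes


# The strict check must drop abs() so a negative feature value is actually caught.
def is_strictly_positive(x):
    return x > 0


assert is_strictly_positive(2.0)
assert not is_strictly_positive(-3.5)
```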
```python
binary_image = self.create_binary_image()

features = CellFeatures(constant_image, binary_image)
cell_features = features.compute_all_features()
```
Since we have a nice structure in the code where you can do `CellFeatures().compute_intensity_features()`, I suggest we split this up and write one test function for each case. That way the tests are more focused and easier to debug, and we can isolate issues. In principle, if the individual `compute_***()` methods pass, `compute_all()` should too, since it calls them in sequence.
```python
CF = CellFeatures(image, mask)
features = CF.compute_intensity_features()
assert features["mean_intensity"] == value
assert features["std_dev"] == 0
assert features["min_intensity"] == value
assert features["max_intensity"] == value
```
I haven't written many tests myself with pytest, but we can also use either `hypothesis` or `pytest.mark.parametrize`. I've used the latter to make arrays with known input values, so we can reuse the functions like this:
```python
@pytest.mark.parametrize("value", [0, 1, 255, -100, 1e5])
def test_constant_image_intensity_features(self, value, binary_mask):
    """Test intensity features with constant value images"""
    image = np.full((100, 100), value, dtype=np.float32)
    cf = CellFeatures(image, binary_mask)
    features = cf._compute_intensity_features()
    assert features["mean_intensity"] == value
    assert features["std_dev"] == 0
    assert features["min_intensity"] == value
    assert features["max_intensity"] == value
    assert features["kurtosis"] == 0  # Constant distribution has no kurtosis
    assert features["skewness"] == 0  # Constant distribution has no skewness
```
This PR refactors and adds the calculations for:
* Normalization strategies for the embeddings prior to analysis. Decided to use raw