Anndata Object with counts? #23
Hi @Demond-dev, you can try this function I adapted from the bin2cell read_visium function. It tracks back the changes made to the pixel coordinates within ENACT so that the spots fit the high-resolution image. All you need is the object_path (the path to the ENACT adata.h5 output).
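(The function itself was lost in this copy of the thread; below is a minimal sketch of the idea, where the obs column names, image handling, and scale factors are assumptions rather than the original code.)

```python
# Hypothetical reconstruction of a bin2cell-style reader for the ENACT
# output; the column names ("cell_x", "cell_y"), the library_id, and the
# scale factors are assumptions to adapt to your own run.
import anndata as ad
import numpy as np
from PIL import Image

def read_visium_enact(object_path, image_path=None, library_id="enact"):
    """Load the ENACT adata.h5 output and attach the spatial metadata
    so the spots line up with the high-resolution image."""
    adata = ad.read_h5ad(object_path)

    # Copy the cell centroids into obsm["spatial"] so that
    # scanpy/squidpy spatial plotting functions can find them.
    adata.obsm["spatial"] = adata.obs[["cell_x", "cell_y"]].to_numpy()

    if image_path is not None:
        Image.MAX_IMAGE_PIXELS = None  # whole-slide images exceed PIL's default limit
        img = np.asarray(Image.open(image_path))
        adata.uns["spatial"] = {
            library_id: {
                "images": {"hires": img},
                # ENACT coordinates are assumed to already be in
                # full-resolution pixels, so no rescaling is applied.
                "scalefactors": {
                    "tissue_hires_scalef": 1.0,
                    "spot_diameter_fullres": 16.0,
                },
            }
        }
    return adata
```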
This was used on the output of the human colon cancer Visium HD dataset, and I was able to perform standard downstream processing using scanpy/squidpy. One issue I am having is that if I try to save to raw beforehand (as below), my kernel crashes when running any downstream analysis (e.g. filter_cells).
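(The snippet referenced by "as below" was not preserved here; the usual scanpy pattern it describes, sketched rather than quoted:)

```python
import scanpy as sc

adata.raw = adata  # snapshot of the count matrix; with a dense X this roughly doubles memory
sc.pp.filter_cells(adata, min_genes=200)  # the downstream step that crashed
```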
Would anyone have a solution for this, please? I don't encounter any such issues with the bin2cell output.
@Demond-dev in our latest release we have added a notebook example that may help you visualize the outputs in an image: https://github.com/Sanofi-Public/enact-pipeline/blob/main/ENACT_outputs_demo.ipynb (note that on GitHub it does not render the image, as it is an interactive HTML plot, but once you load it in Jupyter or VS Code it should be visible). Also, the latest release incorporates compatibility with TissUUmaps for output visualization. The execution generates a .tmap file that allows you to load your outputs in TissUUmaps: https://github.com/Sanofi-Public/enact-pipeline/blob/main/ENACT_outputs_demo.ipynb
Have you checked your memory usage? It seems you may be running out of memory.
@AlbertPlaPlanas It does seem to be linked to memory usage, but I am not sure why, because the standard 8 µm binned object or a bin2cell object both run without any memory issues. Is there something different about the structure of the ENACT output object that would make it require more memory? Thanks for your help.
My first thought is that if you keep your ENACT object in memory after analysis, it may be eating a lot of your memory, as it will hold all the raw counts, the bin-to-cell assignment, the whole-slide image, and a few other heavy elements. So, assuming you are doing something like this:
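(The original snippet was not preserved; a plausible reconstruction follows, where the import path and constructor follow the project README and the output file name is an assumption.)

```python
# Plausible reconstruction: everything runs in one Python session, so the
# ENACT pipeline object stays alive during the downstream analysis.
import scanpy as sc
from enact.pipeline import ENACT  # import path is an assumption

so_hd = ENACT(configs_path="config.yaml")
so_hd.run_enact()  # pipeline object still holds the image, counts, and assignments

adata = sc.read_h5ad("cells_adata.h5")  # output file name is an assumption
sc.pp.filter_cells(adata, min_genes=200)  # competes for whatever RAM is left
```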
could you try doing something like this instead?
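(Same caveats as above; the point of this sketch is to drop the pipeline object and let the garbage collector reclaim its memory before the analysis starts.)

```python
import gc
import scanpy as sc
from enact.pipeline import ENACT  # import path is an assumption

so_hd = ENACT(configs_path="config.yaml")
so_hd.run_enact()

del so_hd      # release the whole-slide image, raw counts, and assignments
gc.collect()   # force the memory to be reclaimed before the analysis

adata = sc.read_h5ad("cells_adata.h5")  # output file name is an assumption
sc.pp.filter_cells(adata, min_genes=200)
```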
It may not be the most elegant solution, but let us know if it works so we can better manage this in future releases.
I have been running the downstream analysis after finishing the run_enact script, closing it, and loading a fresh environment with free memory, so I don't think this is the problem. I can try trimming the final anndata object from the "read_visium_enact" function I posted above to see if that reduces the size. I also noticed that the structure of adata.X from the ENACT output was different compared to an 8 µm bin or bin2cell object (from the 10x Visium Human Colon Cancer dataset), which could be affecting the processing time:
```
# adata.X of the 8 µm bin and bin2cell objects (sparse):
<Compressed Sparse Row sparse matrix of dtype 'float32'
<Compressed Sparse Row sparse matrix of dtype 'float64'

# adata.X of the ENACT output (dense):
array([[0, 0, 0, ..., 0, 0, 0],
```
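(A quick workaround, given that the dense array is what differs, is to sparsify X right after loading; this is standard scipy/anndata usage, not code from the thread.)

```python
# Convert the dense ENACT count matrix to CSR in place; for the
# mostly-zero Visium HD counts this reduces memory dramatically.
import numpy as np
from scipy.sparse import csr_matrix

if isinstance(adata.X, np.ndarray):
    adata.X = csr_matrix(adata.X)
```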
Thanks, this helps. Just one question before we revise this: what is your current setup?
I have 36GB RAM on my local computer, with access to higher remote computing resources if needed. Thanks again |
Thank you for pointing this issue out. We will update ENACT's 'df_to_adata()' function to save the transcript counts as a sparse matrix. In the meantime, update df_to_adata() in 'enact-pipeline/src/package_results.py' to the following:
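(The maintainers' exact replacement was not preserved in this copy of the thread; the essence of the fix is to construct X as a sparse matrix. A sketch, assuming the counts arrive as a cell-by-gene DataFrame:)

```python
# Sketch of the fix: build adata.X as a CSR sparse matrix instead of a
# dense array. The cell-by-gene DataFrame layout is an assumption.
import anndata as ad
from scipy.sparse import csr_matrix

def df_to_adata(counts_df):
    """Convert a cell-by-gene transcript-count DataFrame to AnnData
    with a sparse X, so memory scales with the non-zero entries."""
    adata = ad.AnnData(X=csr_matrix(counts_df.to_numpy(dtype="float32")))
    adata.obs_names = counts_df.index.astype(str)
    adata.var_names = counts_df.columns.astype(str)
    return adata
```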
ENACT's next release will include this fix.
The reason why you are only seeing 1000 variables is that you likely have the 'use_hvg' flag set to True, which only considers the top-1000 highly variable genes for bin-to-cell assignment. If you want all genes to be present in the final AnnData object, please set this flag to False. However, note that setting 'use_hvg' to False will cause bin-to-cell-assignment methods such as 'weighted_by_cluster' and 'weighted_by_gene' to take significantly longer and require more memory and storage on your machine.
This is now fixed in ENACT version 0.2.2. |
Hello,
where merged_df would be a DataFrame of the merged files from the idx_lookup folder. Thanks!
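(For reference, building such a merged_df could look like the following sketch; the file pattern and CSV format of the idx_lookup folder are assumptions.)

```python
# Hypothetical construction of merged_df from the idx_lookup outputs;
# the glob pattern and file format are assumptions.
import glob
import pandas as pd

files = sorted(glob.glob("idx_lookup/*.csv"))
merged_df = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
```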
Hi,
I was wondering if there was any way to incorporate the output from the ENACT pipeline into an anndata object with the binned counts and the whole-slide image, for use in spatial plots with other packages? I've tried running the pipeline on data our lab has generated, and it only resulted in an anndata object with 1000 variables and no image to use for spatial plotting.
Thanks!