Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How do I subset based on RAW counts? #9459

Open
alexwskh opened this issue Nov 7, 2024 · 5 comments
Open

How do I subset based on RAW counts? #9459

alexwskh opened this issue Nov 7, 2024 · 5 comments

Comments

@alexwskh
Copy link

alexwskh commented Nov 7, 2024

Hello,

I am having trouble subsetting a seurat object based on raw counts (expression of transgenes). I simply want to subset based on non-transformed data. I tried this by setting the assay to RNA and then subsetting, but I get an error saying that the data layer does not exist. Is the counts layer from the SCT assay equivalent to the counts layer of RNA? If so, can I just specify slot = "counts" in the subset function?

I think I am losing some cells of interest when subsetting expression based on transformed data, so that is why I am pursuing this.

This is a dataset that is integrated from several scRNAseq experiments.

Thanks for any help.

@samuel-marsh
Copy link
Collaborator

Hi,

Not member of dev team but hopefully can be helpful. Can you please post the full code you are running and the full text of the error messages?

Thanks,
Sam

@rharao
Copy link

rharao commented Nov 11, 2024

For subsetting based on non-default assay data, I find that some use of the [ methods is almost unavoidable.

If I had a transgene called Tgx and wanted to retain only the cells with at least two reads for Tgx in the raw RNA counts matrix, I would do something like:

object %<>% (\(o) o[ , o$RNA$counts["Tgx", ] >= 2 ])

Here's an example where I subset pbmc_small to only the cells positive for CTSS:

pbmc_small %>% (\(o) o[ , o$RNA$counts["CTSS", ] > 0 ])
An object of class Seurat 
230 features across 48 samples within 1 assay 
Active assay: RNA (230 features, 20 variable features)
 3 layers present: counts, data, scale.data
 2 dimensional reductions calculated: pca, tsne

@samuel-marsh
Copy link
Collaborator

Hi,

You can do this a couple of ways using the accessor functions provided in Seurat or directly using subset.
All of these examples are from Seurat/SeuratObject documentation.

# using subset
sub_obj <- subset(pbmc_small, subset = MS4A1 > 3, slot = 'counts')

# Using WhichCells then subset
cells_sub <- WhichCells(pbmc_small, expression = MS4A1 > 3, slot = "counts")
sub_obj <- subset(pbmc_small, cells = cells_sub)

# Fetch Data
exp_cells <-  FetchData(object = pbmc_small, vars = 'MS4A1', layer = "counts") %>%
    filter(MS4A1 > 3) %>%
    rownames()

sub_obj <- subset(pbmc_small, cells = exp_cells)

Best,
Sam

@rharao
Copy link

rharao commented Nov 11, 2024

@alexwskh the short answers:

  • Yes, you can set the default assay of the object to "RNA", subset with slot = "counts", and then set the default assay back to "SCT". (This subset method has lagged behind regarding deprecation of slot in favor of layer.)
  • SCTransform counts are back-calculated versions of the input (RNA) counts, corrected for sequencing depth differences among cells, so the counts matrices of the "SCT" and "RNA" assays should be expected to be different.

I suppose an enhancement could be to parse out assay names from the subset or expression arguments in these various functions to enable use cases like the issue author's-- subsetting based on data from another assay than what is the default. Something like subset(object, Tgx_RNA >= 2); in my experience, if this is meant to have been implemented, it is not working.

@alexwskh
Copy link
Author

alexwskh commented Nov 11, 2024

Thank you everyone who responded, I really appreciate the help!

@samuel-marsh, let me try to lay out the relevant pieces...
I have five separate scRNAseq experiments that I do some light preprocessing on (%mt really) as individual seurat objects. I then merge them together, run SCT normalization, then CCA integration. It's identical to the introductory pipeline:
https://satijalab.org/seurat/articles/integration_introduction

The quick summary of the seurat object is as such (with SCT as default assay)
An object of class Seurat 49208 features across 41013 samples within 2 assays Active assay: SCT (24099 features, 3000 variable features) 3 layers present: counts, data, scale.data 1 other assay present: RNA 3 dimensional reductions calculated: pca, integrated.dr, umap

I'm trying to subset on raw counts for a transgene because I noticed after SCT normalization and across assays, some cells are normalized to zero transcript counts. If I pick up a single transgenic count (which might be overly lenient sure), I want to keep those cells. So I switch the default assay to RNA and try to subset based on counts.

DefaultAssay(seurthresh2) <- "RNA"
seur.gfpdbh <- subset(x = seurthresh2, subset = eGFP-eGFP > 0 & dbh-ENSDARG00000069446 > 0)

Warning: No layers found matching search pattern provided Error in FetchData.Assay5(object = object[[DefaultAssay(object = object)]], : layer "data" is not found in the object

On other Seurat functions it will default to counts if the data layer is not present, so I wasn't sure how to grab what I wanted.

It looks like both you and @rharao confirmed that I can use slot = "counts" after switching my assay to RNA in order to subset based on raw counts, and then switching back to SCT on the main object. Thanks! To highlight my issue, here is the truncated table of eGFP counts after subsetting gfp > 0 from the RNA assay, counts slot. First from the RNA assay and then from the SCT assay.

RNA Assay, Table() of FetchData for transgenic GFP
`> table(GFP)
eGFP-eGFP

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
1186 1116 1014 901 758 585 503 397 292 276 231 177 167 144 147 119 110 103 92 103 103 86 74 86 57 69 50 51 70 62 `

SCT Assay, Table() of FetchData for transgenic GFP
`> table(GFP)
eGFP-eGFP

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
1346 1122 719 470 415 404 354 346 349 312 281 243 332 279 260 246 274 246 254 221 231 225 214 204 183 188 197 182 157 147
30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
131`

If I subset based on the transformed counts on the SCT assay, I am losing 1346 cells from my analysis as the normalized counts are zeroed for those cells. Thanks again, though please point out if I am misunderstanding anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants