How do I subset based on RAW counts? #9459

alexwskh · 2024-11-07T16:46:29Z

Hello,

I am having trouble subsetting a seurat object based on raw counts (expression of transgenes). I simply want to subset based on non-transformed data. I tried this by setting the assay to RNA and then subsetting, but I get an error saying that the data layer does not exist. Is the counts layer from the SCT assay equivalent to the counts layer of RNA? If so, can I just specify slot = "counts" in the subset function?

I think I am losing some cells of interest when subsetting expression based on transformed data, so that is why I am pursuing this.

This is a dataset that is integrated from several scRNAseq experiments.

Thanks for any help.

samuel-marsh · 2024-11-08T20:28:59Z

Hi,

Not member of dev team but hopefully can be helpful. Can you please post the full code you are running and the full text of the error messages?

Thanks,
Sam

rharao · 2024-11-11T16:16:58Z

For subsetting based on non-default assay data, I find that some use of the [ methods is almost unavoidable.

If I had a transgene called Tgx and wanted to retain only the cells with at least two reads for Tgx in the raw RNA counts matrix, I would do something like:

object %<>% (\(o) o[ , o$RNA$counts["Tgx", ] >= 2 ])

Here's an example where I subset pbmc_small to only the cells positive for CTSS:

pbmc_small %>% (\(o) o[ , o$RNA$counts["CTSS", ] > 0 ])

An object of class Seurat 
230 features across 48 samples within 1 assay 
Active assay: RNA (230 features, 20 variable features)
 3 layers present: counts, data, scale.data
 2 dimensional reductions calculated: pca, tsne

samuel-marsh · 2024-11-11T16:47:23Z

Hi,

You can do this a couple of ways using the accessor functions provided in Seurat or directly using subset.
All of these examples are from Seurat/SeuratObject documentation.

# using subset
sub_obj <- subset(pbmc_small, subset = MS4A1 > 3, slot = 'counts')

# Using WhichCells then subset
cells_sub <- WhichCells(pbmc_small, expression = MS4A1 > 3, slot = "counts")
sub_obj <- subset(pbmc_small, cells = cells_sub)

# Fetch Data
exp_cells <-  FetchData(object = pbmc_small, vars = 'MS4A1', layer = "counts") %>%
    filter(MS4A1 > 3) %>%
    rownames()

sub_obj <- subset(pbmc_small, cells = exp_cells)

Best,
Sam

rharao · 2024-11-11T17:27:29Z

@alexwskh the short answers:

Yes, you can set the default assay of the object to "RNA", subset with slot = "counts", and then set the default assay back to "SCT". (This subset method has lagged behind regarding deprecation of slot in favor of layer.)
SCTransform counts are back-calculated versions of the input (RNA) counts, corrected for sequencing depth differences among cells, so the counts matrices of the "SCT" and "RNA" assays should be expected to be different.

I suppose an enhancement could be to parse out assay names from the subset or expression arguments in these various functions to enable use cases like the issue author's-- subsetting based on data from another assay than what is the default. Something like subset(object, Tgx_RNA >= 2); in my experience, if this is meant to have been implemented, it is not working.

alexwskh · 2024-11-11T19:51:58Z

Thank you everyone who responded, I really appreciate the help!

@samuel-marsh, let me try to lay out the relevant pieces...
I have five separate scRNAseq experiments that I do some light preprocessing on (%mt really) as individual seurat objects. I then merge them together, run SCT normalization, then CCA integration. It's identical to the introductory pipeline:
https://satijalab.org/seurat/articles/integration_introduction

The quick summary of the seurat object is as such (with SCT as default assay)
An object of class Seurat 49208 features across 41013 samples within 2 assays Active assay: SCT (24099 features, 3000 variable features) 3 layers present: counts, data, scale.data 1 other assay present: RNA 3 dimensional reductions calculated: pca, integrated.dr, umap

I'm trying to subset on raw counts for a transgene because I noticed after SCT normalization and across assays, some cells are normalized to zero transcript counts. If I pick up a single transgenic count (which might be overly lenient sure), I want to keep those cells. So I switch the default assay to RNA and try to subset based on counts.

DefaultAssay(seurthresh2) <- "RNA"
seur.gfpdbh <- subset(x = seurthresh2, subset = eGFP-eGFP > 0 & dbh-ENSDARG00000069446 > 0)

Warning: No layers found matching search pattern provided Error in FetchData.Assay5(object = object[[DefaultAssay(object = object)]], : layer "data" is not found in the object

On other Seurat functions it will default to counts if the data layer is not present, so I wasn't sure how to grab what I wanted.

It looks like both you and @rharao confirmed that I can use slot = "counts" after switching my assay to RNA in order to subset based on raw counts, and then switching back to SCT on the main object. Thanks! To highlight my issue, here is the truncated table of eGFP counts after subsetting gfp > 0 from the RNA assay, counts slot. First from the RNA assay and then from the SCT assay.

RNA Assay, Table() of FetchData for transgenic GFP
`> table(GFP)
eGFP-eGFP

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
1186 1116 1014 901 758 585 503 397 292 276 231 177 167 144 147 119 110 103 92 103 103 86 74 86 57 69 50 51 70 62 `

SCT Assay, Table() of FetchData for transgenic GFP
`> table(GFP)
eGFP-eGFP

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
1346 1122 719 470 415 404 354 346 349 312 281 243 332 279 260 246 274 246 254 221 231 225 214 204 183 188 197 182 157 147
30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
131`

If I subset based on the transformed counts on the SCT assay, I am losing 1346 cells from my analysis as the normalized counts are zeroed for those cells. Thanks again, though please point out if I am misunderstanding anything.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How do I subset based on RAW counts? #9459

How do I subset based on RAW counts? #9459

alexwskh commented Nov 7, 2024

samuel-marsh commented Nov 8, 2024

rharao commented Nov 11, 2024 •

edited

Loading

samuel-marsh commented Nov 11, 2024

rharao commented Nov 11, 2024

alexwskh commented Nov 11, 2024 •

edited

Loading

How do I subset based on RAW counts? #9459

How do I subset based on RAW counts? #9459

Comments

alexwskh commented Nov 7, 2024

samuel-marsh commented Nov 8, 2024

rharao commented Nov 11, 2024 • edited Loading

samuel-marsh commented Nov 11, 2024

rharao commented Nov 11, 2024

alexwskh commented Nov 11, 2024 • edited Loading

rharao commented Nov 11, 2024 •

edited

Loading

alexwskh commented Nov 11, 2024 •

edited

Loading