No Aberrant Expression was found in Targeted Panel RNA Sequence Data #70

ellango85 · 2024-08-21T13:41:46Z

Hi,
I have a Target RNA Panel Sequence (Set of Genes are enriched) data. I have tried the OUTRIDER with default parameter, but I couldn't able get any aberrant expression even in known positive samples, which has known Aberrant splicing event.

Does the OUTRIDER will work on the Targeted Panel RNA Sequence Data or it will only work on non-Targeted whole transcriptome data!.

Even I tried to merge the Target and non-target Count file and failed due to some error [Run error #31669 unevaluated and other errors !!!]
Kindly help me out to fix this issue.

Best,
Ellango

gelowin · 2024-10-08T14:25:48Z

Hi,
I have the same problem. I create a toy example and no aberrant genes was discovered.

counts <- data.frame(
    id1 = withr::with_seed(seed = 1, rnbinom(100, size = 200, mu = 400)),
    id2 = withr::with_seed(seed = 2, rnbinom(100, size = 20,  mu = 800)),
    id3 = withr::with_seed(seed = 3, rnbinom(100, size = 2000, mu = 100)))
## Simulate some aberrant observations for demonstration purposes
counts[2:3, ] <- 4000000
counts[4,1] <- 888888888

row.names(counts) <- paste0("gene", 1:100)
counts <- as.matrix(counts)
counts <- round(counts) ## counts must be integer
annot_db <- data.frame(sampleID = paste0("id", 1:3), sex = c("F", "M", "M"))
ods <- OUTRIDER::OutriderDataSet(countData = counts, colData = annot_db)
head(assay(ods))


ods <- OUTRIDER::OUTRIDER(ods, implementation = "pca")
outrider_output <- OUTRIDER::results(ods)
## No significant events:

AtaJadidAhari · 2024-10-08T14:46:43Z

Hi @gelowin, running OUTRIDER with such a small set of simulated data will not work. It is recommended to run OUTRIDER with at least 50 samples. With so few samples, you would not get any significant results after FDR correction.

AtaJadidAhari · 2024-10-08T14:54:58Z

Hi @ellango85, sorry for the late reply. How many samples do you have?

You can also try to check the expression values for some genes that you think might have outliers using:

plotExpressionRank(ods, "your_gene_of_interest", basePlot=TRUE)

to see what's going on.

gelowin · 2024-10-09T08:25:41Z

@AtaJadidAhari Thank you very much for your response.

Perhaps I don't fully understand what an "aberrant" observation is. I initially thought it referred to a extreme outlier.

For instance, I can simulate 10k genes and 100 subjects, introduce two clear outliers, and still get results indicating "No significant events", for example:

counts <- replicate( n = 100, rnbinom(10000, size = 200, mu = 50))
colnames(counts) <- paste0("id", 1:100)
rownames(counts) <- paste0("gen", 1:10000)
## introducing two aberrant observations:
counts[2, 2] <- 999999999
counts[1, 3] <- 999999999

annot_db <- data.frame(sampleID = paste0("id", 1:100), sex = c("F", "M"))
ods <- OUTRIDER::OutriderDataSet(countData = counts, colData = annot_db)
ods <- OUTRIDER::OUTRIDER(ods, implementation = "pca")
outrider_output <- OUTRIDER::results(ods)

plotExpectedVsObservedCounts(ods, "gen2", basePlot = TRUE)
plotQQ(ods, "gen2")
plotExpressionRank(ods, "gen2", basePlot=TRUE)

Could you kindly clarify what is meant by 'aberrant status' in this context? I initially thought it referred to the outlier status of an event, but I believe I may have misunderstood.

Thank you very much

AtaJadidAhari · 2024-10-09T08:50:15Z

Hi @gelowin, the problem with this simulation is that the "extreme" outliers are simply too extreme and OUTRIDER cannot fit an autoencoder properly to this data. I redid your analysis with less extreme values, e.g. 9999 instead of 999999999 and then OUTRIDER produced the expected results:

> outrider_output
Key: <geneID, sampleID>
   geneID sampleID       pValue      padjust zScore  l2fc rawcounts meanRawcounts normcounts meanCorrected theta aberrant
   <char>   <char>        <num>        <num>  <num> <num>     <int>         <num>      <num>         <num> <num>   <lgcl>
1:   gen1      id3 2.099035e-22 2.054453e-17   8.49  5.84      9999        150.72    3135.68         83.81  1.24     TRUE
2:   gen2      id2 5.766593e-21 5.644114e-16   9.36  7.36      9999        147.59    8523.82        133.06  0.63     TRUE
   AberrantBySample AberrantByGene padj_rank
              <num>          <num>     <num>
1:                1              1         1
2:                1              1         1

gelowin · 2024-10-09T09:08:55Z

Ok, thank you very much.
Yes, with 9999, works. Anyway it seems that it should work also for those more aberrant. I did not try the autoencoder, because it seems freeze all the time but with the "pca" is not detected.
Thank you very much for your response and for your package.

vyepez88 · 2024-10-12T09:32:02Z

Hi @gelowin, I guess the problem is that 999999999 is so high that it's unrealistic and beyond the sort of expected values for the negative binomial distribution. Not sure if with the autoencoder it "freezes" or it's actually working. It can take a couple of hours.

qj009 · 2025-02-17T16:37:42Z

Hello, I have a similar problem.

I am running OUTRIDER(version: 1.24.0) to identify abnormal expressed gene in my data. However, it cannot identify the high expression genes we have. The numbers are as listed below:

The adjusted P value is 1 and P value is way larger than 0.05.
The total number of genes in the test is around 500 with several read count filters I tried:
mean read count larger than 50 reads
at least one sample with 10 reads
all samples with 10 reads

The running script is as below:

## at least one sample with x reads` 
samplef <- sample %>% filter_all(any_vars(.>=x)) 
# or 
# mean read count larger than x reads 
samplef <- sample[rowMeans(sample)>=x,] 
# or 
# all samples with x or more support read counts 
samplef <- sample %>% filter_all(all_vars(.>=x)) 
# generate OUTRIDER data set 
ods <- OutriderDataSet(countData= samplef) 
# implementation = “pca” or “autoencoder” 
ods <- OUTRIDER(ods, implementation=implementation, BPPARAM=MulticoreParam(20)) 
# keep all results to look up result of the high expressed gene (positive control) 
res <- results(ods, all=TRUE)

With the filter : mean read count larger than 50, around 500 genes remain but adjusted P values are 1 for the known high expression gene in the positive sample using pca or autoencoder implementation method. Even the p values are larger than 0.1. There are over 100 samples included in the analysis. The result is similar using other read count filters listed above.

Any suggestions are appreciated. Thank you!

vyepez88 · 2025-02-17T19:54:45Z

Hi, how high is the high expression gene? What are the normalized counts, fold change and p-value?
you can also do the expression plot: plotExpressionRank(ods, 'sample id', basePlot = TRUE)

qj009 · 2025-02-20T20:07:56Z

Hello @vyepez88 , thank you for your response!

The DESeq2 result: baseMean(The average of the normalized count values) is 16248.09, log2FC = 3.27, pvalue = 0.00012, padj = 0.007.

Below are the plots generated by: plotExpressionRank(ods, 'BCL2', basePlot = TRUE)
pca:

autoencoder:

vyepez88 · 2025-02-20T20:11:31Z

Hi, from the plots it seems that gene's dispersion is quite big ranging from 5-30K, making it difficult to identify outliers.
Why are you reporting DESeq's fold changes? What was your experimental design to obtain those values? Are you interested in differential or aberrant expression detection?

qj009 · 2025-02-20T20:19:32Z

Hi @vyepez88 , initially I am interested in differential expression but my control set and case set size is unbalanced and sometime I only have one case vs some controls. So aberrant expression detection provided by OUTRIDER seems fit my experiment design.

May I ask why the big range of gene's dispersion make OUTRIDER hard to identify outliers?

vyepez88 · 2025-02-20T20:58:03Z

In OUTRIDER it is assumed that all samples come from the same population. If the controls are very similar and the case is very different, it might not work.
Let's say that the gene's mean expression is 15K. If the range is 10-20K, and your sample has 30K counts, then it would be an outlier. But if the range is 5-30K, then it's not.
You can add your sample to the plot by using the label parameter:
plotExpressionRank(ods, 'gene', label = 'sample id', basePlot = TRUE)

qj009 · 2025-02-21T18:14:23Z

Thank you @vyepez88 very much for your explanation. Do you have a suggested gene read count range (differences or standard deviation) for OUTRIDER to detect outliers?

Besides, I am not quite understand:

If the controls are very similar and the case is very different, it might not work.

Would it be easier to detect if the expression of a gene is very different in one sample comparing to other samples (like the example you made under a reasonable gene read count range)? What are you referring to the "different" here?

Thank you very much for your time and help! Very appreciated it!

vyepez88 · 2025-02-23T18:01:50Z

Regarding the standard deviation, good question. I usually plot the fold changes of each gene (value in a sample / mean value across samples). If the distribution is around 0.75-1.25, then it's easy to detect 50% increases or reductions. See Fig 3C of https://doi.org/10.1186/s13073-022-01019-9
Regarding the second question, the assumption of OUTRIDER is that all points come from the same population. This population can be quite varied (e.g. random sampling of all humans in the world), or relatively homogeneous. However, if you are modelling a same population together with only 1 sample from another population (eg 100 females and 1 male; 100 leukemia and 1 unaffected, 100 blood and 1 skin, 100 caucasians and 1 south american), then the number of outliers in the 1 sample will be exaggerated.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

No Aberrant Expression was found in Targeted Panel RNA Sequence Data #70

No Aberrant Expression was found in Targeted Panel RNA Sequence Data #70

ellango85 commented Aug 21, 2024

gelowin commented Oct 8, 2024 •

edited

Loading

AtaJadidAhari commented Oct 8, 2024 •

edited

Loading

AtaJadidAhari commented Oct 8, 2024

gelowin commented Oct 9, 2024 •

edited

Loading

AtaJadidAhari commented Oct 9, 2024

gelowin commented Oct 9, 2024

vyepez88 commented Oct 12, 2024

qj009 commented Feb 17, 2025 •

edited

Loading

vyepez88 commented Feb 17, 2025 •

edited

Loading

qj009 commented Feb 20, 2025

vyepez88 commented Feb 20, 2025

qj009 commented Feb 20, 2025 •

edited

Loading

vyepez88 commented Feb 20, 2025

qj009 commented Feb 21, 2025

vyepez88 commented Feb 23, 2025

No Aberrant Expression was found in Targeted Panel RNA Sequence Data #70

No Aberrant Expression was found in Targeted Panel RNA Sequence Data #70

Comments

ellango85 commented Aug 21, 2024

gelowin commented Oct 8, 2024 • edited Loading

AtaJadidAhari commented Oct 8, 2024 • edited Loading

AtaJadidAhari commented Oct 8, 2024

gelowin commented Oct 9, 2024 • edited Loading

AtaJadidAhari commented Oct 9, 2024

gelowin commented Oct 9, 2024

vyepez88 commented Oct 12, 2024

qj009 commented Feb 17, 2025 • edited Loading

vyepez88 commented Feb 17, 2025 • edited Loading

qj009 commented Feb 20, 2025

vyepez88 commented Feb 20, 2025

qj009 commented Feb 20, 2025 • edited Loading

vyepez88 commented Feb 20, 2025

qj009 commented Feb 21, 2025

vyepez88 commented Feb 23, 2025

gelowin commented Oct 8, 2024 •

edited

Loading

AtaJadidAhari commented Oct 8, 2024 •

edited

Loading

gelowin commented Oct 9, 2024 •

edited

Loading

qj009 commented Feb 17, 2025 •

edited

Loading

vyepez88 commented Feb 17, 2025 •

edited

Loading

qj009 commented Feb 20, 2025 •

edited

Loading