Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No Aberrant Expression was found in Targeted Panel RNA Sequence Data #70

Open
ellango85 opened this issue Aug 21, 2024 · 15 comments
Open

Comments

@ellango85
Copy link

Hi,
I have a Target RNA Panel Sequence (Set of Genes are enriched) data. I have tried the OUTRIDER with default parameter, but I couldn't able get any aberrant expression even in known positive samples, which has known Aberrant splicing event.

Does the OUTRIDER will work on the Targeted Panel RNA Sequence Data or it will only work on non-Targeted whole transcriptome data!.

Even I tried to merge the Target and non-target Count file and failed due to some error [Run error #31669 unevaluated and other errors !!!]
Kindly help me out to fix this issue.

Best,
Ellango

@gelowin
Copy link

gelowin commented Oct 8, 2024

Hi,
I have the same problem. I create a toy example and no aberrant genes was discovered.

counts <- data.frame(
    id1 = withr::with_seed(seed = 1, rnbinom(100, size = 200, mu = 400)),
    id2 = withr::with_seed(seed = 2, rnbinom(100, size = 20,  mu = 800)),
    id3 = withr::with_seed(seed = 3, rnbinom(100, size = 2000, mu = 100)))
## Simulate some aberrant observations for demonstration purposes
counts[2:3, ] <- 4000000
counts[4,1] <- 888888888

row.names(counts) <- paste0("gene", 1:100)
counts <- as.matrix(counts)
counts <- round(counts) ## counts must be integer
annot_db <- data.frame(sampleID = paste0("id", 1:3), sex = c("F", "M", "M"))
ods <- OUTRIDER::OutriderDataSet(countData = counts, colData = annot_db)
head(assay(ods))


ods <- OUTRIDER::OUTRIDER(ods, implementation = "pca")
outrider_output <- OUTRIDER::results(ods)
## No significant events: 

@AtaJadidAhari
Copy link
Collaborator

AtaJadidAhari commented Oct 8, 2024

Hi @gelowin, running OUTRIDER with such a small set of simulated data will not work. It is recommended to run OUTRIDER with at least 50 samples. With so few samples, you would not get any significant results after FDR correction.

@AtaJadidAhari
Copy link
Collaborator

Hi @ellango85, sorry for the late reply. How many samples do you have?

You can also try to check the expression values for some genes that you think might have outliers using:

plotExpressionRank(ods, "your_gene_of_interest", basePlot=TRUE) 

to see what's going on.

@gelowin
Copy link

gelowin commented Oct 9, 2024

@AtaJadidAhari Thank you very much for your response.

Perhaps I don't fully understand what an "aberrant" observation is. I initially thought it referred to a extreme outlier.

For instance, I can simulate 10k genes and 100 subjects, introduce two clear outliers, and still get results indicating "No significant events", for example:

counts <- replicate( n = 100, rnbinom(10000, size = 200, mu = 50))
colnames(counts) <- paste0("id", 1:100)
rownames(counts) <- paste0("gen", 1:10000)
## introducing two aberrant observations:
counts[2, 2] <- 999999999
counts[1, 3] <- 999999999

annot_db <- data.frame(sampleID = paste0("id", 1:100), sex = c("F", "M"))
ods <- OUTRIDER::OutriderDataSet(countData = counts, colData = annot_db)
ods <- OUTRIDER::OUTRIDER(ods, implementation = "pca")
outrider_output <- OUTRIDER::results(ods)

plotExpectedVsObservedCounts(ods, "gen2", basePlot = TRUE)
plotQQ(ods, "gen2")
plotExpressionRank(ods, "gen2", basePlot=TRUE) 

Could you kindly clarify what is meant by 'aberrant status' in this context? I initially thought it referred to the outlier status of an event, but I believe I may have misunderstood.

Thank you very much

@AtaJadidAhari
Copy link
Collaborator

Hi @gelowin, the problem with this simulation is that the "extreme" outliers are simply too extreme and OUTRIDER cannot fit an autoencoder properly to this data. I redid your analysis with less extreme values, e.g. 9999 instead of 999999999 and then OUTRIDER produced the expected results:
image

> outrider_output
Key: <geneID, sampleID>
   geneID sampleID       pValue      padjust zScore  l2fc rawcounts meanRawcounts normcounts meanCorrected theta aberrant
   <char>   <char>        <num>        <num>  <num> <num>     <int>         <num>      <num>         <num> <num>   <lgcl>
1:   gen1      id3 2.099035e-22 2.054453e-17   8.49  5.84      9999        150.72    3135.68         83.81  1.24     TRUE
2:   gen2      id2 5.766593e-21 5.644114e-16   9.36  7.36      9999        147.59    8523.82        133.06  0.63     TRUE
   AberrantBySample AberrantByGene padj_rank
              <num>          <num>     <num>
1:                1              1         1
2:                1              1         1

@gelowin
Copy link

gelowin commented Oct 9, 2024

Ok, thank you very much.
Yes, with 9999, works. Anyway it seems that it should work also for those more aberrant. I did not try the autoencoder, because it seems freeze all the time but with the "pca" is not detected.
Thank you very much for your response and for your package.

@vyepez88
Copy link
Contributor

Hi @gelowin, I guess the problem is that 999999999 is so high that it's unrealistic and beyond the sort of expected values for the negative binomial distribution. Not sure if with the autoencoder it "freezes" or it's actually working. It can take a couple of hours.

@qj009
Copy link

qj009 commented Feb 17, 2025

Hello, I have a similar problem.

I am running OUTRIDER(version: 1.24.0) to identify abnormal expressed gene in my data. However, it cannot identify the high expression genes we have. The numbers are as listed below:

The adjusted P value is 1 and P value is way larger than 0.05.
The total number of genes in the test is around 500 with several read count filters I tried:
mean read count larger than 50 reads
at least one sample with 10 reads
all samples with 10 reads

The running script is as below:

## at least one sample with x reads` 
samplef <- sample %>% filter_all(any_vars(.>=x)) 
# or 
# mean read count larger than x reads 
samplef <- sample[rowMeans(sample)>=x,] 
# or 
# all samples with x or more support read counts 
samplef <- sample %>% filter_all(all_vars(.>=x)) 
# generate OUTRIDER data set 
ods <- OutriderDataSet(countData= samplef# implementation = “pca” or “autoencoder” 
ods <- OUTRIDER(ods, implementation=implementation, BPPARAM=MulticoreParam(20)) 
# keep all results to look up result of the high expressed gene (positive control) 
res <- results(ods, all=TRUE

With the filter : mean read count larger than 50, around 500 genes remain but adjusted P values are 1 for the known high expression gene in the positive sample using pca or autoencoder implementation method. Even the p values are larger than 0.1. There are over 100 samples included in the analysis. The result is similar using other read count filters listed above.

Any suggestions are appreciated. Thank you!

@vyepez88
Copy link
Contributor

vyepez88 commented Feb 17, 2025

Hi, how high is the high expression gene? What are the normalized counts, fold change and p-value?
you can also do the expression plot: plotExpressionRank(ods, 'sample id', basePlot = TRUE)

@qj009
Copy link

qj009 commented Feb 20, 2025

Hello @vyepez88 , thank you for your response!

The DESeq2 result: baseMean(The average of the normalized count values) is 16248.09, log2FC = 3.27, pvalue = 0.00012, padj = 0.007.

Below are the plots generated by: plotExpressionRank(ods, 'BCL2', basePlot = TRUE)
pca:

Image

autoencoder:

Image

@vyepez88
Copy link
Contributor

Hi, from the plots it seems that gene's dispersion is quite big ranging from 5-30K, making it difficult to identify outliers.
Why are you reporting DESeq's fold changes? What was your experimental design to obtain those values? Are you interested in differential or aberrant expression detection?

@qj009
Copy link

qj009 commented Feb 20, 2025

Hi @vyepez88 , initially I am interested in differential expression but my control set and case set size is unbalanced and sometime I only have one case vs some controls. So aberrant expression detection provided by OUTRIDER seems fit my experiment design.

May I ask why the big range of gene's dispersion make OUTRIDER hard to identify outliers?

@vyepez88
Copy link
Contributor

In OUTRIDER it is assumed that all samples come from the same population. If the controls are very similar and the case is very different, it might not work.
Let's say that the gene's mean expression is 15K. If the range is 10-20K, and your sample has 30K counts, then it would be an outlier. But if the range is 5-30K, then it's not.
You can add your sample to the plot by using the label parameter:
plotExpressionRank(ods, 'gene', label = 'sample id', basePlot = TRUE)

@qj009
Copy link

qj009 commented Feb 21, 2025

Thank you @vyepez88 very much for your explanation. Do you have a suggested gene read count range (differences or standard deviation) for OUTRIDER to detect outliers?

Besides, I am not quite understand:

If the controls are very similar and the case is very different, it might not work.

Would it be easier to detect if the expression of a gene is very different in one sample comparing to other samples (like the example you made under a reasonable gene read count range)? What are you referring to the "different" here?

Thank you very much for your time and help! Very appreciated it!

@vyepez88
Copy link
Contributor

Regarding the standard deviation, good question. I usually plot the fold changes of each gene (value in a sample / mean value across samples). If the distribution is around 0.75-1.25, then it's easy to detect 50% increases or reductions. See Fig 3C of https://doi.org/10.1186/s13073-022-01019-9
Regarding the second question, the assumption of OUTRIDER is that all points come from the same population. This population can be quite varied (e.g. random sampling of all humans in the world), or relatively homogeneous. However, if you are modelling a same population together with only 1 sample from another population (eg 100 females and 1 male; 100 leukemia and 1 unaffected, 100 blood and 1 skin, 100 caucasians and 1 south american), then the number of outliers in the 1 sample will be exaggerated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants