No Aberrant Expression was found in Targeted Panel RNA Sequence Data #70
Hi,
|
Hi @gelowin, running OUTRIDER with such a small set of simulated data will not work. It is recommended to run OUTRIDER with at least 50 samples. With so few samples, you would not get any significant results after FDR correction. |
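To put a rough number on the FDR point above, here is a small base-R sketch. It assumes a Benjamini-Hochberg correction over 10,000 genes; the gene count and the p-values are made up for illustration, and the correction OUTRIDER applies internally may differ.

```r
# Toy illustration (made-up numbers): one small p-value among 10,000 tests.
n_genes <- 10000
pvals <- c(5e-6, rep(0.5, n_genes - 1))

# Benjamini-Hochberg adjustment from base R (stats::p.adjust).
padj <- p.adjust(pvals, method = "BH")

# The smallest raw p-value is scaled up by the number of tests.
smallest_padj <- padj[1]
print(smallest_padj)
```

Under these assumptions a raw p-value has to be about 5e-6 just to reach an adjusted value of 0.05, which is hard to achieve when only a handful of samples is available to estimate each gene's distribution.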
Hi @ellango85, sorry for the late reply. How many samples do you have? You can also try to check the expression values for some genes that you think might have outliers using:
to see what's going on. |
@AtaJadidAhari Thank you very much for your response. Perhaps I don't fully understand what an "aberrant" observation is. I initially thought it referred to an extreme outlier. For instance, I can simulate 10k genes and 100 subjects, introduce two clear outliers, and still get results indicating "No significant events", for example:
Could you kindly clarify what is meant by 'aberrant status' in this context? I initially thought it referred to the outlier status of an event, but I believe I may have misunderstood. Thank you very much |
Hi @gelowin, the problem with this simulation is that the "extreme" outliers are simply too extreme and OUTRIDER cannot fit an autoencoder properly to this data. I redid your analysis with less extreme values, e.g. 9999 instead of 999999999 and then OUTRIDER produced the expected results:
|
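A simulation of the kind discussed above could be sketched in base R as follows. This is only an illustration and does not call OUTRIDER: the negative binomial parameters (`mu = 500`, `size = 25`), the matrix size, and the simple per-gene z-score are all assumptions chosen for the example.

```r
set.seed(42)

# Simulate negative binomial counts: 200 genes x 100 samples (made-up parameters).
n_genes <- 200
n_samples <- 100
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = 500, size = 25),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)

# Inject one moderately extreme outlier (9999 rather than 999999999).
counts["gene1", "sample1"] <- 9999

# Per-gene z-scores on the log2 scale, as a crude outlier check.
log_counts <- log2(counts + 1)
z <- (log_counts - rowMeans(log_counts)) / apply(log_counts, 1, sd)

# Locate the most extreme cell in the matrix.
top <- which(abs(z) == max(abs(z)), arr.ind = TRUE)
print(top)
```

The resulting `counts` matrix has the genes-by-samples layout that `OutriderDataSet(countData = ...)` is given later in this thread.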
Ok, thank you very much. |
Hi @gelowin, I guess the problem is that 999999999 is so high that it's unrealistic and beyond the sort of expected values for the negative binomial distribution. Not sure if with the autoencoder it "freezes" or it's actually working. It can take a couple of hours. |
Hello, I have a similar problem. I am running OUTRIDER (version 1.24.0) to identify abnormally expressed genes in my data. However, it cannot identify the highly expressed genes we have. The numbers are as listed below: The adjusted P value is 1 and the P value is way larger than 0.05. The running script is as below:
library(dplyr)         # filter_all(), any_vars(), all_vars()
library(OUTRIDER)      # OutriderDataSet(), OUTRIDER(), results()
library(BiocParallel)  # MulticoreParam()
# at least one sample with x reads
samplef <- sample %>% filter_all(any_vars(.>=x))
# or
# mean read count larger than x reads
samplef <- sample[rowMeans(sample)>=x,]
# or
# all samples with x or more support read counts
samplef <- sample %>% filter_all(all_vars(.>=x))
# generate OUTRIDER data set
ods <- OutriderDataSet(countData= samplef)
# implementation = "pca" or "autoencoder"
ods <- OUTRIDER(ods, implementation=implementation, BPPARAM=MulticoreParam(20))
# keep all results to look up result of the high expressed gene (positive control)
res <- results(ods, all=TRUE)
With the filter "mean read count larger than 50", around 500 genes remain, but the adjusted P value is 1 for the known highly expressed gene in the positive sample with either the pca or the autoencoder implementation. Even the raw P values are larger than 0.1. There are over 100 samples included in the analysis. The result is similar with the other read count filters listed above. Any suggestions are appreciated. Thank you! |
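For readers comparing the three read-count filters in the script above, here is a base-R sketch of the same row-wise conditions on a toy matrix. The gene names, counts, and the threshold `x <- 50` are made up for illustration.

```r
# Toy count matrix: genes in rows, samples in columns (made-up values).
counts <- matrix(
  c(60, 70,  80,
     0,  0, 120,
    40, 55,  70,
     1,  2,   3),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("s1", "s2", "s3"))
)
x <- 50  # hypothetical read-count threshold

# At least one sample with x or more reads.
keep_any <- rowSums(counts >= x) >= 1

# Mean read count of x or more.
keep_mean <- rowMeans(counts) >= x

# All samples with x or more reads.
keep_all <- rowSums(counts >= x) == ncol(counts)

print(rbind(keep_any, keep_mean, keep_all))
```

On this toy input the "any" filter keeps geneA, geneB and geneC, the "mean" filter keeps geneA and geneC, and the "all" filter keeps only geneA, so the three filters can leave quite different gene sets for the model to fit.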
Hi, how high is the high expression gene? What are the normalized counts, fold change and p-value? |
Hello @vyepez88 , thank you for your response! The DESeq2 result: baseMean(The average of the normalized count values) is 16248.09, log2FC = 3.27, pvalue = 0.00012, padj = 0.007. Below are the plots generated by: autoencoder: |
Hi, from the plots it seems that the gene's dispersion is quite large, with counts ranging from 5-30K, which makes it difficult to identify outliers. |
Hi @vyepez88 , initially I was interested in differential expression, but my control and case sets are unbalanced in size and sometimes I only have one case versus several controls. So the aberrant expression detection provided by OUTRIDER seems to fit my experimental design. May I ask why a large range in a gene's dispersion makes it hard for OUTRIDER to identify outliers? |
In OUTRIDER it is assumed that all samples come from the same population. If the controls are very similar and the case is very different, it might not work. |
Thank you @vyepez88 very much for your explanation. Do you have a suggested gene read count range (differences or standard deviation) for OUTRIDER to detect outliers? Besides, I do not quite understand:
Would it be easier to detect if the expression of a gene is very different in one sample compared to other samples (like the example you made under a reasonable gene read count range)? What does "different" refer to here? Thank you very much for your time and help! I really appreciate it! |
Regarding the standard deviation, good question. I usually plot the fold changes of each gene (value in a sample / mean value across samples). If the distribution is around 0.75-1.25, then it's easy to detect 50% increases or reductions. See Fig 3C of https://doi.org/10.1186/s13073-022-01019-9 |
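The per-gene fold change described above (value in a sample divided by the gene's mean across samples) can be sketched in base R like this. The counts are made-up toy values; in practice the input would presumably be normalized counts rather than raw ones.

```r
# Toy counts: genes in rows, samples in columns (made-up values).
counts <- matrix(
  c(100, 110,  90, 100,
    200, 200, 200, 400),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("gene1", "gene2"), c("s1", "s2", "s3", "s4"))
)

# Fold change of each value relative to its gene's mean across samples.
fold_change <- counts / rowMeans(counts)

# Per-gene spread of fold changes (minimum and maximum).
fc_range <- t(apply(fold_change, 1, range))
colnames(fc_range) <- c("min", "max")
print(fc_range)
```

Here gene1 stays within 0.9-1.1 of its mean, so a 50% change would stand out, while gene2 already spans 0.8-1.6, which illustrates why a wide fold-change distribution makes outliers harder to call.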
Hi,
I have targeted RNA panel sequencing data (a set of genes is enriched). I tried OUTRIDER with default parameters, but I could not detect any aberrant expression, even in known positive samples that have a known aberrant splicing event.
Does OUTRIDER work on targeted panel RNA sequencing data, or does it only work on non-targeted whole transcriptome data?
I also tried to merge the targeted and non-targeted count files, but it failed with errors [Run error #31669 unevaluated and other errors !!!]
Kindly help me out to fix this issue.
Best,
Ellango