Replies: 1 comment
Hi @haerangl, the http://current.geneontology.org/annotations/goa_uniprot_all.gaf.gz file has also gone through QA for completeness of annotations, but you are free to use it and filter on taxon yourself if you have any concerns. This is more useful for species we don't produce a file for, e.g. cucumber.
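Filtering the combined file on taxon could look roughly like the sketch below. It assumes the GAF 2.x layout, where column 13 holds the taxon as `taxon:<NCBI ID>` (optionally `taxon:A|taxon:B` for multi-organism annotations, with the first entry being the annotated organism) and header/comment lines start with `!`. The function names and the local file path are hypothetical, not part of any GO tooling.

```python
import gzip

TAXON_COLUMN = 12  # zero-based index of GAF column 13 (Taxon); assumes GAF 2.x


def filter_gaf_by_taxon(lines, taxon_id):
    """Yield annotation lines whose first taxon entry equals taxon:<taxon_id>."""
    wanted = f"taxon:{taxon_id}"
    for line in lines:
        if line.startswith("!"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= TAXON_COLUMN:
            continue  # skip malformed or truncated rows
        # The column may hold "taxon:A|taxon:B"; the first ID is the
        # organism of the annotated gene product.
        if fields[TAXON_COLUMN].split("|")[0] == wanted:
            yield line


def filter_gaf_file(path, taxon_id):
    """Stream a gzipped GAF file and yield only rows for one taxon."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from filter_gaf_by_taxon(handle, taxon_id)
```

For example, `filter_gaf_file("goa_uniprot_all.gaf.gz", 9606)` would stream the human rows from a locally downloaded copy; for another species, look up its NCBI Taxonomy ID first. Streaming line by line avoids loading the whole file, which is large, into memory.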
I downloaded the dataset goa_human.gaf from the Gene Ontology Consortium. The description in the file and on the website says that this dataset has been "filtered in order to reduce redundancy". What does that mean?
My best guess
I found the release pipeline and the Annotation QC checks: http://wiki.geneontology.org/index.php/Release_Pipeline#Annotation_QC_checks
And this pointed me to the GO rules: https://github.com/geneontology/go-site/blob/master/metadata/rules/README.md
There are so many rules here that I don't fully understand them. However, they look more or less like quality assurance filters to make sure the data is clean and usable. What I am worried about is a systematic removal of GO terms that I did not anticipate. My goal is to use these GO annotations to measure protein functional similarity (based on information content and Jaccard similarity - see Funsim measures section of this paper). I'd like to ensure that a chunk of the data wasn't removed for a totally different purpose (e.g., a frequency cutoff that only makes sense for a different task such as gene set enrichment analysis).
Any insights?
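To make the concern concrete, here is a minimal sketch of the two quantities mentioned above. It is a simplification and not the Funsim method itself: Jaccard similarity is computed on the directly annotated GO term sets, and information content is estimated as the negative log of the fraction of proteins annotated with a term, with no propagation to ancestor terms. The function names and the toy annotations are hypothetical.

```python
import math
from collections import Counter


def jaccard(terms_a, terms_b):
    """Jaccard similarity between two collections of GO term IDs."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def information_content(annotations):
    """Estimate IC(t) = -log(fraction of proteins annotated with t).

    annotations maps a protein ID to its set of GO term IDs.
    Simplifying assumption: only direct annotations are counted.
    """
    counts = Counter(t for terms in annotations.values() for t in set(terms))
    total = len(annotations)
    return {t: -math.log(c / total) for t, c in counts.items()}


# Toy data, purely illustrative.
toy = {
    "P1": {"GO:0000001", "GO:0000002"},
    "P2": {"GO:0000002", "GO:0000003"},
    "P3": {"GO:0000002"},
}
similarity = jaccard(toy["P1"], toy["P2"])
ic = information_content(toy)
```

Both quantities depend directly on which annotations are present: dropping rows changes the term sets that Jaccard compares and the frequencies that the information content is estimated from, which is why any systematic filtering matters here.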
More info
Website description (http://current.geneontology.org/products/pages/downloads.html)
Data downloaded:
Data dictionary: http://geneontology.org/docs/go-annotation-file-gaf-format-2.1/
Data header: