Automate identification of expanded alias name #15

Open
mcannon068nw opened this issue Sep 5, 2024 · 1 comment

Comments

@mcannon068nw

One problem for Aim 1 in Anastasia's dissertation is that once a gene-alias collision pair has been identified, we have to determine what the collision symbol actually represents in the context of the parent symbol. For example, CAP is listed as an alias for BRD4, but this symbol collides with CAP aliases of many other genes. In the context of BRD4, CAP refers to 'chromosome associated protein'. What CAP stands for differs from one parent gene symbol to another. While this can be manually curated for a small set of genes, there are over 100,000 gene-alias pairs to consider, so a programmatic approach will be needed.

Additionally, a separate but related problem will be to programmatically identify the type of collision(?).

@anastasiabratulin
Contributor

Thank you for writing this out so eloquently! Yes, the problem is classifying the relationship between concept and alias. This relationship can be extracted from the symbol expansion, which is why we need a way to pull out the expansions programmatically.
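
As a possible starting point, here is a minimal sketch of one way to pull out an expansion: treat the alias as an initialism and look for a run of words, in the free-text names attached to the parent gene, whose first letters spell it. The function name `find_expansion` and the `candidate_names` input are hypothetical, and the name strings in the usage example are illustrative rather than pulled from any database. This heuristic would miss aliases that are not simple initialisms.

```python
import re


def find_expansion(alias, candidate_names):
    """Return the first run of words whose initials spell `alias`, or None.

    `candidate_names` is assumed to be a list of free-text names
    (for example, current or previous names) recorded for the parent gene.
    """
    target = alias.lower()
    for name in candidate_names:
        # Split on anything that is not a letter or digit.
        words = re.findall(r"[A-Za-z0-9]+", name)
        initials = "".join(word[0].lower() for word in words)
        start = initials.find(target)
        if start != -1:
            # Rejoin the matching words with single spaces.
            return " ".join(words[start:start + len(target)])
    return None


# Illustrative input for the BRD4 / CAP example from this issue.
brd4_names = ["bromodomain containing 4", "chromosome associated protein"]
expansion = find_expansion("CAP", brd4_names)
```

For the illustrative `brd4_names` list, `expansion` is `"chromosome associated protein"`; when no name matches, the function returns `None`, which could flag the pair for manual curation.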

2 participants