new synonyms for terms #62

Open
jpinero opened this issue Dec 11, 2024 · 2 comments

jpinero commented Dec 11, 2024

Would it be possible to add the following synonyms to these terms?
This is how the labels appear in the GWAS Catalog data, and in that form they cannot be normalized to HANCESTRO terms.

id: HANCESTRO:0324
name: Ashkenazi Jew
synonym: "Ashkenazi Jewish"

id: HANCESTRO:0496
name: Malaysian
synonym: "Malay"

id: HANCESTRO:0013
name: Indigenous American
synonym: "Native American"

id: HANCESTRO:0027
name: Han Chinese
synonym: "Chinese Han"

id: HANCESTRO:0572
name: Kosraean
synonym: "Kosraen"

id: HANCESTRO:0319
name: Korčulan
synonym: "Korculan"

id: HANCESTRO:0449
name: Saudi
synonym: "Saudi Arabian"
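
To make the normalization problem concrete, here is a minimal Python sketch of a label lookup that would resolve the GWAS Catalog spellings once synonyms like these exist. The `TERMS` table and the `normalize` helper are hypothetical names used only for illustration and are not part of HANCESTRO or any GWAS Catalog tooling. The IDs and labels are copied from the list above, leaving out HANCESTRO:0013, which is discussed separately further down the thread.

```python
import unicodedata
from typing import Optional

# Hypothetical lookup table: primary name plus requested synonym per term,
# copied from the request above (HANCESTRO:0013 intentionally left out).
TERMS = {
    "HANCESTRO:0324": ["Ashkenazi Jew", "Ashkenazi Jewish"],
    "HANCESTRO:0496": ["Malaysian", "Malay"],
    "HANCESTRO:0027": ["Han Chinese", "Chinese Han"],
    "HANCESTRO:0572": ["Kosraean", "Kosraen"],
    "HANCESTRO:0319": ["Korčulan", "Korculan"],
    "HANCESTRO:0449": ["Saudi", "Saudi Arabian"],
}


def _key(label: str) -> str:
    """Build a lookup key that ignores case, surrounding spaces and Unicode form."""
    return unicodedata.normalize("NFC", label).strip().casefold()


# Flatten the table into label -> term ID.
LOOKUP = {_key(label): term_id for term_id, labels in TERMS.items() for label in labels}


def normalize(label: str) -> Optional[str]:
    """Return the HANCESTRO ID for a label, or None if the label is unknown."""
    return LOOKUP.get(_key(label))


print(normalize("Chinese Han"))  # HANCESTRO:0027
print(normalize("Kosraen"))      # HANCESTRO:0572
```

Without the synonym entries, a plain name lookup of this kind returns `None` for spellings such as "Chinese Han", which is the failure described above.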

I am unsure about the following labels:

  • Hispanic or Latin American
  • Hispanic/Latin American
  • Hispanic/Latino
  • Latino

They could be assigned to
id: HANCESTRO:0612
name: Hispanic
or
id: HANCESTRO:0014
name: Latin or Admixed American

jpinero commented Dec 11, 2024

Also, is it correct to have two HANCESTRO identifiers with the same name?

id: HANCESTRO:0800
name: Sindhi in Pakistan (SGDP)
def: "A population made up of 2 samples from Sindhi individuals recruited in Pakistan as part of the Simons Genome Diversity Project. The population was assigned by the collecting project to the superpopulation South Asian." []
synonym: "Sindhi" EXACT []
is_a: HANCESTRO:0006 ! South Asian
is_a: HANCESTRO:0632 ! reference population
relationship: HANCESTRO:0308 http://dbpedia.org/resource/Pakistan ! hasCountryOfOrigin Pakistan

id: HANCESTRO:0710
name: Sindhi in Pakistan (HGDP)
def: "A population made up of 22 samples from Sindhi individuals recruited in Pakistan as part of the Human Genome Diversity Project. The population was assigned by the collecting project to the superpopulation Central South Asian." []
synonym: "Sindhi" EXACT []
is_a: HANCESTRO:0006 ! South Asian
is_a: HANCESTRO:0632 ! reference population
relationship: HANCESTRO:0308 http://dbpedia.org/resource/Pakistan ! hasCountryOfOrigin Pakistan
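
The two names differ only in the project suffix, but both stanzas carry the identical exact synonym "Sindhi", so a lookup on that label alone is ambiguous. A minimal sketch of how such shared exact synonyms could be detected is shown below. It assumes a simplified stanza layout like the one quoted above and is not a full OBO parser; `shared_exact_synonyms` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Abbreviated copy of the two stanzas quoted above.
OBO_SNIPPET = """\
id: HANCESTRO:0800
name: Sindhi in Pakistan (SGDP)
synonym: "Sindhi" EXACT []
is_a: HANCESTRO:0006 ! South Asian

id: HANCESTRO:0710
name: Sindhi in Pakistan (HGDP)
synonym: "Sindhi" EXACT []
is_a: HANCESTRO:0006 ! South Asian
"""

# Matches lines such as: synonym: "Sindhi" EXACT []
SYNONYM_RE = re.compile(r'^synonym:\s+"([^"]*)"\s+EXACT\b')


def shared_exact_synonyms(obo_text):
    """Map each exact synonym used by more than one term to the IDs that use it."""
    owners = defaultdict(list)
    current_id = None
    for raw_line in obo_text.splitlines():
        line = raw_line.strip()
        if line.startswith("id:"):
            # A new stanza starts; remember which term the next lines belong to.
            current_id = line[len("id:"):].strip()
            continue
        match = SYNONYM_RE.match(line)
        if match and current_id is not None:
            owners[match.group(1)].append(current_id)
    return {syn: ids for syn, ids in owners.items() if len(ids) > 1}


print(shared_exact_synonyms(OBO_SNIPPET))
# {'Sindhi': ['HANCESTRO:0800', 'HANCESTRO:0710']}
```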

daniwelter (Collaborator) commented

Hi @jpinero, thank you for your request.

Most of the synonym requests shouldn't be a problem. We will add them to the ontology in the next release, planned for Q1 2025.

The only one we can't do is:

id: HANCESTRO:0013
name: Indigenous American
synonym: "Native American"

We were explicitly asked by a research group working closely with Indigenous American populations to remove this term, because it narrows the definition to the USA and also conflates the term with US political designations that are historically problematic. We can add a note to the term to clarify this, but we don't think it would be appropriate to reintroduce the old label as a synonym.

Regarding

  • Hispanic or Latin American
  • Hispanic/Latin American
  • Hispanic/Latino
  • Latino

We would advise mapping these to Latin or Admixed American (HANCESTRO:0014) unless additional information indicates that the designation was collected via the US Census-specific "Hispanic or Latino origin" question, which is normally asked separately from other self-identified ethnicities.
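
As a rough illustration of this advice, the rule could be sketched in Python as follows. The function name and the `from_us_census_question` flag are hypothetical, and sending census-derived labels to HANCESTRO:0612 (Hispanic) is an assumption inferred from the question above rather than something stated explicitly here.

```python
from typing import Optional

HISPANIC = "HANCESTRO:0612"
LATIN_OR_ADMIXED_AMERICAN = "HANCESTRO:0014"

# The four labels raised in the request, stored case-folded for matching.
AMBIGUOUS_LABELS = {
    "hispanic or latin american",
    "hispanic/latin american",
    "hispanic/latino",
    "latino",
}


def map_ambiguous_label(label: str, from_us_census_question: bool = False) -> Optional[str]:
    """Pick a HANCESTRO ID for one of the ambiguous Hispanic/Latino labels.

    Returns None for any label outside the ambiguous set.
    """
    if label.strip().casefold() not in AMBIGUOUS_LABELS:
        return None
    if from_us_census_question:
        # Assumption: only census-derived designations map to "Hispanic".
        return HISPANIC
    # Default advised in this thread.
    return LATIN_OR_ADMIXED_AMERICAN


print(map_ambiguous_label("Hispanic/Latino"))  # HANCESTRO:0014
print(map_ambiguous_label("Latino", from_us_census_question=True))  # HANCESTRO:0612
```

Keeping the census provenance as an explicit flag means the default mapping stays conservative when nothing is known about how the label was collected.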

Regarding the similarity between some of the reference population classes,

id: HANCESTRO:0800
name: Sindhi in Pakistan (SGDP)

and

id: HANCESTRO:0710
name: Sindhi in Pakistan (HGDP)

refer to two separate reference populations: one recorded as part of the Human Genome Diversity Project (HGDP), consisting of 22 samples, and one recorded as part of the Simons Genome Diversity Project (SGDP), consisting of 2 samples. The populations sampled by the two projects overlap considerably, but the reference populations should not be conflated, which is why they are represented as separate terms in HANCESTRO.
