Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Synonym sync: Abbreviation exclusions #668

Merged
merged 2 commits into from
Sep 23, 2024

Conversation

joeflack4
Copy link
Contributor

@joeflack4 joeflack4 commented Sep 21, 2024

Resolves #652

Overview

Synonym sync: Abbreviation exclusions

  • Added abbreviation exclusion rules to config
  • Renamed mondo-exclusion-values.yml --> mondo-exclusion-configs.yml: I didn't rename it just mondo-exclusions in order to disambiguate from similarly named SOURCE_exclusions files, which serve the purpose of excluding entire terms. I went with the general 'configs' name, because these exclude a variety of different things now, from exclusion of entire synonyms being added, to exclusion of synonym type axioms.

Pre-merge checklist

Documentation

Was the documentation added/updated under docs/?

  • Yes
  • No, updates to the docs were not necessary after careful consideration

QC

Was the full pipeline run before submitting this PR using sh run.sh make build-mondo-ingest on this branch (after
docker pull obolibrary/odkfull:dev), and no errors occurred?

  • Yes
  • No, there are no functional (code-related) changes to the pipeline in the PR, so no re-run is necessary

New Packages

Were any new Python packages added?

Were any other non-Python packages added?

PR Review and Conversations Resolved

Has the PR been sufficiently reviewed by at least 1 team member of the Mondo Technical team and all threads resolved?

  • Yes

Google sheet

@joeflack4 joeflack4 changed the title Synonym sync: Abbreviation exclusions #652 Synonym sync: Abbreviation exclusions Sep 21, 2024
- Added abbreviation exclusion rules to config
- Updated config name to make more sense in light of this change
@joeflack4 joeflack4 force-pushed the sync1-synonyms-abbrev-excusions branch from 8f1abd4 to 7a085d8 Compare September 21, 2024 23:20
@joeflack4 joeflack4 self-assigned this Sep 21, 2024
@joeflack4 joeflack4 added the enhancement New feature or request label Sep 21, 2024
@joeflack4 joeflack4 linked an issue Sep 21, 2024 that may be closed by this pull request
2 tasks
@twhetzel
Copy link
Contributor

twhetzel commented Sep 22, 2024

@joeflack4 what goal was run to generate the data files in this PR?

Did you confirm from the curated examples that the list of synonyms that should not be tagged as an abbreviation are not tagged now?

"""Append additional synonym_type(s) based on logical rules."""
syn: str = row['synonym']
syn_type_str: str = row['synonym_type']
types: Set[URI] = set(syn_type_str.split('|') if syn_type_str else [])

# Acronym: uppercase, no numbers, no whitespace, <10 chars
if syn.isupper() and not any(char.isspace() for char in syn) and len(syn) < 10:
if syn not in exclusions and syn.isupper() and not any(char.isspace() for char in syn) and len(syn) < 10:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you confirm from the curated examples that the list of synonyms that should not be tagged as an abbreviation are not tagged now?

Yep! I checked a few cases in the new v8 google sheet, and they're longer being tagged as such.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the link to the data file. I checked all the ones in the curated file and none are tagged as abbreviations now as expected with this PR.

Copy link
Contributor

@twhetzel twhetzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! All synonyms that should be excluded from being tagged as an abbreviation are excluded.

@joeflack4 joeflack4 merged commit 9773e27 into sync1-synonyms Sep 23, 2024
@joeflack4 joeflack4 deleted the sync1-synonyms-abbrev-excusions branch September 23, 2024 01:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Synonym sync: Abbreviation exclusions
2 participants