Write out file with semantic similarity of all pairs of HP terms
Workflow:
- download HPOA file in order to calculate frequencies for each term
- download HPO
- turn HPO into DAG
- drop singletons
- drop non-HP terms
- confirm DAG-ness
- load graph into Ensmallen
- make Resnik model, calculate all by all Resnik similarity
- make Jaccard model, calculate all by all Jaccard similarity
- pandas join tables
- write out to file
- upload file to S3 (KG-Hub?) # decide on this later