Reduce the number of used APIs (only api.geneontology.org) #114

Open
2 of 7 tasks
kltm opened this issue Dec 4, 2024 · 27 comments
Labels
enhancement New feature or request

Comments

@kltm
Member

kltm commented Dec 4, 2024

Recently, we discovered that api.geneontology.xyz is still in use by Alliance pages to return whether or not there are GO-CAM models available for display with the pathway viewer. Since the work from @sierra-moxon, these endpoints are duplicated within the GO API at api.geneontology.org, which is what should be used for all such operations.

  • Determine if only the Alliance is using api.geneontology.xyz with logs or proxy (may be tricky) @dustine32 - yes it is. here: https://github.com/alliance-genome/agr_ui/blob/main/src/components/pathway/pathwayWidget.js
  • Determine if any other users are using api.geneontology.xyz with logs or proxy.
    • use by PomBase
  • Open an issue for the API endpoint change in the Alliance to api.geneontology.org (or open a PR?) @sierra-moxon
  • If there are other apparent users of the API, see if we can track them down
    • update from @sierra-moxon : FlyBase is using api.geneontology.org correctly.
  • Try and figure out how to get (current) api.geneontology.xyz users onto api.geneontology.org
  • Shut down api.geneontology.xyz
@kltm
Member Author

kltm commented Dec 12, 2024

Related: geneontology/go-site#1800

@kltm
Member Author

kltm commented Dec 12, 2024

@dustine32 Will be looking at ways of capturing requests to api.geneontology.xyz, so we can contact users, understand use cases, and create a plan to deactivate this endpoint.

@kltm
Member Author

kltm commented Dec 12, 2024

Known users of api.geneontology.xyz (edit to add):

@kltm
Member Author

kltm commented Dec 12, 2024

Tagging @kimrutherford and @dlrice as interested parties.

@kimrutherford

Speaking for PomBase, our ideal is to access the production models from Noctua as soon as they are marked as "production".
We use and like api.geneontology.xyz because it gives us that.

We have gocam-viz configured to point to that API and we also access it programmatically a couple of times per week to grab the genes and terms from the production models for storing in our database.

If we could access the models created by PomBase as JSON files, that would be fine. We would serve those from pombase.org for gocam-viz. And we'd be able to read any data we need for adding to our database.

Could a daily job pull production models for each group from the Noctua database and host them somewhere?

Something like https://github.com/geneontology/noctua-models but with a sub-directory per group (PomBase/FlyBase, etc.) and files in JSON format could replace api.geneontology.xyz for us. It would also make the Noctua data very accessible for any other downstream user.

@ValWood

@kltm
Member Author

kltm commented Dec 13, 2024

@kimrutherford thank you for your feedback (and my apologies again for my unthoughtful words on slack).
If we are going to be looking at some kind of raw feed of the "live" models, understanding the frequency needs is very useful.

@sierra-moxon
Member

sierra-moxon commented Dec 13, 2024

@kltm - I find no evidence of Alliance using api.geneontology.xyz in their public codebase: https://github.com/search?q=org%3Aalliance-genome%20api.geneontology.xyz&type=code, but I do see api.geneontology.org calls: https://github.com/search?q=org%3Aalliance-genome+api.geneontology.org&type=code

I don't think Alliance is one of the users of api.geneontology.xyz -- it could be we were tracking "FlyBase" usage here in this ticket (which uses a different implementation of the ribbon (and pathway widget?), at least, than Alliance)

@sierra-moxon
Member

sierra-moxon commented Dec 13, 2024

I took a look through FlyBase repos: https://github.com/search?q=org%3AFlyBase+api.geneontology&type=code&p=2. They too use api.geneontology.org and there are no instances of api.geneontology.xyz

@dustine32

@sierra-moxon Are you including the agr_ui codebase in your Alliance assessment or just looking specifically at their API code? There is still this usage of .xyz here:
https://github.com/alliance-genome/agr_ui/blob/23eed427cea404ef97feaf17ecbbaaabe2a0a275/src/components/pathway/pathwayWidget.js#L165
I just confirmed by looking at the browser network tab that this call is made on the Alliance gene pages:
[image: screenshot of the browser network tab]

This brings up an issue with determining the website source of calls to .xyz. This agr_ui code is run client-side, meaning that the call goes directly from the website user's computer to the .xyz endpoint and thus will not have the Alliance site's IP in the .xyz API call logs (that I set up on AWS CloudWatch). After filtering out all the calls from search crawlers, this results in a bunch of different IPs for the end user's ISP or institution (e.g., UPenn, UMass Medical School) and not the nice, small list of websites we can contact to ask them to kindly redirect to a different endpoint.

@sierra-moxon
Member

thanks @dustine32!
fwiw, that should be a very easy swap to api.geneontology.org
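
For illustration only, the host swap itself can be sketched as below. This assumes that the path and query string of a call carry over unchanged between the two hosts, which the thread does not confirm endpoint by endpoint; the example URL path is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "api.geneontology.xyz"
NEW_HOST = "api.geneontology.org"


def swap_api_host(url):
    """Point a URL at NEW_HOST if it currently targets OLD_HOST.

    Assumption: path and query are valid as-is on the new host.
    URLs for any other host are returned untouched.
    """
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST:
        return url
    return urlunsplit(parts._replace(netloc=NEW_HOST))


# Hypothetical endpoint path, used only to show the rewrite.
print(swap_api_host("https://api.geneontology.xyz/some/endpoint?id=123"))
```

Each rewritten call would still need to be checked against the api.geneontology.org documentation before relying on it.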

kltm added a commit to geneontology/pipeline-raw-go-cam that referenced this issue Dec 13, 2024
kltm added a commit to geneontology/pipeline-raw-go-cam that referenced this issue Dec 13, 2024
kltm added a commit to geneontology/pipeline-raw-go-cam that referenced this issue Dec 13, 2024
kltm added a commit to geneontology/pipeline-raw-go-cam that referenced this issue Dec 13, 2024
@kimrutherford

> Something like https://github.com/geneontology/noctua-models but with a sub-directory per group (PomBase/FlyBase, etc.) and files in JSON format could replace api.geneontology.xyz for us.

This is a follow up on a comment by @kltm on slack. The sub-directory per group is just a suggestion.

In a perfect world we'd like this data updated daily-ish(*):

It makes sense for the two second endpoints to be the same so nothing gets out of sync.

If there was a file or endpoint that allowed us to download all the PomBase production models as a single JSON blob, that would save a lot of API calls.

(*) I'll let @ValWood comment on that. PomBase is updated daily so a daily snapshot of production models seems ideal from my point of view. We use the daily GO build already (https://ontology-build.geneontology.org)

@kltm
Member Author

kltm commented Dec 14, 2024

@kimrutherford I'm doing some experiments right now otherwise, but in an emergency, please note that the highest frequency JSON models that are currently officially offered are at: http://snapshot.geneontology.org/products/json/ . These are all models for all resources--essentially a transform of the GitHub TTL files to the JSON that drives the widget.

@kltm
Member Author

kltm commented Dec 14, 2024

@kimrutherford I don't want to commit to anything and I'm really just trying to scope out different approaches, but I was wondering grossly how close something like this might be to usable for PomBase? For the moment, at https://skyhook.geneontology.io/pipeline-raw-go-cam/main/products/json/ there is a metadata.json file that maps providedBy in GO-CAM model metadata to GO-CAM model local IDs. In the jsonout directory, there are all of the model files in JSON. This is being delivered statically through a CDN, so has no effect on us.
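
As a rough sketch of how a consumer might use that layout: the base URL is the one given above, but the shape of metadata.json (a mapping from a providedBy value to a list of model local IDs) and the `jsonout/<ID>.json` file naming are assumptions, as are the provider key and model ID in the example.

```python
from urllib.parse import urljoin

# Base URL from the comment above; everything beneath it is assumed.
BASE_URL = "https://skyhook.geneontology.io/pipeline-raw-go-cam/main/products/json/"


def model_urls_for_provider(metadata, provider, base_url=BASE_URL):
    """Return download URLs for every model attributed to `provider`.

    `metadata` is the parsed metadata.json, assumed here to look like
    {"<providedBy value>": ["<model local ID>", ...], ...}.
    """
    model_ids = metadata.get(provider, [])
    # Assumption: each model is stored as jsonout/<local ID>.json.
    return [urljoin(base_url, f"jsonout/{model_id}.json") for model_id in model_ids]


# Hypothetical metadata content, for illustration only.
example_metadata = {"ExampleGroup": ["0000000000000001", "0000000000000002"]}
for url in model_urls_for_provider(example_metadata, "ExampleGroup"):
    print(url)
```

Because the files are served statically, a downstream group could fetch the metadata file once per run and then download only its own models.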

(Noting for myself that the new machine gains an hour on the current machine for this.)

kltm added a commit to geneontology/go-site that referenced this issue Dec 14, 2024
@kimrutherford

> I was wondering grossly how close something like this might be to usable for PomBase?

That looks great! That plan would work very well for us.

@kltm
Member Author

kltm commented Dec 18, 2024

Also tagging @sjm41 as having a possible interest in this conversation.

@kltm
Member Author

kltm commented Dec 29, 2024

Noting that the TTLs from production seem to be going in well. Fixing pipeline cron.

@kltm
Member Author

kltm commented Jan 3, 2025

Still experimenting, but the following is in place for conversation:

  • three attempts per week are made to move models off of production to an S3 store (M,W,F)
  • three attempts are made per week to render the raw models and produce a JSON metadata file (M,W,F)

Noting that, pleasingly, the content-type seems to be automatically correct.

Runs currently take about 5-6 hours, with failures so far having to do with internal networking hiccups. As the machine that's being used in this case is slated for being the new pipeline machine, we'll have to see how these loads interact with each other moving forward.

@kltm
Member Author

kltm commented Jan 6, 2025

Noting a timezone issue, where the rendering happened before the transfer. Should be fixed later today; testing now.

@kimrutherford

Thanks very much @kltm

We've switched the PomBase dev server to use provider-to-model.json and to use the new location to get the models. It's working very well. We're not using api.geneontology.xyz anywhere now.

There is one small difference between the JSON from api.geneontology.xyz and the JSON from live-go-cam.geneontology.io: in the latter, the "property-label" values are IDs rather than labels.

56d1143000003353 from api.geneontology.xyz:

      {
        "subject": "gomodel:56d1143000003353/56d1143000003354",
        "property": "BFO:0000066",
        "property-label": "occurs in",
        ...

56d1143000003353 from live-go-cam.geneontology.io:

      {
        "subject": "gomodel:56d1143000003353/56d1143000003354",
        "property": "BFO:0000066",
        "property-label": "BFO:0000066",
        ...
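
A quick way to check a model file for this difference is to look for entries whose "property-label" simply repeats the "property" ID. The sketch below works on a plain list of such entries; how that list is reached inside a full model file (for example under a top-level key) is left as an assumption, and the function name is hypothetical.

```python
def unresolved_property_labels(facts):
    """Return the entries whose "property-label" is just the property ID."""
    return [
        fact
        for fact in facts
        if fact.get("property") is not None
        and fact.get("property-label") == fact["property"]
    ]


# The two variants quoted above, reduced to the fields shown.
from_xyz = [{
    "subject": "gomodel:56d1143000003353/56d1143000003354",
    "property": "BFO:0000066",
    "property-label": "occurs in",
}]
from_live_go_cam = [{
    "subject": "gomodel:56d1143000003353/56d1143000003354",
    "property": "BFO:0000066",
    "property-label": "BFO:0000066",
}]

print(len(unresolved_property_labels(from_xyz)))
print(len(unresolved_property_labels(from_live_go_cam)))
```

Running the same check after a pipeline change would show whether the labels are being resolved again.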

@kltm
Member Author

kltm commented Jan 7, 2025

@kimrutherford Thank you for the feedback there. If it looks like it's working for you over a few days, let's talk about next steps.

The property-label issue there is interesting. My guess is that ontologies that are loaded for the production system are not quite the same in the "simulation" we have in this new experimental pipeline.

@balhoff The commands we are using are:

minerva-cli.sh --import-owl-models -f models -j blazegraph.jnl
minerva-cli.sh --dump-owl-json --journal blazegraph.jnl --ontojournal blazegraph-go-lego-reacto-neo.jnl --folder jsonout

which would be minerva's "default" ontology, correct? It looks like production is using go-lego-reacto.owl.

@balhoff
Member

balhoff commented Jan 8, 2025

@kltm can you try also providing an ontology file using -g:

-g file:go-lego-reacto.owl

kltm added a commit to geneontology/pipeline-raw-go-cam that referenced this issue Jan 9, 2025
@kltm
Member Author

kltm commented Jan 9, 2025

From discussion today, @dustine32 will, when free, take a quick look if anything useful can be pulled from headers. After that, we'll just go for an announcement.
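
One thing headers might offer: browsers generally attach an Origin header to cross-origin script requests and often a Referer header, so client-side calls can sometimes be grouped by the site that embedded them even though the IPs belong to end users. The sketch below assumes each log record has been parsed into a dict with a "headers" mapping; whether the existing logs capture request headers at all is not established in this thread, and the record layout and example hosts are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_calling_sites(log_records):
    """Count requests per originating site, preferring Origin over Referer."""
    counts = Counter()
    for record in log_records:
        # Header names are case-insensitive, so normalise them first.
        headers = {k.lower(): v for k, v in record.get("headers", {}).items()}
        source = headers.get("origin") or headers.get("referer")
        host = urlsplit(source).hostname if source else None
        counts[host or "(unknown)"] += 1
    return counts


# Hypothetical records: different client IPs, same embedding site.
sample_records = [
    {"ip": "198.51.100.7", "headers": {"Origin": "https://site-a.example"}},
    {"ip": "203.0.113.9", "headers": {"Referer": "https://site-a.example/gene/1"}},
    {"ip": "192.0.2.44", "headers": {}},
]
for host, n in count_calling_sites(sample_records).most_common():
    print(host, n)
```

Requests from scripts and command-line clients usually carry neither header, so they would all fall into the "(unknown)" bucket.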

kltm added a commit to geneontology/pipeline-raw-go-cam that referenced this issue Jan 10, 2025
@kltm
Member Author

kltm commented Jan 12, 2025

@balhoff Unfortunately, even loading blazegraph-go-lego-reacto-neo.jnl, pulled off of the NEO build, I'm still not getting the property-label to render as expected. I'm hoping it's not separate code paths, as you theorized.

@kltm
Member Author

kltm commented Jan 14, 2025

Talking to @balhoff, there is a slight issue with the CLI arguments available. He'll be looking at this.

@kltm
Member Author

kltm commented Jan 14, 2025

@kimrutherford I heard that you'd be going live with GO-CAM in PomBase tomorrow. I wanted to confirm where you're getting your data and if it would be disrupting if we fixed to use the correct property-label values?

@kimrutherford

@kltm We're getting the data from live-go-cam but we're keeping a local copy. If there are ever any problems with live-go-cam we can keep using our local copy while things are fixed. We now serve the JSON for gocam-viz directly from pombase.org using that copy.

> if it would be disrupting if we fixed to use the correct property-label values?

I don't think the fix will be a problem for us. We don't use those values in PomBase but perhaps the values are used by gocam-viz?

@kimrutherford

> If it looks like it's working for you over a few days, let's talk about next steps.

An update after two weeks: It's all been working well for PomBase. We're happy with the current structure of the files at live-go-cam.geneontology.io and it has been very reliable.

Thanks!

@ValWood


5 participants