Extract toolshed identifiers from Galaxy workflows #32

Open
stain opened this issue Jun 26, 2024 · 1 comment

stain commented Jun 26, 2024

Galaxy workflows contain Tool Shed identifiers, but these are carried forward neither into the RO-Crate nor into the knowledge graph.

For example, https://workflowhub.eu/workflows/7 includes Genomics-4-PE_Variation.ga, which contains:

            "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/snpeff/snpEff_build_gb/4.3+T.galaxy4",
            "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/snpeff/snpEff_build_gb/4.3+T.galaxy4",
            "tool_shed_repository": {
                "changeset_revision": "74aebe30fb52",
                "name": "snpeff",
                "owner": "iuc",
                "tool_shed": "toolshed.g2.bx.psu.edu"
            },
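A minimal sketch of how these fields could be collected from a `.ga` file. It assumes the workflow JSON keeps its steps in a top-level `steps` object keyed by step index and that nested workflows sit under a `subworkflow` key; the function name `iter_toolshed_steps` and the sample dictionary are made up for illustration.

```python
import json


def iter_toolshed_steps(workflow, prefix=""):
    """Yield (step_key, tool_id, tool_shed_repository) for Tool Shed steps.

    Assumption: `workflow` is the parsed JSON of a Galaxy .ga file with a
    "steps" mapping; steps without "tool_shed_repository" (inputs, built-in
    tools) are skipped.
    """
    for key, step in workflow.get("steps", {}).items():
        repo = step.get("tool_shed_repository")
        if repo:
            yield prefix + key, step.get("tool_id"), repo
        # Assumption: nested workflows are embedded under "subworkflow".
        sub = step.get("subworkflow")
        if sub:
            yield from iter_toolshed_steps(sub, prefix=prefix + key + "/")


# Hypothetical, heavily trimmed workflow using the values quoted above.
sample = json.loads("""
{
  "steps": {
    "0": {"type": "data_input", "tool_id": null},
    "1": {
      "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/snpeff/snpEff_build_gb/4.3+T.galaxy4",
      "tool_shed_repository": {
        "changeset_revision": "74aebe30fb52",
        "name": "snpeff",
        "owner": "iuc",
        "tool_shed": "toolshed.g2.bx.psu.edu"
      }
    }
  }
}
""")

found = list(iter_toolshed_steps(sample))
for step_key, tool_id, repo in found:
    print(step_key, tool_id, repo["changeset_revision"])
```

For a real file, `sample` would be replaced by `json.load(open("Genomics-4-PE_Variation.ga"))`.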

The identifiers exist in a mangled state in the Abstract CWL:

    run:
      class: Operation
      id: toolshed_g2_bx_psu_edu_repos_iuc_snpeff_snpEff_build_gb_4_3+T_galaxy4

...but they do not appear in the RO-Crate metadata at all.
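Judging only from the single example above, the mangling looks like replacing `.` and `/` with `_`. The sketch below encodes that guess (the actual converter may apply other rules) and shows why the mangled form cannot be reversed on its own: an underscore that was already in the tool name is indistinguishable from a replaced separator. Mapping back therefore needs the original `tool_id` values from the `.ga` file. The function name `mangle` is hypothetical.

```python
import re


def mangle(tool_id):
    """Guess at the Abstract CWL id mangling: '.' and '/' become '_'.

    Assumption: inferred from one example only, not from the converter's code.
    """
    return re.sub(r"[./]", "_", tool_id)


tool_id = "toolshed.g2.bx.psu.edu/repos/iuc/snpeff/snpEff_build_gb/4.3+T.galaxy4"
mangled = mangle(tool_id)
print(mangled)

# The mapping is lossy: distinct inputs can collapse to the same output.
collision = mangle("a.b/c") == mangle("a_b_c")

# So recover the original via a lookup built from the workflow's own tool_ids.
known_tool_ids = [tool_id]
lookup = {mangle(t): t for t in known_tool_ids}
recovered = lookup[mangled]
```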

Note that these identifiers are NOT global URIs, although they come close. They refer to Mercurial repositories, but they are not Mercurial URIs (hgt+http://) either.

Why do we want these? On a good day you can combine them with Tool Shed information to find the bio.tools identifiers. At the moment, however, Galaxy does not seem to expose this tool information in a convenient way, and climbing into Mercurial to get it would be overkill for this work.
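To make the "almost a URI" point concrete, here is a sketch that splits a `tool_id` into its parts, assuming the layout `<tool shed host>/repos/<owner>/<repository>/<tool>/<version>` seen in the example. Prefixing `https://` to the repository part is an assumption of this sketch, not something the workflow file states, and whether the resulting URL resolves is not checked here. `ToolShedId`, `parse_tool_id` and `repository_url` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ToolShedId(NamedTuple):
    tool_shed: str
    owner: str
    repository: str
    tool: str
    version: str


def parse_tool_id(tool_id: str) -> Optional[ToolShedId]:
    """Split a Tool Shed style tool_id; return None for anything else."""
    parts = tool_id.split("/")
    if len(parts) != 6 or parts[1] != "repos":
        return None  # e.g. a built-in tool id without a Tool Shed prefix
    host, _, owner, repository, tool, version = parts
    return ToolShedId(host, owner, repository, tool, version)


def repository_url(parsed: ToolShedId) -> str:
    # Assumption: the repository is reachable over https at this path.
    return f"https://{parsed.tool_shed}/repos/{parsed.owner}/{parsed.repository}"


parsed = parse_tool_id(
    "toolshed.g2.bx.psu.edu/repos/iuc/snpeff/snpEff_build_gb/4.3+T.galaxy4"
)
print(parsed)
print(repository_url(parsed))
```

The parsed `owner`, `repository` and `tool_shed` values should match the `tool_shed_repository` block in the same step, which gives a cheap consistency check before emitting anything into the RO-Crate.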

@supernord

Hi @stain
Here is a link to the BioHackathon 2022 mapping between WorkflowHub, Galaxy and bio.tools: https://github.com/bio-tools/biohackathon2022/blob/main/scripts/workflowhub_galaxy_biotools.py

Maybe this will be useful for the graph?

I think some elements of this are incorporated into the WorkflowHub registration process for Galaxy workflows but, as you pointed out, that doesn't necessarily mean the metadata ends up in the RO-Crate.
