Commit 8db4f9c

Update authors and README.md

- Addresses #40, #39, #38, #36.
- Added a newer diagram to the README.
- Removed references to ghcr.io; they will be added back later.
- Added proper authors and ORCIDs.
- Replaced the Dockerfile encodingFormat with text/plain.
- Added created_files.json to the RO Crate and set it as data_entity["isBasedOn"].
1 parent f2bab54 commit 8db4f9c

2 files changed (+87 -50 lines)

README.md (+26 -25)

@@ -1,37 +1,42 @@
-# WorkflowHub Knowledge Graph
+# WorkflowHub Knowledge Graph
 
-## Getting started
+A tool to generate a knowledge graph from a source of RO Crates. By default, this tool sources and generates an RDF graph of crates from [WorkflowHub](https://workflowhub.eu/).
 
-### Obtaining workflowhub-graph
+## Getting Started
 
-workflowhub-graph is available packaged as a Docker container. You can pull the latest version of the container by running:
+This tool is run as a Snakemake workflow. We recommend building a Docker container to run the workflow:
 
-```bash
-docker pull ghcr.io/uomresearchit/workflowhub-graph:latest
+```bash
+docker build -t knowledgegraph .
 ```
 
-This provides the a wrapper for the executable `workflowhub-graph` which can be used to run the various tools provided by the package.
+Then, you can run the workflow using the following command:
 
-### Running workflowhub-graph
+```bash
+docker run --rm -v $(pwd):/app -w /app knowledgegraph --cores 4 -s /app/Snakefile
+```
 
-There are several tools provided by the `workflowhub-graph` package. These are:
-- 'help': Display help information.
-- 'source-crates': Download ROCrates from the WorkflowHub API.
-- 'absolutize': Make all paths in an ROCrate absolute.
-- 'upload': Upload an ROCrate to Zenodo.
-- 'merge': Merge multiple ROCrates into an RDF graph.
+This command runs a Docker container using the `knowledgegraph` image. It mounts the working directory to `/app`
+inside the container, sets `/app` as the working directory, and then runs the workflow. Once the workflow completes,
+the container is automatically removed.
 
-To run any of these tools, you can use the following command:
+## Structure
 
-```bash
-docker run ghcr.io/uomresearchit/workflowhub-graph:latest <tool> <args>
+```mermaid
+flowchart TD
+A[Source RO Crates] --> B[Check Outputs];
+B[Check Outputs] --> C[Report Downloaded RO Crates];
+B[Check Outputs]-->D[Merge RO Crates];
+D[Merge RO Crates]-->E[Create Merged Workflow Run RO Crate]
 ```
 
-For example, to download ROCrates from the WorkflowHub API, you can run:
+- **`source_ro_crates`**: This rule sources RO crates from the WorkflowHub API (`source_crates.py`) and then checks
+the output (`check_outputs.py`). This generates a list of expected file paths based on the workflow IDs and versions to
+facilitate the workflow.
 
-```bash
-docker run ghcr.io/uomresearchit/workflowhub-graph:latest source-crates
-```
+- **`report_created_files`**: Optional. This rule reports the downloaded RO crates to the user.
+- **`merge_files`**: This rule merges the downloaded RO crates into a single RDF graph (`merge_ro_crates.py`).
+- **`create_ro_crate`**: This rule creates a merged workflow run RO crate from the merged RDF graph (`create_ro_crate.py`).
 
 ## Contributing
 
@@ -46,10 +51,6 @@ docker run ghcr.io/uomresearchit/workflowhub-graph:latest source-crates
 - **Development Branch**: The `develop` branch is currently our main integration branch. Features and fixes should target `develop` through PRs.
 - **Feature Branches**: These feature branches should be short-lived and focused. Once done, please create a pull request to merge it into `develop`.
 
-## Overview
-
-![arch_diagram.png](./docs/images/arch_diagram.png)
-
 ## License
 
 [BSD 2-Clause License](https://opensource.org/license/bsd-2-clause)
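
The `docker run` command added to the README can also be assembled as an argument list, which makes the role of each flag explicit. The following is a minimal Python sketch, not part of the repository: `docker_run_argv` is a hypothetical helper, and it assumes the image was built with the `knowledgegraph` tag as in the README. Arguments after the image name replace the image's default command and, when the image defines an `ENTRYPOINT`, are passed to it; the README's usage implies that entrypoint is `snakemake`, which is an assumption here.

```python
import os
import shlex


def docker_run_argv(workdir: str, cores: int = 4, image: str = "knowledgegraph") -> list[str]:
    """Build the README's `docker run` command as an argv list (hypothetical helper)."""
    return [
        "docker", "run",
        "--rm",                   # remove the container once the workflow exits
        "-v", f"{workdir}:/app",  # bind-mount the host directory at /app
        "-w", "/app",             # run from /app inside the container
        image,                    # everything after the image name goes to the container command
        "--cores", str(cores),    # assumed to be read by a snakemake ENTRYPOINT
        "-s", "/app/Snakefile",
    ]


if __name__ == "__main__":
    argv = docker_run_argv(os.getcwd())
    # Print a shell-safe rendering; to execute it, use subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Passing a list to `subprocess.run` avoids shell word-splitting, which matters because the README's unquoted `$(pwd)` breaks when the working directory contains spaces.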

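The Structure section describes four rules connected as in the flowchart. As a rough illustration, the sketch below encodes that ordering with Python's standard-library `graphlib` and derives one valid execution order. The rule names come from the README; the edges are read off the diagram (the Check Outputs node is folded into `source_ro_crates`, as the rule list describes) and are an assumption, not taken from the Snakefile. The `execution_order` helper is hypothetical.

```python
from graphlib import TopologicalSorter

# Each rule maps to the set of rules it depends on (edges inferred from the README flowchart).
RULE_DEPENDENCIES: dict[str, set[str]] = {
    "source_ro_crates": set(),                     # source_crates.py, then check_outputs.py
    "report_created_files": {"source_ro_crates"},  # optional reporting step
    "merge_files": {"source_ro_crates"},           # merge_ro_crates.py -> single RDF graph
    "create_ro_crate": {"merge_files"},            # create_ro_crate.py -> workflow run RO crate
}


def execution_order(deps: dict[str, set[str]], include_optional: bool = True) -> list[str]:
    """Return one valid order in which the rules could run."""
    graph = {rule: set(parents) for rule, parents in deps.items()}
    if not include_optional:
        # No other rule depends on the reporting step, so it can be dropped safely.
        graph.pop("report_created_files", None)
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(RULE_DEPENDENCIES))
    print(execution_order(RULE_DEPENDENCIES, include_optional=False))
```

Snakemake builds the real DAG from each rule's input and output files, so this only mirrors the diagram; `snakemake --dag` renders the actual graph.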
workflowhub_graph/create_ro_crate.py (+61 -25)

@@ -12,7 +12,6 @@ def create_ro_crate(input_file: str, workflow_file: str, output_dir: str) -> Non
     :param input_file: The input file provided by the Snakemake workflow (e.g., merged data file).
     :param workflow_file: Reference to the Snakemake workflow.
     :param output_dir: The output directory to store the RO-Crate metadata file.
-    :return:
     """
     crate = ROCrate()
 
@@ -23,18 +22,54 @@ def create_ro_crate(input_file: str, workflow_file: str, output_dir: str) -> Non
     )
 
     # Add authors:
-    alice = crate.add(
+    auth_1 = crate.add(
         Person(
             crate,
             "https://orcid.org/0000-0000-0000-0000",
-            properties={"name": "Alice Doe", "affiliation": "University of Flatland"},
+            properties={
+                "name": "Alexander Hambley",
+                "affiliation": "University of Manchester",
+            },
+        )
+    )
+    auth_2 = crate.add(
+        Person(
+            crate,
+            "https://orcid.org/0000-0002-0035-6475",
+            properties={
+                "name": "Eli Chadwick",
+                "affiliation": "University of Manchester",
+            },
+        )
+    )
+    auth_3 = crate.add(
+        Person(
+            crate,
+            "https://orcid.org/0000-0002-4565-9760",
+            properties={
+                "name": "Oliver Woolland",
+                "affiliation": "University of Manchester",
+            },
         )
     )
-    bob = crate.add(
+    auth_4 = crate.add(
         Person(
             crate,
-            "https://orcid.org/0000-0000-0000-0001",
-            properties={"name": "Bob Doe", "affiliation": "University of Flatland"},
+            "https://orcid.org/0000-0001-9842-9718",
+            properties={
+                "name": "Stian Soiland-Reyes",
+                "affiliation": "University of Manchester",
+            },
+        )
+    )
+    auth_5 = crate.add(
+        Person(
+            crate,
+            "https://orcid.org/0000-0001-6353-0808",
+            properties={
+                "name": "Volodymyr Savchenko",
+                "affiliation": "University of Geneva",
+            },
         )
     )
 
@@ -52,12 +87,23 @@ def create_ro_crate(input_file: str, workflow_file: str, output_dir: str) -> Non
         properties={
             "@type": "File",
             "name": "Dockerfile",
-            "encodingFormat": "application/yaml",
+            "encodingFormat": "text/plain",
             "description": "The Dockerfile used to build the Docker images for the workflow.",
             "conformsTo": {"@id": "https://docs.docker.com/reference/dockerfile/"},
         },
     )
 
+    created_files = crate.add_file(
+        "./created_files.json",
+        properties={
+            "@type": "File",
+            "name": "created_files.json",
+            "encodingFormat": "application/json",
+            "description": "A JSON file containing the list of files sourced by the workflow.",
+            "conformsTo": {"@id": "https://docs.docker.com/reference/dockerfile/"},
+        },
+    )
+
     crate.add_file("./poetry.lock")
     crate.add_file("./README.md")
 
@@ -68,42 +114,32 @@ def create_ro_crate(input_file: str, workflow_file: str, output_dir: str) -> Non
             "name": "Merged Data File",
             "description": "This file contains merged RDF triples from multiple RO-Crates sourced from WorkflowHub.",
             "encodingFormat": "text/turtle",
-            "author": [alice["@id"], bob["@id"]],
         },
     )
 
+    data_entity["author"] = [auth_1, auth_2, auth_3, auth_4, auth_5]
+    data_entity["isBasedOn"] = created_files
+
     workflow_entity = crate.add_workflow(
         source=workflow_file,
         properties={
             "name": "Snakemake Workflow",
             "description": "This is the Snakemake workflow used to generate the merged RDF triples.",
-            "author": [alice["@id"], bob["@id"]],
-            "output": data_entity["@id"],
         },
         main=True,
         lang="snakemake",
     )
 
+    workflow_entity["author"] = [auth_1, auth_2, auth_3, auth_4]
+    workflow_entity["output"] = data_entity
+
     if "conformsTo" not in crate.root_dataset:
         crate.root_dataset.append_to(
             "conformsTo", {"@id": "https://w3id.org/ro/wfrun/workflow/0.5"}
         )
 
-    crate.add(
-        ContextEntity(
-            crate,
-            identifier=str(uuid.uuid4()),
-            properties={
-                "@type": "CreateAction",
-                "name": "Merge RDF Triples",
-                "description": "Merging RDF triples from sourced crates.",
-                "agent": [alice["@id"], bob["@id"]],
-                "endTime": datetime.now().time().isoformat(),
-                "instrument": workflow_entity["@id"],
-                "result": data_entity["@id"],
-            },
-        )
-    )
+    # Add license:
+    crate.license = "https://opensource.org/license/bsd-2-clause"
 
     # Writing the RO-Crate metadata:
     crate.write(output_dir)
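
This commit replaces inline `@id` strings with entity objects (for example `data_entity["isBasedOn"] = created_files` and `workflow_entity["output"] = data_entity`). ro-crate-py writes entity-valued properties as `{"@id": ...}` references, so the metadata file should contain a graph roughly like the one below. This is a sketch with plain dictionaries rather than ro-crate-py, so it runs without dependencies. The file names `merged.ttl` and `Snakefile` are hypothetical stand-ins for `input_file` and `workflow_file`, the `ref` and `dangling_refs` helpers are illustrative only, and the metadata descriptor and the other added files are omitted. The sketch also leaves out the `conformsTo` that the committed `created_files.json` entity appears to carry over from the Dockerfile entry, since a JSON list of sourced files does not follow the Dockerfile reference.

```python
import json

# (ORCID IRI, name, affiliation) as they appear in the commit's Person entities.
AUTHORS = [
    ("https://orcid.org/0000-0000-0000-0000", "Alexander Hambley", "University of Manchester"),
    ("https://orcid.org/0000-0002-0035-6475", "Eli Chadwick", "University of Manchester"),
    ("https://orcid.org/0000-0002-4565-9760", "Oliver Woolland", "University of Manchester"),
    ("https://orcid.org/0000-0001-9842-9718", "Stian Soiland-Reyes", "University of Manchester"),
    ("https://orcid.org/0000-0001-6353-0808", "Volodymyr Savchenko", "University of Geneva"),
]


def ref(entity: dict) -> dict:
    """Reference an entity by its @id, as entity-valued properties are serialised."""
    return {"@id": entity["@id"]}


def build_graph(data_file: str = "merged.ttl", workflow_file: str = "Snakefile") -> list[dict]:
    """Approximate @graph for the entities touched by this commit (hypothetical file names)."""
    people = [
        {"@id": orcid, "@type": "Person", "name": name, "affiliation": affiliation}
        for orcid, name, affiliation in AUTHORS
    ]
    created_files = {
        "@id": "created_files.json",
        "@type": "File",
        "name": "created_files.json",
        "encodingFormat": "application/json",
        "description": "A JSON file containing the list of files sourced by the workflow.",
    }
    data = {
        "@id": data_file,
        "@type": "File",
        "name": "Merged Data File",
        "encodingFormat": "text/turtle",
        "author": [ref(p) for p in people],  # all five authors
        "isBasedOn": ref(created_files),
    }
    workflow = {
        "@id": workflow_file,
        "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
        "name": "Snakemake Workflow",
        "author": [ref(p) for p in people[:4]],  # the commit lists auth_1..auth_4 here
        "output": ref(data),
    }
    root = {
        "@id": "./",
        "@type": "Dataset",
        "conformsTo": {"@id": "https://w3id.org/ro/wfrun/workflow/0.5"},
        "license": "https://opensource.org/license/bsd-2-clause",
        "mainEntity": ref(workflow),  # add_workflow(..., main=True)
    }
    return [root, workflow, data, created_files, *people]


def dangling_refs(graph: list[dict]) -> list[tuple[str, str, str]]:
    """List (entity, property, target) for local references whose target is missing."""
    ids = {entity["@id"] for entity in graph}
    missing = []
    for entity in graph:
        for key, value in entity.items():
            if key == "conformsTo":  # points at an external profile, not a local entity
                continue
            for item in value if isinstance(value, list) else [value]:
                if isinstance(item, dict) and "@id" in item and item["@id"] not in ids:
                    missing.append((entity["@id"], key, item["@id"]))
    return missing


if __name__ == "__main__":
    graph = build_graph()
    print(dangling_refs(graph))  # an empty list means every local reference resolves
    print(json.dumps({"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}, indent=2))
```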
