master-spreadsheet-datasets

A data pipeline used to generate a master spreadsheet of all the datasets shared by clusters in the Critical Zone Collaborative Network.

How to use

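This data analysis workflow uses Snakemake as its pipelining tool (see the Snakemake installation instructions). The pipeline retrieves, cleans and munges the cluster datasets into a readable master spreadsheet, with one stage per folder: 1_fetch/, 2_clean/ and 3_munge/.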

First, create a Conda environment with all the required packages by running the following command: conda env create -f environment.yaml

Once the new environment is activated, run the Snakemake pipeline with this command: snakemake --cores 1 -s Snakefile.smk --forceall
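
For repeated runs, the same invocation can be assembled from a script. The sketch below is a hypothetical helper and not part of this repository: the function names and parameters are assumptions made for illustration. It builds the command shown above and can optionally append Snakemake's -n (dry-run) flag, which lists the planned jobs without executing them.

```python
import subprocess


def build_snakemake_cmd(cores=1, snakefile="Snakefile.smk", force_all=True, dry_run=False):
    """Assemble the snakemake invocation from this README as an argument list.

    Hypothetical helper: the defaults mirror the documented command.
    """
    cmd = ["snakemake", "--cores", str(cores), "-s", snakefile]
    if force_all:
        # Re-run every rule even if its outputs already exist.
        cmd.append("--forceall")
    if dry_run:
        # Only show what would be executed.
        cmd.append("-n")
    return cmd


def run_pipeline(**kwargs):
    """Run the pipeline from the repository root.

    Raises subprocess.CalledProcessError if snakemake exits with an error.
    """
    subprocess.run(build_snakemake_cmd(**kwargs), check=True)
```

Calling run_pipeline(dry_run=True) first is a cheap way to check that the Snakefile resolves before starting a full run; it has to be called from inside the activated Conda environment so that snakemake is on the PATH.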

When the jobs are done, the output master spreadsheet containing all cluster datasets will be in a newly created out folder in 3_munge/.
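
To confirm that a run produced its output, the out folder can be inspected from Python. This is a minimal sketch under stated assumptions: list_pipeline_outputs is a hypothetical name, and because this README does not state the output file name, the function simply lists every file found in 3_munge/out.

```python
from pathlib import Path


def list_pipeline_outputs(repo_root="."):
    """Return the files in 3_munge/out, sorted by path.

    Returns an empty list if the folder does not exist yet,
    for example because the pipeline has not been run.
    """
    out_dir = Path(repo_root) / "3_munge" / "out"
    if not out_dir.is_dir():
        return []
    return sorted(p for p in out_dir.iterdir() if p.is_file())
```

An empty result after a run suggests that the pipeline did not finish, or that it was started from a directory other than the repository root.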

