A data pipeline used to generate an up-to-date master spreadsheet containing all CZNet datasets.

cznethub/master-spreadsheet-datasets

master-spreadsheet-datasets

A data pipeline used to generate a master spreadsheet of all the datasets shared by clusters in the Critical Zone Collaborative Network.

How to use

This data analysis workflow uses Snakemake as a pipelining tool to retrieve, clean, and munge the dataset records into a readable master spreadsheet. See the Snakemake documentation for installation instructions.

First, create a Conda environment with all the required packages by running the following command:

    conda env create -f environment.yaml

Then activate it with conda activate followed by the environment name defined in environment.yaml.

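Once in the new environment, run the Snakemake pipeline with this command:

    snakemake --cores 1 -s Snakefile.smk --forceall

Here --cores 1 limits execution to a single core, -s selects the workflow file, and --forceall reruns every rule even if its outputs already exist.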
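The Snakemake invocation above can also be scripted, for example to rerun the pipeline on a schedule. The following is a minimal sketch, assuming snakemake is on the PATH of the active environment and the script is run from the repository root. The helper names build_snakemake_command and run_pipeline are hypothetical and not part of this repository; the flags are the ones shown in the command above.

```python
import shlex
import subprocess
from typing import List


def build_snakemake_command(
    snakefile: str = "Snakefile.smk", cores: int = 1, force_all: bool = True
) -> List[str]:
    """Assemble the snakemake invocation from this README as an argument list.

    Hypothetical helper for illustration; not part of the repository.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    cmd = ["snakemake", "--cores", str(cores), "-s", snakefile]
    if force_all:
        # --forceall reruns every rule even if its outputs already exist.
        cmd.append("--forceall")
    return cmd


def run_pipeline(**kwargs) -> int:
    """Run the pipeline and return its exit code (requires snakemake on PATH)."""
    cmd = build_snakemake_command(**kwargs)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

Calling run_pipeline() with no arguments issues the same command as above. Passing force_all=False lets Snakemake skip rules whose outputs are already up to date.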

When the jobs are done, the master spreadsheet containing all cluster datasets will be in a newly created out folder inside 3_munge/ (that is, 3_munge/out/).
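To check that the run produced a spreadsheet, the output folder can be inspected from Python. This is a minimal sketch: the README does not state the output file name or format, so the extension set below is an assumption, and find_spreadsheets is a hypothetical helper rather than part of the pipeline.

```python
from pathlib import Path
from typing import List

# Assumed spreadsheet extensions; the README does not name the output file or its format.
SPREADSHEET_SUFFIXES = {".csv", ".tsv", ".xlsx", ".xls"}


def find_spreadsheets(out_dir: str = "3_munge/out") -> List[Path]:
    """Return spreadsheet-like files in out_dir, most recently modified first."""
    directory = Path(out_dir)
    if not directory.is_dir():
        raise FileNotFoundError(
            f"{directory} does not exist; has the pipeline finished?"
        )
    candidates = [
        p
        for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in SPREADSHEET_SUFFIXES
    ]
    # Sort by modification time so the newest output comes first.
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)
```

Run from the repository root, find_spreadsheets() returns an empty list if the folder exists but holds no matching files, and raises FileNotFoundError if the folder has not been created yet.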
